---
language:
- ru
license: apache-2.0
---
# FRED-T5 1.7B (Full-scale Russian Enhanced Denoisers T5)
The model was trained by [SberDevices](https://sberdevices.ru/).
The architecture is based on T5.
It has 24 layers and a hidden size of 1536; see `config.json` for more details.
The model was trained on a mixture of 7 denoisers, similar to UL2 (https://arxiv.org/abs/2205.05131) but with several differences.
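
Each of the denoisers in the mixture corrupts the input in a T5-like way: masked spans are replaced with sentinel tokens such as `<extra_id_0>`, and the target lists each sentinel followed by the text it hides. The sketch below shows a single span-corruption step on a whitespace-tokenized sentence. It is a minimal illustration, assuming span positions are chosen by the caller; the actual span lengths, corruption rates, and tokenization of FRED-T5's seven denoisers are not specified here, and `corrupt_spans` is a hypothetical helper.

```python
def corrupt_spans(tokens, spans):
    """Replace each (start, end) span in `tokens` with a sentinel token.

    Returns (inputs, targets) in T5 span-corruption style: inputs keep the
    unmasked tokens with one sentinel per span; targets list each sentinel
    followed by the tokens it replaced. `spans` must be sorted and
    non-overlapping. Hypothetical helper for illustration only.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # mask the span with a sentinel
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # the target restores the span
        cursor = end
    inputs.extend(tokens[cursor:])           # keep text after the last span
    return " ".join(inputs), " ".join(targets)


words = "Kutuzov began to tell his story from the very beginning".split()
inp, tgt = corrupt_spans(words, [(1, 2), (5, 10)])
print(inp)  # Kutuzov <extra_id_0> to tell his <extra_id_1>
print(tgt)  # <extra_id_0> began <extra_id_1> story from the very beginning
```
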
It was trained on a 300 GB Russian-language corpus, the same dataset used for the ruT5 models.
Tokenizer: BBPE (byte-level BPE) with 50,257 tokens plus 107 special tokens. Prefix tokens: `<LM>`, `<SC1>`, ..., `<SC6>`.
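
The prefix token is written directly at the start of the input string. A small hypothetical helper, `build_prompt`, that checks the prefix against the documented list before prepending it might look like this. It only validates the token name; which denoiser each `<SCn>` prefix corresponds to is not documented here. The vocabulary total is simple arithmetic on the counts stated above, not a value read from the tokenizer.

```python
# Prefix tokens documented for FRED-T5: <LM> and <SC1> ... <SC6>.
PREFIXES = ["<LM>"] + [f"<SC{i}>" for i in range(1, 7)]

# Vocabulary size implied by the stated counts: 50,257 BBPE tokens + 107 special tokens.
VOCAB_SIZE = 50257 + 107  # 50,364


def build_prompt(prefix: str, text: str) -> str:
    """Prepend one of the documented prefix tokens to `text` (hypothetical helper)."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix {prefix!r}; expected one of {PREFIXES}")
    return prefix + text


prompt = build_prompt("<SC1>", "Принялся Кутузов рассказывать свою историю <extra_id_0>.")
print(prompt)  # <SC1>Принялся Кутузов рассказывать свою историю <extra_id_0>.
```
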
For the first half of training, the model was trained on a small part of the full dataset (1%, 3 GB) and without task prefixes.
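
The two-phase schedule can be expressed as a function of the training step. This is a minimal sketch, assuming the switch happens exactly at the midpoint of a fixed step budget and that the second half used the full corpus with prefixes (implied but not stated above). `PhaseConfig` and `phase_config` are hypothetical names, and the card does not say how the 1% subset was sampled.

```python
from dataclasses import dataclass


@dataclass
class PhaseConfig:
    data_fraction: float  # share of the full corpus to sample from
    use_prefixes: bool    # whether prefix tokens (<LM>, <SC1> ...) are prepended


def phase_config(step: int, total_steps: int) -> PhaseConfig:
    """Return the data/prefix setup for a training step (hypothetical schedule).

    First half: 1% subset (about 3 GB of the 300 GB corpus), no prefixes.
    Second half (assumed): full corpus, with prefixes.
    """
    if step < total_steps // 2:
        return PhaseConfig(data_fraction=0.01, use_prefixes=False)
    return PhaseConfig(data_fraction=1.0, use_prefixes=True)


print(phase_config(100, 1000))  # PhaseConfig(data_fraction=0.01, use_prefixes=False)
print(phase_config(700, 1000))  # PhaseConfig(data_fraction=1.0, use_prefixes=True)
```
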
For RSG (Russian SuperGLUE), we trained as described in the T5 paper: first, we trained a multitask model on all tasks; then, for each task, we took the best checkpoint and fine-tuned it further.
RSG submission: https://russiansuperglue.com/login/submit_info/1936
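
Taking the best multitask checkpoint for each task amounts to an argmax over per-task validation scores. A minimal sketch follows; the checkpoint names, task names, and scores are placeholders for illustration only (not RSG results), and `best_checkpoint_per_task` is a hypothetical helper. Each selected checkpoint would then be fine-tuned further on its own task.

```python
def best_checkpoint_per_task(scores):
    """Pick, for each task, the checkpoint with the highest validation score.

    `scores` maps checkpoint name -> {task name -> validation score}.
    Returns {task name -> (checkpoint name, score)}.
    """
    best = {}
    for ckpt, task_scores in scores.items():
        for task, score in task_scores.items():
            if task not in best or score > best[task][1]:
                best[task] = (ckpt, score)
    return best


# Placeholder numbers for illustration only.
scores = {
    "ckpt_1": {"task_a": 0.70, "task_b": 0.55},
    "ckpt_2": {"task_a": 0.68, "task_b": 0.61},
}
print(best_checkpoint_per_task(scores))
# {'task_a': ('ckpt_1', 0.7), 'task_b': ('ckpt_2', 0.61)}
```
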
Total training time was around 45 days on 112 A100 GPUs.
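
For comparison with other training budgets, the stated figures convert to GPU-days and GPU-hours as follows. This assumes all 112 GPUs were busy for the full 45 days; since the duration is approximate, treat the result as a rough figure.

```python
DAYS = 45   # stated training time (approximate)
GPUS = 112  # stated number of A100 GPUs

gpu_days = DAYS * GPUS
gpu_hours = gpu_days * 24
print(f"{gpu_days:,} GPU-days, {gpu_hours:,} GPU-hours")  # 5,040 GPU-days, 120,960 GPU-hours
```
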
## Usage (HuggingFace Models Repository)
```python
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

tokenizer = GPT2Tokenizer.from_pretrained('ai-forever/FRED-T5-1.7B', eos_token='</s>')
model = T5ForConditionalGeneration.from_pretrained('ai-forever/FRED-T5-1.7B')
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

# Prefix <LM>: continue the given text.
# Input: "Kutuzov began telling his story of how he ended up here. It started"
lm_text = '<LM>Принялся Кутузов рассказывать свою историю как он сюда попал. Началось'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True)
# outputs[0][0] is the decoder start token, so decoding starts at index 1.
print(tokenizer.decode(outputs[0][1:]))
# print result: с того, что он был в армии, служил в артиллерии</s>.
# ("with him being in the army, serving in the artillery")

# Prefix <SC1>: fill in the span marked by <extra_id_0>.
lm_text = '<SC1>Принялся Кутузов рассказывать свою историю <extra_id_0>. Началось с того, что он был в армии, служил в артиллерии.'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True)
print(tokenizer.decode(outputs[0][1:]))
# print result: '<extra_id_0> с самого начала</s>'  ("from the very beginning")

# Prefix <SC5>: same input, different denoiser prefix.
lm_text = '<SC5>Принялся Кутузов рассказывать свою историю <extra_id_0>. Началось с того, что он был в армии, служил в артиллерии.'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True)
print(tokenizer.decode(outputs[0][1:]))
# print result: '<extra_id_0>, и она оказалась очень длинной</s>'  (", and it turned out to be very long")
```
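
The `<SC1>` and `<SC5>` outputs contain only the generated span, marked by the sentinel and followed by `</s>`. To read the result as a full sentence, the span can be substituted back into the input. A minimal sketch, assuming a single `<extra_id_0>` sentinel per input and decoded output in the format shown in the comments above; `fill_span` is a hypothetical helper, not part of `transformers`.

```python
import re

SENTINEL = "<extra_id_0>"


def fill_span(prompt: str, generated: str, sentinel: str = SENTINEL) -> str:
    """Substitute a generated span back into a single-sentinel prompt (hypothetical helper)."""
    # Decoded output looks like "<extra_id_0> span text</s>": drop EOS and the sentinel.
    span = generated.replace("</s>", "").replace(sentinel, "", 1).rstrip()
    # The span keeps its own leading space (or starts with punctuation),
    # so remove the space that precedes the sentinel in the prompt.
    filled = prompt.replace(" " + sentinel, sentinel, 1).replace(sentinel, span, 1)
    # Strip the prefix token (<LM>, <SC1> ... <SC6>) from the start, if present.
    return re.sub(r"^<(?:LM|SC[1-6])>", "", filled)


text = "Принялся Кутузов рассказывать свою историю <extra_id_0>. Началось с того, что он был в армии, служил в артиллерии."
print(fill_span("<SC1>" + text, "<extra_id_0> с самого начала</s>"))
# Принялся Кутузов рассказывать свою историю с самого начала. Началось с того, что он был в армии, служил в артиллерии.
print(fill_span("<SC5>" + text, "<extra_id_0>, и она оказалась очень длинной</s>"))
# Принялся Кутузов рассказывать свою историю, и она оказалась очень длинной. Началось с того, что он был в армии, служил в артиллерии.
```
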
## Authors
+ NLP core team RnD [Telegram channel](https://t.me/nlpcoreteam):
+ Dmitry Zmitrovich
+ Andrei Kalmykov
+ Vitaly Kadulin
+ Mikhail Novikov