---
language:
- ru
---

# FRED-T5 1.7B (Full-scale Russian Enhanced Denoisers T5)

The architecture is based on T5. It has 24 layers and a hidden size of 1536.

The model was trained on a mixture of 7 denoisers, similar to UL2, with several differences.
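As a rough illustration of what a single denoiser does, the sketch below applies T5-style span corruption to a list of tokens: the chosen spans are replaced by sentinel tokens in the input, and the target lists each sentinel followed by the tokens it hides. The function name `corrupt_spans`, the `<extra_id_N>` sentinel format, and the example spans are assumptions made for illustration; they are not the actual settings of the 7 denoisers used for this model.

```python
from typing import List, Tuple


def corrupt_spans(
    tokens: List[str], spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Replace each [start, end) span with a sentinel; return (inputs, targets).

    Spans must be sorted, non-empty, and non-overlapping.
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        if not (cursor <= start < end <= len(tokens)):
            raise ValueError(f"invalid span: {(start, end)}")
        sentinel = f"<extra_id_{i}>"  # assumed T5-style sentinel naming
        inputs.extend(tokens[cursor:start])  # keep the uncorrupted tokens
        inputs.append(sentinel)              # mask the span in the input
        targets.append(sentinel)             # target: sentinel, then hidden tokens
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])  # keep whatever follows the last span
    return inputs, targets


tokens = "the quick brown fox jumps over the lazy dog".split()
inputs, targets = corrupt_spans(tokens, [(1, 3), (6, 8)])
print(" ".join(inputs))
print(" ".join(targets))
```

In a UL2-style mixture, the individual denoisers typically differ in settings such as span length and corruption rate, which here would correspond to passing different `spans`.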

It was trained on a Russian-language corpus (300 GB). The dataset is the same as for the ruT5 models.

The tokenizer is BBPE (byte-level BPE). For the first half of the training time, the model was trained on a small part (1%) of the full dataset.
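To illustrate the byte-level idea behind BBPE, the sketch below shows the byte sequence that a byte-level tokenizer starts from before any merges are applied. It uses only Python's built-in UTF-8 encoding and is not the model's actual tokenizer; the sample word is an arbitrary choice.

```python
text = "привет"  # arbitrary sample word, chosen only for illustration

# Byte-level BPE starts from raw UTF-8 bytes, so the base vocabulary needs
# only 256 symbols and no input is ever out of vocabulary.
byte_ids = list(text.encode("utf-8"))

print(len(text))      # number of characters
print(len(byte_ids))  # number of bytes: each Cyrillic letter takes 2 in UTF-8
print(bytes(byte_ids).decode("utf-8") == text)  # the byte mapping is lossless
```

Learned merges then combine frequent byte sequences into larger tokens, which is what keeps Russian text from costing two tokens per letter.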

We continue to experiment...

We'll tell you more and release the checkpoint to the public soon.