---
language:
- ru
---

# FRED-T5 1.7B (Full-scale Russian Enhanced Denoisers T5) 

The architecture is based on T5.

The model has 24 layers and a hidden size of 1536.

The model was trained on a mixture of 7 denoisers, similar to UL2, with several differences.
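UL2 mixes R-denoisers (short spans, low corruption rate), X-denoisers (long spans and/or a high corruption rate) and an S-denoiser (prefix language modeling), and tags each example with a mode token. The sketch below illustrates that general idea in plain Python. It is an assumption-laden illustration, not FRED-T5's training code: the seven settings, the mode tokens `<R1>` … `<S>`, and the helpers `sample_spans`, `span_corrupt`, `prefix_lm` and `apply_denoiser` are hypothetical, and the card does not state FRED-T5's actual denoiser parameters. Sentinels follow the T5 `<extra_id_N>` convention, and the closing sentinel T5 appends to targets is omitted for brevity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Denoiser:
    prefix: str         # hypothetical mode token prepended to the encoder input
    mean_span: float    # mean corrupted-span length (mu); unused for prefix LM
    rate: float         # fraction of tokens to corrupt (r); unused for prefix LM
    prefix_lm: bool = False


# Hypothetical 7-denoiser mixture in the spirit of UL2 (R-, X- and S-denoisers).
# Values are illustrative only, NOT FRED-T5's actual settings.
MIXTURE = [
    Denoiser("<R1>", 3, 0.15),
    Denoiser("<R2>", 8, 0.15),
    Denoiser("<X1>", 3, 0.5),
    Denoiser("<X2>", 8, 0.5),
    Denoiser("<X3>", 64, 0.15),
    Denoiser("<X4>", 64, 0.5),
    Denoiser("<S>", 0, 0.0, prefix_lm=True),
]


def sample_spans(n, mean_span, rate, rng):
    """Choose non-overlapping (start, length) spans covering about rate * n tokens."""
    num_noise = min(n - 1, max(1, round(n * rate)))
    num_clean = n - num_noise
    num_spans = max(1, min(num_noise, num_clean + 1, round(num_noise / mean_span)))
    # Spread the noise tokens as evenly as possible over the spans.
    lengths = [num_noise // num_spans + (1 if i < num_noise % num_spans else 0)
               for i in range(num_spans)]
    # Distinct sorted cut points = number of clean tokens before each span,
    # so consecutive spans are always separated by at least one clean token.
    cuts = sorted(rng.sample(range(num_clean + 1), num_spans))
    spans, noise_before = [], 0
    for cut, length in zip(cuts, lengths):
        spans.append((cut + noise_before, length))
        noise_before += length
    return spans


def span_corrupt(tokens, spans):
    """T5-style span corruption: each span becomes a sentinel in the input;
    the target lists each sentinel followed by the tokens it hides."""
    inputs, targets, pos = [], [], 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs += tokens[pos:start] + [sentinel]
        targets += [sentinel] + tokens[start:start + length]
        pos = start + length
    inputs += tokens[pos:]
    return inputs, targets


def prefix_lm(tokens, rng):
    """S-denoiser: the encoder sees a prefix, the decoder predicts the rest."""
    split = rng.randint(1, len(tokens) - 1)
    return tokens[:split], tokens[split:]


def apply_denoiser(tokens, d, rng):
    if d.prefix_lm:
        inputs, targets = prefix_lm(tokens, rng)
    else:
        spans = sample_spans(len(tokens), d.mean_span, d.rate, rng)
        inputs, targets = span_corrupt(tokens, spans)
    return [d.prefix] + inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the model was trained on a mixture of seven denoisers".split()
    for _ in range(3):
        d = rng.choice(MIXTURE)  # sample one denoiser per training example
        print(apply_denoiser(tokens, d, rng))
```

Sampling a denoiser per example (rather than per batch) is one simple way to mix objectives; the real schedule and mixing weights used for FRED-T5 are not described here.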

It was trained on a Russian-language corpus (300 GB). The dataset is the same as for the ruT5 models.

The tokenizer is byte-level BPE (BBPE).

For the first half of training, the model was trained on a small part (1%) of the full dataset.
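The byte-level BPE (BBPE) tokenizer mentioned above starts from raw UTF-8 bytes instead of characters, so any Russian text can be encoded without unknown tokens (each Cyrillic letter is two bytes in UTF-8), and frequent byte pairs are merged into larger tokens. The toy trainer below is a minimal sketch of that idea under simplifying assumptions: it splits on whitespace and works directly on `bytes`, whereas production byte-level BPE (e.g. GPT-2 style) also uses a regex pre-tokenizer and a byte-to-unicode mapping. The functions `train_byte_bpe`, `encode`, `merge_pair` and `to_bytes` are hypothetical and do not reproduce FRED-T5's actual tokenizer or vocabulary.

```python
from collections import Counter


def merge_pair(word, pair):
    """Replace every adjacent occurrence of `pair` in `word` with one merged token."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)


def to_bytes(word):
    # Byte-level start: every word is a sequence of single UTF-8 bytes,
    # so any text (Cyrillic included) is representable with a 256-symbol base.
    return tuple(bytes([b]) for b in word.encode("utf-8"))


def train_byte_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges; whitespace pre-tokenization is a simplification."""
    words = Counter(to_bytes(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[a, b] += freq
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)  # most frequent pair; ties -> first seen
        merges.append(best)
        words = Counter({merge_pair(w, best): f for w, f in words.items()})
    return merges


def encode(word, merges):
    """Apply the learned merges in training order."""
    tokens = to_bytes(word)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


if __name__ == "__main__":
    merges = train_byte_bpe("мама мыла раму мама", num_merges=10)
    print(merges[:3])
    print(encode("мама", merges))
```

On Cyrillic text the earliest merges typically join the two UTF-8 bytes of a frequent letter, after which multi-letter tokens start to form.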


We are continuing to experiment.

We will share more details and release the checkpoint to the public soon.