sberbank-ai committed
Commit 6777adb
1 Parent(s): 856eccb

Update README.md


Files changed (1)
  1. README.md (+8 -2)
README.md CHANGED
@@ -10,12 +10,18 @@ Architecture based on T5.
 
 It has 24 layers and a hidden size of 1536.
 
-Model was trained on a mixture of 7 denoisers like UL2 with several differences.
+The model was trained on a mixture of 7 denoisers, as in UL2, with several differences.
 
 It was trained on a Russian-language corpus (300 GB). The dataset is the same as for the ruT5 models.
 
-Bbpe tokenizer. First half of the time model was trained on the small part of all datasets (1%).
+It uses a BBPE tokenizer.
 
+For the first half of training, the model was trained on a small part of all datasets (1%, 3 GB) and without prefixes in each task.
+
+For RSG, we trained as described in the T5 paper: first we trained multitask on all tasks, then we took the best checkpoint for each task and trained it further.
+
+Training loss:
+![Screenshot 2023-01-21 at 11.36.52.png](https://s3.amazonaws.com/moonup/production/uploads/1674290304538-5f91b1208a61a359f44e1851.png)
 
 We continue to experiment...
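
For readers of this diff: a minimal usage sketch of the UL2-style setup the new README text describes. The checkpoint id, the GPT2Tokenizer class for the BBPE vocabulary, and the "<SC1>" denoiser prefix are illustrative assumptions; the commit itself names none of them.

```python
# A minimal sketch, not from the commit: assumes the model is published as a
# T5ForConditionalGeneration checkpoint with a GPT-2-style BBPE tokenizer.
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

MODEL_ID = "sberbank-ai/some-t5-checkpoint"  # hypothetical repo id

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_ID)           # BBPE tokenizer
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)  # 24 layers, d_model=1536
model.eval()

# UL2-style denoising at inference: a mode prefix selects a denoiser, and a
# sentinel marks the span to reconstruct. "<SC1>" is an assumed prefix token.
text = "<SC1>Москва - <extra_id_0> России."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```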
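
The "mixture of 7 denoisers" line refers to span-corruption objectives in the T5/UL2 family. Below is a self-contained sketch of one such denoiser; the noise densities and span lengths are placeholders, since the commit does not spell out the 7 configurations.

```python
# Illustrative only: T5/UL2-style span corruption with sentinel tokens.
import random

def span_corrupt(tokens, noise_density=0.15, mean_span_len=3, rng=None):
    """Mask random spans; return (input with sentinels, target with sentinels)."""
    rng = rng or random.Random(0)
    n_to_mask = max(1, round(len(tokens) * noise_density))
    masked = set()
    while len(masked) < n_to_mask:
        start = rng.randrange(len(tokens))
        span = max(1, round(rng.expovariate(1.0 / mean_span_len)))
        masked.update(range(start, min(start + span, len(tokens))))
    inputs, targets, sentinel, i = [], [], 0, 0
    while i < len(tokens):
        if i in masked:
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")  # closing sentinel, as in T5
    return inputs, targets

# A hypothetical two-entry mixture in the spirit of UL2's R/X denoisers
# (the real model mixes 7 such configurations, not listed in this commit):
MIXTURE = [
    dict(noise_density=0.15, mean_span_len=3),   # R-style: regular denoising
    dict(noise_density=0.50, mean_span_len=32),  # X-style: extreme denoising
]

toks = "Первый блин комом , зато второй выйдет лучше".split()
inp, tgt = span_corrupt(toks, **MIXTURE[0])
print(" ".join(inp))
print(" ".join(tgt))
```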
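
The RSG paragraph compresses a two-stage recipe from the T5 paper. Here is a schematic sketch of the control flow, with stub train/score functions standing in for real runs; the task names are the Russian SuperGLUE tasks, and everything else is illustrative.

```python
# Stage 1: one multitask run over all RSG tasks. Stage 2: for each task, resume
# from the multitask checkpoint that scores best on it and train on it alone.
# train_steps() and score() are stubs, not a real training loop.
TASKS = ["LiDiRus", "RCB", "PARus", "MuSeRC", "TERRa", "RUSSE", "RWSD", "DaNetQA", "RuCoS"]

def train_steps(init_ckpt, tasks, n_ckpts):
    """Stub trainer: returns the names of checkpoints saved along the way."""
    return [f"{init_ckpt}->{'+'.join(tasks)}@{i}" for i in range(1, n_ckpts + 1)]

def score(ckpt, task):
    """Stub metric: a real run would evaluate ckpt on the task's dev set."""
    return hash((ckpt, task)) % 100

multitask_ckpts = train_steps("pretrained", TASKS, n_ckpts=5)   # stage 1

final = {}
for task in TASKS:                                              # stage 2
    best = max(multitask_ckpts, key=lambda c: score(c, task))
    final[task] = train_steps(best, [task], n_ckpts=1)[-1]

print(final["TERRa"])
```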