stefan-it committed
Commit 0685c7d
1 Parent(s): a67d2ab

readme: add some interesting details about umT5

Files changed (1):
  1. README.md +17 -0
README.md CHANGED
@@ -4,6 +4,23 @@ license: mit
 
  # umT5 Small
 
+ The UMT5 model was proposed in [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi)
+ by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, Noah Constant.
+
+ The abstract from the paper is the following:
+
+ *Pretrained multilingual large language models have typically used heuristic temperature-based sampling to balance
+ between different languages. However previous work has not systematically evaluated the efficacy of different
+ pretraining language distributions across model scales. In this paper, we propose a new sampling method, UniMax,
+ that delivers more uniform coverage of head languages while mitigating overfitting on tail languages by explicitly
+ capping the number of repeats over each language's corpus. We perform an extensive series of ablations testing a
+ range of sampling strategies on a suite of multilingual benchmarks, while varying model scale. We find that UniMax
+ outperforms standard temperature-based sampling, and the benefits persist as scale increases. As part of our
+ contribution, we release: (i) an improved and refreshed mC4 multilingual corpus consisting of 29 trillion characters
+ across 107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained with UniMax sampling.*
+
+ # Integration into Transformers
+
  Overview of umT5 model integration:
 
  * Transformers Integration is on-going, see this awesome [PR](https://github.com/huggingface/transformers/pull/22626) by @agemagician!
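
The abstract added in the diff describes UniMax only at a high level: give head languages near-uniform coverage while capping how many times each tail language's corpus may be repeated. The Python sketch below is a rough illustration of that capped-repeat budgeting idea, not the paper's exact algorithm; the function name, the `max_epochs` cap, and the character-count inputs are assumptions made for this example.

```python
def unimax_budget(corpus_chars: dict[str, float], total_budget: float, max_epochs: float = 4.0) -> dict[str, float]:
    """Illustrative capped-repeat budget split (sketch, not the paper's exact recipe).

    Each language ideally receives an equal share of the remaining budget, but
    never more than `max_epochs` passes over its own corpus; whatever a small
    language cannot absorb is redistributed to the larger ones.
    """
    budget = {}
    remaining = total_budget
    langs = sorted(corpus_chars, key=corpus_chars.get)  # smallest corpus first
    for i, lang in enumerate(langs):
        uniform_share = remaining / (len(langs) - i)
        budget[lang] = min(uniform_share, max_epochs * corpus_chars[lang])
        remaining -= budget[lang]
    return budget


# Toy example: one "tail" language and two "head" languages (sizes in arbitrary units).
print(unimax_budget({"sw": 1.0, "de": 50.0, "en": 200.0}, total_budget=90.0))
# sw is capped at 4 epochs of its corpus (4.0); de and en split the remaining budget evenly.
```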
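
The diff also notes that Transformers support for umT5 was still being added in the linked PR at the time of this commit. Once that integration lands, loading a checkpoint should follow the usual T5-style API; the sketch below is a non-authoritative example, and the class name `UMT5ForConditionalGeneration` and the checkpoint id `google/umt5-small` are assumptions based on the naming used in the PR.

```python
from transformers import AutoTokenizer, UMT5ForConditionalGeneration

# Checkpoint id and class name are assumptions; adjust once the PR is merged
# and the official checkpoints are published on the Hub.
model_id = "google/umt5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = UMT5ForConditionalGeneration.from_pretrained(model_id)

# Like mT5, the released umT5 checkpoints are pretrained-only, so expect to
# fine-tune before real use; this just checks that the generate path runs.
inputs = tokenizer("A <extra_id_0> walks into a bar.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```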