TimeRobber committed

Commit 0c1a4cf • 1 Parent(s): 88a280b

Update README.md

README.md CHANGED
@@ -105,29 +105,38 @@ language:
 - zu
 datasets:
 - mc4
-- xP3
+- bigscience/xP3
 ---
 
-
-
-Multilingual Text-to-Text Transfer Transformer Zero (mt0)
-Version 1. / 28 Octo 2022
-
-// TODO @thomasw21
-Current Checkpoint:
-
-// TODO @thomasw21
-Total seen tokens:
+Multilingual Text-to-Text Transfer Transformer Zero (MT0)
+Version 1. / 28 October 2022
 
 ---
 
-#
+# Models
 
 mT5 is pretrained on the [mC4](https://www.tensorflow.org/datasets/catalog/c4#c4multilingual) corpus, covering 101 languages:
 
 Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.
 
-mt5 was then finetuned on
+mt5 was then finetuned on:
+- [xP3](https://huggingface.co/bigscience/xP3) to obtain [mt0-small](https://huggingface.co/bigscience/mt0-small)/[mt0-base](https://huggingface.co/bigscience/mt0-base)/[mt0-large](https://huggingface.co/bigscience/mt0-large)/[mt0-xl](https://huggingface.co/bigscience/mt0-xl)/[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)
+- [P3](https://huggingface.co/bigscience/P3) to obtain [mt0-p3-xxl](https://huggingface.co/bigscience/mt0-p3-xxl)
+- [xP3mt](https://huggingface.co/bigscience/xP3mt) to obtain [mt0-mt-xxl](https://huggingface.co/bigscience/mt5-mt-xxl)
+
+## Model Flavors
+
+Multilingual model capable of following user instructions in a variety of languages. Together with our paper [TODO: LINK], we release the following models:
+
+----
+- [mt0-small](https://huggingface.co/bigscience/mt0-small): 300M parameters multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/bigscience/xP3)
+- [mt0-base](https://huggingface.co/bigscience/mt0-base): 580M parameters multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/bigscience/xP3)
+- [mt0-large](https://huggingface.co/bigscience/mt0-large): 1.2B parameters multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/bigscience/xP3)
+- [mt0-xl](https://huggingface.co/bigscience/mt0-xl): 3.7B parameters multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/bigscience/xP3)
+- [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl): 13B parameters multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/bigscience/xP3)
+----
+- [mt0-p3-xxl](https://huggingface.co/bigscience/mt0-p3-xxl): 13B parameters multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/bigscience/P3)
+- [mt0-mt-xxl](https://huggingface.co/bigscience/mt5-mt-xxl): 13B parameters multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3mt](https://huggingface.co/bigscience/xP3mt)
 
 ## Basics
 *This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
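The hunk above introduces the list of released flavors. As a quick illustration of how one of the smaller flavors could be exercised, here is a minimal sketch (not part of the diff) using the `transformers` text2text-generation pipeline; the `bigscience/mt0-small` name comes from the list above, while the prompt and generation settings are made up for the example.

```python
# Minimal sketch (not part of the committed README): trying one of the smaller
# mt0 flavors via the high-level pipeline API. Assumes `transformers` and a
# backend such as PyTorch are installed; the prompt is only an illustration.
from transformers import pipeline

generator = pipeline("text2text-generation", model="bigscience/mt0-small")

prompt = "Translate to English: Je t'aime."
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```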
@@ -150,14 +159,11 @@ mt5 was then finetuned on xP3 to obtain mt0.
 
 **Release Date Estimate:** Friday, 28.October.2022
 
-
-**Send Questions to:**
+**Send Questions to:** niklas@huggingface.co
 
-// TODO @thomas21
-**Cite as:**
-
-// TODO @thomas21
 **Funded by:**
+* The French government.
+* Hugging Face ([website](https://huggingface.co)).
 
 </details>
 
@@ -173,7 +179,7 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 
 ### Model Architecture and Objective
 
-* Same architecture as [mt5
+* Same architecture as [mt5](https://arxiv.org/abs/2010.11934)
 
 * Encoder-decoder architecture
 
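The architecture bullets in the hunk above state that mt0 keeps mt5's encoder-decoder design. A minimal sketch for checking this from the released configuration, assuming only that the `bigscience/mt0-small` repository is reachable, could look like the following; no particular values are asserted.

```python
# Sketch (not part of the committed README): inspect the encoder-decoder
# configuration that mt0 inherits from mt5.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bigscience/mt0-small")

print(config.model_type)          # expected to report an mt5-style model
print(config.is_encoder_decoder)  # encoder-decoder architecture
print(config.num_layers)          # encoder layers
print(config.num_decoder_layers)  # decoder layers
print(config.d_model)             # hidden size
```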
@@ -205,15 +211,11 @@ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bi
 ## Training Data
 *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
 
-It was pretrained on mC4 and then finetuned on xP3
+It was pretrained on mC4 and then finetuned on xP3, P3 or xP3mt.
 
 ### Languages
 
 // TODO @thomasw21: Copy list from mt5
-
-### Preprocessing
-
-// TODO @thomasw21
 
 ## Speeds, Sizes, Times
 
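The training-data line above summarises the mC4 pretraining and the xP3/P3/xP3mt finetuning mixtures. A sketch for peeking at a few finetuning examples with the `datasets` library might look like this; the `"en"` configuration name and the record fields are assumptions about the xP3 layout, not something stated in the README.

```python
# Sketch (not part of the committed README): stream a few xP3 finetuning
# examples. The "en" config name and the record fields are assumed, not
# guaranteed by this model card.
from datasets import load_dataset

xp3_en = load_dataset("bigscience/xP3", "en", split="train", streaming=True)

for i, example in enumerate(xp3_en):
    print(example)  # typically an input prompt and its target text
    if i >= 2:
        break
```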
@@ -253,10 +255,16 @@ The evaluation supercomputer, [Jean Zay](http://www.idris.fr/eng/jean-zay/), use
 This model can be easily used and deployed using HuggingFace's ecosystem. This needs `transformers` and `accelerate` installed. The model can be downloaded as follows:
 
 ```python
-from transformers import
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
 
 checkpoint = "..." # "checkpoint_1006000" for example
-
+model_name = "bigscience/mt0-xxl"
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name, revision=checkpoint, torch_dtype="auto", device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(model_name, revision=checkpoint)
+
+inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
+outputs = model.generate(inputs)
+print(tokenizer.decode(outputs[0]))
 ```
 
 ## Intended Use
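As a follow-up to the snippet added in the hunk above, the variant below is a sketch that moves the tokenized inputs onto the model's device explicitly and sets `max_new_tokens`; it reuses the `bigscience/mt0-xxl` name from the README, while the prompt is only an illustration.

```python
# Sketch (not part of the committed README): same usage as the snippet above,
# but with the inputs moved to the model's device and explicit generation
# settings. Requires `transformers` and `accelerate`.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "bigscience/mt0-xxl"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```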
@@ -408,3 +416,10 @@ model = AutoModel.from_pretrained("bigscience/mt0-xxl", revision=checkpoint, tor
 ## Original checkpoints
 
 The checkpoints in this repo correspond to the HuggingFace Transformers format. We'll provide T5X checkpoints as well.
+
+# Citing MT0
+
+Please use the following bibtex entry to cite T0:
+```bibtex
+TODO @niklas
+```