--- datasets: - bigscience/xP3 - mc4 license: apache-2.0 language: - af - am - ar - az - be - bg - bn - ca - ceb - co - cs - cy - da - de - el - en - eo - es - et - eu - fa - fi - fil - fr - fy - ga - gd - gl - gu - ha - haw - hi - hmn - ht - hu - hy - ig - is - it - iw - ja - jv - ka - kk - km - kn - ko - ku - ky - la - lb - lo - lt - lv - mg - mi - mk - ml - mn - mr - ms - mt - my - ne - nl - 'no' - ny - pa - pl - ps - pt - ro - ru - sd - si - sk - sl - sm - sn - so - sq - sr - st - su - sv - sw - ta - te - tg - th - tr - uk - und - ur - uz - vi - xh - yi - yo - zh - zu pipeline_tag: text2text-generation widget: - text: >- 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative? example_title: zh-en sentiment - text: 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评? example_title: zh-zh sentiment - text: Suggest at least five related search terms to "Mạng neural nhân tạo". example_title: vi-en query - text: >- Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels». example_title: fr-fr query - text: Explain in a sentence in Telugu what is backpropagation in neural networks. example_title: te-en qa - text: Why is the sky blue? example_title: en-en qa - text: >- Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is "Heroes Come in All Shapes and Sizes". Story (in Spanish): example_title: es-en fable - text: >- Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is "Violence is the last refuge of the incompetent". 
Fable (in Hindi): example_title: hi-en fable model-index: - name: mt0-xl results: - task: type: Coreference resolution dataset: type: winogrande name: Winogrande XL (xl) config: xl split: validation revision: a80f460359d1e9a67c006011c94de42a8759430c metrics: - type: Accuracy value: 52.49 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (en) config: en split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 61.89 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (fr) config: fr split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 59.04 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (jp) config: jp split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 60.27 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (pt) config: pt split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 66.16 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (ru) config: ru split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 59.05 - task: type: Coreference resolution dataset: type: Muennighoff/xwinograd name: XWinograd (zh) config: zh split: test revision: 9dd5ea5505fad86b7bedad667955577815300cee metrics: - type: Accuracy value: 62.9 - task: type: Natural language inference dataset: type: anli name: ANLI (r1) config: r1 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 38.2 - task: type: Natural language inference dataset: type: anli name: ANLI (r2) config: r2 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 34.8 - task: type: Natural language inference dataset: type: anli name: ANLI 
(r3) config: r3 split: validation revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094 metrics: - type: Accuracy value: 39 - task: type: Natural language inference dataset: type: super_glue name: SuperGLUE (cb) config: cb split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 85.71 - task: type: Natural language inference dataset: type: super_glue name: SuperGLUE (rte) config: rte split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 78.7 - task: type: Natural language inference dataset: type: xnli name: XNLI (ar) config: ar split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 51.85 - task: type: Natural language inference dataset: type: xnli name: XNLI (bg) config: bg split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 54.18 - task: type: Natural language inference dataset: type: xnli name: XNLI (de) config: de split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 54.78 - task: type: Natural language inference dataset: type: xnli name: XNLI (el) config: el split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 53.78 - task: type: Natural language inference dataset: type: xnli name: XNLI (en) config: en split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 56.83 - task: type: Natural language inference dataset: type: xnli name: XNLI (es) config: es split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 54.78 - task: type: Natural language inference dataset: type: xnli name: XNLI (fr) config: fr split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 54.22 - task: type: Natural language inference dataset: type: xnli name: XNLI (hi) config: hi 
split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 50.24 - task: type: Natural language inference dataset: type: xnli name: XNLI (ru) config: ru split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 53.09 - task: type: Natural language inference dataset: type: xnli name: XNLI (sw) config: sw split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 49.6 - task: type: Natural language inference dataset: type: xnli name: XNLI (th) config: th split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 52.13 - task: type: Natural language inference dataset: type: xnli name: XNLI (tr) config: tr split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 50.56 - task: type: Natural language inference dataset: type: xnli name: XNLI (ur) config: ur split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 47.91 - task: type: Natural language inference dataset: type: xnli name: XNLI (vi) config: vi split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 53.21 - task: type: Natural language inference dataset: type: xnli name: XNLI (zh) config: zh split: validation revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16 metrics: - type: Accuracy value: 50.64 - task: type: Program synthesis dataset: type: openai_humaneval name: HumanEval config: None split: test revision: e8dc562f5de170c54b5481011dd9f4fa04845771 metrics: - type: Pass@1 value: 0 - type: Pass@10 value: 0 - type: Pass@100 value: 0 - task: type: Sentence completion dataset: type: story_cloze name: StoryCloze (2016) config: '2016' split: validation revision: e724c6f8cdf7c7a2fb229d862226e15b023ee4db metrics: - type: Accuracy value: 79.1 - task: type: Sentence completion dataset: type: super_glue name: 
SuperGLUE (copa) config: copa split: validation revision: 9e12063561e7e6c79099feb6d5a493142584e9e2 metrics: - type: Accuracy value: 72 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (et) config: et split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 70 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (ht) config: ht split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 66 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (id) config: id split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 71 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (it) config: it split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 70 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (qu) config: qu split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 56 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (sw) config: sw split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 53 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (ta) config: ta split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 64 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (th) config: th split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 60 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (tr) config: tr split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 58 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (vi) config: vi split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: 
Accuracy value: 68 - task: type: Sentence completion dataset: type: xcopa name: XCOPA (zh) config: zh split: validation revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187 metrics: - type: Accuracy value: 65 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (ar) config: ar split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 70.09 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (es) config: es split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 77.17 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (eu) config: eu split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 69.03 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (hi) config: hi split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 71.08 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (id) config: id split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 75.71 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (my) config: my split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 65.65 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (ru) config: ru split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 74.85 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (sw) config: sw split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 71.14 - task: type: Sentence completion dataset: type: 
Muennighoff/xstory_cloze name: XStoryCloze (te) config: te split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 68.89 - task: type: Sentence completion dataset: type: Muennighoff/xstory_cloze name: XStoryCloze (zh) config: zh split: validation revision: 8bb76e594b68147f1a430e86829d07189622b90d metrics: - type: Accuracy value: 72.93
---

![xmtf](https://github.com/bigscience-workshop/xmtf/blob/master/xmtf_banner.png?raw=true)

# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [Evaluation](#evaluation)
6. [Citation](#citation)

# Model Summary

> We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find our resulting models capable of crosslingual generalization to unseen tasks & languages.

- **Repository:** [bigscience-workshop/xmtf](https://github.com/bigscience-workshop/xmtf)
- **Paper:** [Crosslingual Generalization through Multitask Finetuning](https://arxiv.org/abs/2211.01786)
- **Point of Contact:** [Niklas Muennighoff](mailto:niklas@hf.co)
- **Languages:** Refer to [mc4](https://huggingface.co/datasets/mc4) for pretraining & [xP3](https://huggingface.co/bigscience/xP3) for finetuning language proportions. The model understands both pretraining & finetuning languages.
- **BLOOMZ & mT0 Model Family:**

| Multitask finetuned on xP3. Recommended for prompting in English. | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameters | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
| Finetuned Model | mt0-small | mt0-base | mt0-large | mt0-xl | mt0-xxl | bloomz-560m | bloomz-1b1 | bloomz-1b7 | bloomz-3b | bloomz-7b1 | bloomz |
| **Multitask finetuned on xP3mt. Recommended for prompting in non-English.** | | | | | | | | | | | |
| Finetuned Model | | | | | mt0-xxl-mt | | | | | bloomz-7b1-mt | bloomz-mt |
| **Multitask finetuned on P3. Released for research purposes only. Strictly inferior to above models!** | | | | | | | | | | | |
| Finetuned Model | | | | | mt0-xxl-p3 | | | | | bloomz-7b1-p3 | bloomz-p3 |
| **Original pretrained checkpoints. Not recommended.** | | | | | | | | | | | |
| Pretrained Model | mt5-small | mt5-base | mt5-large | mt5-xl | mt5-xxl | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-3b | bloom-7b1 | bloom |
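
The recommendations in the table can be read as a simple lookup. The sketch below is illustrative only: `pick_checkpoint` is a hypothetical helper, not part of any library, and the parameter counts assigned to the xP3mt variants are inferred from the model names (for example, `mt0-xxl-mt` is assumed to match the 13B `mt0-xxl`). It returns the xP3mt variant for non-English prompts when one exists at the requested size, and the xP3 model otherwise.

```python
# Checkpoint names copied from the model family table, keyed by parameter count.
XP3_MODELS = {
    "300M": "mt0-small", "580M": "mt0-base", "1.2B": "mt0-large",
    "3.7B": "mt0-xl", "13B": "mt0-xxl",
    "560M": "bloomz-560m", "1.1B": "bloomz-1b1", "1.7B": "bloomz-1b7",
    "3B": "bloomz-3b", "7.1B": "bloomz-7b1", "176B": "bloomz",
}

# Assumption: sizes for the xP3mt variants are inferred from their names.
XP3MT_MODELS = {
    "13B": "mt0-xxl-mt",
    "7.1B": "bloomz-7b1-mt",
    "176B": "bloomz-mt",
}


def pick_checkpoint(size: str, english_prompts: bool = True) -> str:
    """Return the recommended checkpoint name for a parameter count.

    Hypothetical helper: falls back to the xP3 model when no xP3mt
    variant is listed at the requested size.
    """
    if size not in XP3_MODELS:
        raise KeyError(f"unknown size {size!r}; expected one of {sorted(XP3_MODELS)}")
    if not english_prompts and size in XP3MT_MODELS:
        return XP3MT_MODELS[size]
    return XP3_MODELS[size]


print(pick_checkpoint("3.7B"))
print(pick_checkpoint("13B", english_prompts=False))
```

The fallback for sizes without an xP3mt variant is a design choice of this sketch; the table itself only lists xP3mt checkpoints at three sizes.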