---
datasets:
- bigscience/xP3
license: bigscience-bloom-rail-1.0
language:
- ak
- ar
- as
- bm
- bn
- ca
- code
- en
- es
- eu
- fon
- fr
- gu
- hi
- id
- ig
- ki
- kn
- lg
- ln
- ml
- mr
- ne
- nso
- ny
- or
- pa
- pt
- rn
- rw
- sn
- st
- sw
- ta
- te
- tn
- ts
- tum
- tw
- ur
- vi
- wo
- xh
- yo
- zh
- zu
programming_language:
- C
- C++
- C#
- Go
- Java
- JavaScript
- Lua
- PHP
- Python
- Ruby
- Rust
- Scala
- TypeScript
pipeline_tag: text-generation
widget:
- text: "一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative?"
example_title: "zh-en sentiment"
- text: "一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?"
example_title: "zh-zh sentiment"
- text: "Suggest at least five related search terms to \"Mạng neural nhân tạo\"."
example_title: "vi-en query"
- text: "Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels»."
example_title: "fr-fr query"
- text: "Explain in a sentence in Telugu what is backpropagation in neural networks."
example_title: "te-en qa"
- text: "Why is the sky blue?"
example_title: "en-en qa"
- text: "Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is \"Heroes Come in All Shapes and Sizes\". Story (in Spanish):"
example_title: "es-en fable"
- text: "Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is \"Violence is the last refuge of the incompetent\". Fable (in Hindi):"
example_title: "hi-en fable"
---
# Table of Contents
1. [Model Summary](#model-summary)
2. [Intended uses](#intended-uses)
3. [How to use](#how-to-use)
4. [Limitations](#limitations)
5. [BibTeX entry and citation info](#bibtex-entry-and-citation-info)
# Model Summary
> We present BLOOMZ & mT0, a family of models capable of following human instructions in hundreds of languages. By finetuning the large pretrained multilingual BLOOM & mT5 language models on our multilingual task mixture (xP3), we discover various generalization properties of our finetuned models across tasks and languages.
- **Repository:** [bigscience-workshop/xmtf](https://github.com/bigscience-workshop/xmtf)
- **Paper:** [TODO]
- **Funded by:** The French government & Hugging Face
- **Point of Contact:** [Niklas Muennighoff](mailto:niklas@hf.co)
- **BLOOMZ & mT0 Model Family:**
|Name|Explanation|
|----|-----------|
|[bloomz-560m](https://huggingface.co/bigscience/bloomz-560m)|560M parameter multitask finetuned version of [bloom-560m](https://huggingface.co/bigscience/bloom-560m) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[bloomz-1b1](https://huggingface.co/bigscience/bloomz-1b1)|1.1B parameter multitask finetuned version of [bloom-1b1](https://huggingface.co/bigscience/bloom-1b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[bloomz-1b7](https://huggingface.co/bigscience/bloomz-1b7)|1.7B parameter multitask finetuned version of [bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[bloomz-3b](https://huggingface.co/bigscience/bloomz-3b)|3B parameter multitask finetuned version of [bloom-3b](https://huggingface.co/bigscience/bloom-3b) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[bloomz](https://huggingface.co/bigscience/bloomz)|176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|||
|[bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/datasets/bigscience/xP3mt). **Better than [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1) when prompting in non-English**|
|[bloomz-mt](https://huggingface.co/bigscience/bloomz-mt)|176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/datasets/bigscience/xP3mt). **Better than [bloomz](https://huggingface.co/bigscience/bloomz) when prompting in non-English**|
|||
|[bloomz-7b1-p3](https://huggingface.co/bigscience/bloomz-7b1-p3)|7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes; performance is inferior to [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)**|
|[bloomz-p3](https://huggingface.co/bigscience/bloomz-p3)|176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes; performance is inferior to [bloomz](https://huggingface.co/bigscience/bloomz)**|
|||
|[mt0-small](https://huggingface.co/bigscience/mt0-small)|300M parameter multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[mt0-base](https://huggingface.co/bigscience/mt0-base)|580M parameter multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[mt0-large](https://huggingface.co/bigscience/mt0-large)|1.2B parameter multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[mt0-xl](https://huggingface.co/bigscience/mt0-xl)|3.7B parameter multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/datasets/bigscience/xP3)|
|||
|[mt0-xxl-mt](https://huggingface.co/bigscience/mt0-xxl-mt)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/datasets/bigscience/xP3) & [xP3mt](https://huggingface.co/datasets/bigscience/xP3mt). **Better than [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl) when prompting in non-English**|
|||
|[mt0-xxl-p3](https://huggingface.co/bigscience/mt0-xxl-p3)|13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/datasets/bigscience/P3). **Released for research purposes; performance is inferior to [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)**|
# Intended uses
You can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction. For instance, you can ask *"Translate this to Chinese: Je t'aime."*, and the model will hopefully generate *"我爱你"*.
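For a quick check of that example, here is a minimal sketch using the `transformers` pipeline API with the smallest checkpoint; output quality improves with model size and is not guaranteed at 560M:
```python
from transformers import pipeline

# Load the smallest BLOOMZ checkpoint for a quick test; the larger
# checkpoints expose the same interface but need far more memory.
generator = pipeline("text-generation", model="bigscience/bloomz-560m")

result = generator("Translate to Chinese: Je t'aime.")
print(result[0]["generated_text"])
```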
# How to use
Here is how to use the model in PyTorch:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigscience/bloomz-560m"

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Encode the prompt, generate a continuation, and decode it back to text
inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
To use another checkpoint, replace `checkpoint` with its repository name, e.g. `"bigscience/bloomz"`.
**Note: The 176B models were trained in bfloat16, while the smaller models were trained in fp16. We recommend using the same precision as in training, or fp32, at inference.**
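As an illustration of that note, a minimal sketch of half-precision inference on a GPU; it assumes `torch` with CUDA available and the `accelerate` package installed for `device_map="auto"`:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigscience/bloomz-560m"  # fp16-trained; use torch.bfloat16 for the 176B models

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device_map="auto" places the weights on the available GPU(s) (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.float16, device_map="auto"
)

# Move the input ids to the same device as the model before generating
inputs = tokenizer.encode("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```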
# Limitations
- The larger models require substantial computational resources to run
- Performance can vary considerably depending on how the prompt is phrased (see the sketch below)
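One way to reduce that variance is to make it clear where the input ends, so the model answers rather than continues the prompt. A minimal, illustrative sketch; the exact outputs depend on the checkpoint and are not guaranteed:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "bigscience/bloomz-560m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# The second prompt explicitly marks the end of the input, which tends
# to steer the model toward answering instead of continuing the sentence.
prompts = [
    "Translate to English: Je t'aime",
    "Translate to English: Je t'aime. Translation:",
]
for prompt in prompts:
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    outputs = model.generate(inputs, max_new_tokens=10)
    print(tokenizer.decode(outputs[0]))
```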
# BibTeX entry and citation info
```bibtex
TODO
```