	---
	datasets:
	- bigscience/xP3
	license: bigscience-bloom-rail-1.0
	language:
	- ak
	- ar
	- as
	- bm
	- bn
	- ca
	- code
	- en
	- es
	- eu
	- fon
	- fr
	- gu
	- hi
	- id
	- ig
	- ki
	- kn
	- lg
	- ln
	- ml
	- mr
	- ne
	- nso
	- ny
	- or
	- pa
	- pt
	- rn
	- rw
	- sn
	- st
	- sw
	- ta
	- te
	- tn
	- ts
	- tum
	- tw
	- ur
	- vi
	- wo
	- xh
	- yo
	- zh
	- zu
	programming_language:
	- C
	- C++
	- C#
	- Go
	- Java
	- JavaScript
	- Lua
	- PHP
	- Python
	- Ruby
	- Rust
	- Scala
	- TypeScript
	pipeline_tag: text-generation
	widget:
	- text: "一个传奇的开端，一个不灭的神话，这不仅仅是一部电影，而是作为一个走进新时代的标签，永远彪炳史册。Would you rate the previous review as positive, neutral or negative?"
	  example_title: "zh-en sentiment"
	- text: "一个传奇的开端，一个不灭的神话，这不仅仅是一部电影，而是作为一个走进新时代的标签，永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评？"
	  example_title: "zh-zh sentiment"
	- text: "Suggest at least five related search terms to \"Mạng neural nhân tạo\"."
	  example_title: "vi-en query"
	- text: "Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels»."
	  example_title: "fr-fr query"
	- text: "Explain in a sentence in Telugu what is backpropagation in neural networks."
	  example_title: "te-en qa"
	- text: "Why is the sky blue?"
	  example_title: "en-en qa"
	- text: "Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is \"Heroes Come in All Shapes and Sizes\". Story (in Spanish):"
	  example_title: "es-en fable"
	- text: "Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is \"Violence is the last refuge of the incompetent\". Fable (in Hindi):"
	  example_title: "hi-en fable"
	---

	# Table of Contents

	1. [Model Summary](#model-summary)
	2. [Intended uses](#intended-uses)
	3. [How to use](#how-to-use)
	4. [Limitations](#limitations)
	5. [BibTeX entry and citation info](#bibtex-entry-and-citation-info)

	# Model Summary

	> We present BLOOMZ & mT0, a family of models capable of following human instructions in hundreds of languages. By finetuning large BLOOM & mT5 pretrained multilingual language models on our multilingual task mixture (xP3), we discover various generalization properties of our finetuned models across tasks and languages.

	- Repository: [bigscience-workshop/xmtf](https://github.com/bigscience-workshop/xmtf)
	- Paper: [TODO]
	- Funded by: The French government & Hugging Face
	- Point of Contact: [Niklas Muennighoff](mailto:niklas@hf.co)
	- BLOOMZ & mT0 Model Family:
	|Name|Explanation|
	|----|-----------|
	|[bloomz-560m](https://huggingface.co/bigscience/bloomz-560m)| 560M parameter multitask finetuned version of [bloom-560m](https://huggingface.co/bigscience/bloom-560m) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[bloomz-1b1](https://huggingface.co/bigscience/bloomz-1b1)| 1.1B parameter multitask finetuned version of [bloom-1b1](https://huggingface.co/bigscience/bloom-1b1) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[bloomz-1b7](https://huggingface.co/bigscience/bloomz-1b7)| 1.7B parameter multitask finetuned version of [bloom-1b7](https://huggingface.co/bigscience/bloom-1b7) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[bloomz-3b](https://huggingface.co/bigscience/bloomz-3b)| 3B parameter multitask finetuned version of [bloom-3b](https://huggingface.co/bigscience/bloom-3b) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)| 7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[bloomz](https://huggingface.co/bigscience/bloomz)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/bigscience/xP3)|
	|||
	|[bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt)| 7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [xP3](https://huggingface.co/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/xP3mt). Better than [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1) when prompting in non-English|
	|[bloomz-mt](https://huggingface.co/bigscience/bloomz-mt)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [xP3](https://huggingface.co/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/xP3mt). Better than [bloomz](https://huggingface.co/bigscience/bloomz) when prompting in non-English|
	|||
	|[bloomz-7b1-p3](https://huggingface.co/bigscience/bloomz-7b1-p3)| 7.1B parameter multitask finetuned version of [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1) on [P3](https://huggingface.co/bigscience/P3). Released for research purposes, performance is inferior to [bloomz-7b1](https://huggingface.co/bigscience/bloomz-7b1)|
	|[bloomz-p3](https://huggingface.co/bigscience/bloomz-p3)| 176B parameter multitask finetuned version of [bloom](https://huggingface.co/bigscience/bloom) on [P3](https://huggingface.co/bigscience/P3). Released for research purposes, performance is inferior to [bloomz](https://huggingface.co/bigscience/bloomz)|
	|||
	|||
	|[mt0-small](https://huggingface.co/bigscience/mt0-small)| 300M parameter multitask finetuned version of [mt5-small](https://huggingface.co/google/mt5-small) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[mt0-base](https://huggingface.co/bigscience/mt0-base)| 580M parameter multitask finetuned version of [mt5-base](https://huggingface.co/google/mt5-base) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[mt0-large](https://huggingface.co/bigscience/mt0-large)| 1.2B parameter multitask finetuned version of [mt5-large](https://huggingface.co/google/mt5-large) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[mt0-xl](https://huggingface.co/bigscience/mt0-xl)| 3.7B parameter multitask finetuned version of [mt5-xl](https://huggingface.co/google/mt5-xl) on [xP3](https://huggingface.co/bigscience/xP3)|
	|[mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)| 13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/bigscience/xP3)|
	|||
	|[mt0-xxl-mt](https://huggingface.co/bigscience/mt0-xxl-mt)| 13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [xP3](https://huggingface.co/bigscience/xP3) & [xP3mt](https://huggingface.co/bigscience/xP3mt). Better than [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl) when prompting in non-English|
	|||
	|[mt0-xxl-p3](https://huggingface.co/bigscience/mt0-xxl-p3)| 13B parameter multitask finetuned version of [mt5-xxl](https://huggingface.co/google/mt5-xxl) on [P3](https://huggingface.co/bigscience/P3). Released for research purposes, performance is inferior to [mt0-xxl](https://huggingface.co/bigscience/mt0-xxl)|





	# Intended uses

	Describe the task in natural language and the model generates a prediction. For instance, given the prompt "Translate this to Chinese: Je t'aime.", the model should generate "我爱你".

	# How to use

	Here is how to use the model in PyTorch:
	```python
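	from transformers import AutoTokenizer, AutoModelForCausalLM

	# Download (or load from the local cache) the tokenizer and model weights
	tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
	model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")

	# Encode the natural-language prompt as a tensor of token ids
	inputs = tokenizer.encode("Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", return_tensors="pt")
	# Generate a continuation; the output holds the prompt tokens followed by the prediction
	outputs = model.generate(inputs)
	print(tokenizer.decode(outputs[0]))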
	```

	To use another checkpoint, replace the checkpoint name passed to `from_pretrained` in both `AutoTokenizer` and `AutoModelForCausalLM`.
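
When switching checkpoints, note that the BLOOMZ models are decoder-only, while the mT0 models build on mT5, an encoder-decoder architecture, and load with `AutoModelForSeq2SeqLM` in `transformers`. The following is a minimal sketch of a loader that picks the class from the checkpoint name. The helper names `pick_auto_class_name` and `load_checkpoint` are hypothetical, and the name-prefix rule is an assumption based on the naming in the model family table, not part of any library.

```python
def pick_auto_class_name(checkpoint: str) -> str:
    """Return the name of the transformers Auto class for a BLOOMZ/mT0 checkpoint.

    Assumption: checkpoints whose short name starts with "mt0" are
    encoder-decoder (mT5-based); everything else is treated as decoder-only.
    """
    short_name = checkpoint.split("/")[-1].lower()
    if short_name.startswith("mt0"):
        return "AutoModelForSeq2SeqLM"
    return "AutoModelForCausalLM"


def load_checkpoint(checkpoint: str):
    """Load the tokenizer and model for the given checkpoint name."""
    # Imported lazily so the helper above works without transformers installed
    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
    model_class = getattr(transformers, pick_auto_class_name(checkpoint))
    model = model_class.from_pretrained(checkpoint)
    return tokenizer, model
```

For example, `load_checkpoint("bigscience/mt0-small")` would go through `AutoModelForSeq2SeqLM`, while `load_checkpoint("bigscience/bloomz-560m")` would go through `AutoModelForCausalLM`.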

	Note: 176B models are trained with bfloat16, while smaller models are trained with fp16. We recommend using the same precision type or fp32 at inference.
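
The precision note can be applied at load time through the `torch_dtype` argument of `from_pretrained`. The sketch below is illustrative only: the helper names and the `BF16_CHECKPOINTS` set are hypothetical, and the set simply lists the 176B BLOOMZ checkpoints from the model family table, following the note above.

```python
# Assumption: these are the 176B checkpoints the note refers to
BF16_CHECKPOINTS = {"bloomz", "bloomz-mt", "bloomz-p3"}


def training_dtype_name(checkpoint: str) -> str:
    """Return the training precision implied by the note for a BLOOMZ checkpoint."""
    short_name = checkpoint.split("/")[-1].lower()
    return "bfloat16" if short_name in BF16_CHECKPOINTS else "float16"


def load_in_training_precision(checkpoint: str):
    """Load a BLOOMZ checkpoint in the precision it was trained with."""
    # Imported lazily so the helper above works without torch installed
    import torch
    from transformers import AutoModelForCausalLM

    dtype = getattr(torch, training_dtype_name(checkpoint))
    return AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype)
```

Passing `torch_dtype=torch.float32` instead gives the fp32 alternative mentioned in the note, at the cost of roughly twice the memory.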

	# Limitations

	- Large model size may require large computational resources
	- High performance variance depending on the prompt
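
One way to gauge the prompt sensitivity listed above is to run several phrasings of the same task and compare the generations. This is a minimal sketch, not an evaluation protocol: the helper names and the second and third prompt templates are made up for illustration, and `max_new_tokens=10` is an arbitrary choice.

```python
def build_prompt_variants(review: str) -> list:
    """Return several phrasings of the same sentiment task (illustrative templates)."""
    return [
        f"Is this review positive or negative? Review: {review}",
        f"Review: {review}\nWould you rate the previous review as positive or negative?",
        f"{review}\nSentiment (positive or negative):",
    ]


def compare_prompts(prompts, checkpoint="bigscience/bloomz-560m", max_new_tokens=10):
    """Generate one completion per prompt and return a prompt -> completion dict."""
    # Imported lazily so build_prompt_variants works without transformers installed
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    completions = {}
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Drop the prompt tokens so only the newly generated text is kept
        new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
        completions[prompt] = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return completions
```

Calling `compare_prompts(build_prompt_variants("this is the best cast iron skillet you will ever buy"))` returns one completion per phrasing, which makes disagreements between prompts easy to spot.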

	# BibTeX entry and citation info

	```bibtex
	TODO
	```