---
license: bigscience-bloom-rail-1.0
language:
- ak
- ar
- as
- bm
- bn
- ca
- code
- en
- es
- eu
- fon
- fr
- gu
- hi
- id
- ig
- ki
- kn
- lg
- ln
- ml
- mr
- ne
- nso
- ny
- or
- pa
- pt
- rn
- rw
- sn
- st
- sw
- ta
- te
- tn
- ts
- tum
- tw
- ur
- vi
- wo
- xh
- yo
- zh
- zhs
- zht
- zu
pipeline_tag: text-generation
---
<h1 style='text-align: center '>BLOOM LM - 8bit</h1>
<h2 style='text-align: center '><em>BigScience Large Open-science Open-access Multilingual Language Model - 8bit</em> </h2>
<h3 style='text-align: center '>Model Card</h3>
<img src="https://s3.amazonaws.com/moonup/production/uploads/1657124309515-5f17f0a0925b9863e28ad517.png" alt="BigScience Logo" width="800" style="margin-left:auto; margin-right:auto; display:block"/>
Version 1.0 / 26.May.2022
Related paper: [LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale](https://arxiv.org/abs/2208.07339)
## TL;DR
This repository contains the 8-bit weights of the `bloom-1b7` model. You can load it out of the box with `transformers==4.28.0` and `bitsandbytes>0.37.2`!
```python
# pip install accelerate bitsandbytes
from transformers import AutoModelForCausalLM
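# the checkpoint already stores int8 weights, so no quantization flag is needed here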
model = AutoModelForCausalLM.from_pretrained("ybelkada/bloom-1b7-8bit")
```
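As a quick sanity check, here is a minimal generation sketch. It assumes a CUDA-capable GPU (the `bitsandbytes` int8 kernels run on GPU) and reuses the tokenizer of the original `bigscience/bloom-1b7` checkpoint; the prompt is just an example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# the 8-bit checkpoint shares the tokenizer of the original fp16 model
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-1b7")
model = AutoModelForCausalLM.from_pretrained("ybelkada/bloom-1b7-8bit", device_map="auto")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```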
## How to push 8-bit weights?
First, make sure you are using the `transformers` and `bitsandbytes` versions stated above. Then load the original checkpoint in 8-bit as usual by passing `load_in_8bit=True`:
```python
# pip install accelerate bitsandbytes
from transformers import AutoModelForCausalLM
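# load_in_8bit=True quantizes the fp16 weights to int8 on the fly at load time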
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b7", device_map="auto", load_in_8bit=True)
```
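The quantized model should now report a noticeably smaller memory footprint; you can check this with the `get_memory_footprint` helper that `transformers` models expose:
```python
# int8 roughly halves the footprint compared to the fp16 checkpoint
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```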
Then call the `push_to_hub` method (or `save_pretrained`, shown below, if you want to save the 8-bit model locally):
```python
model.push_to_hub("{your_username}/bloom-1b7-8bit")
```
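Or, if you prefer to write the quantized checkpoint to disk, a minimal sketch (the target directory name here is just an example):
```python
# serializes the int8 weights together with the quantization config
model.save_pretrained("./bloom-1b7-8bit")
```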
That's it!
## What is inside the model's `state_dict`?
Inside the model's state dict (the `pytorch_model.bin` file) you will find:
- the quantized `int8` weights
- the quantization statistics, stored in `float16`
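You can verify this yourself with a short sketch (assuming you have downloaded `pytorch_model.bin` locally) that tallies the tensor dtypes in the checkpoint:
```python
from collections import Counter

import torch

# load the raw state dict on CPU and count tensors per dtype
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
print(Counter(str(t.dtype) for t in state_dict.values() if hasattr(t, "dtype")))
# expect mostly torch.int8 entries (weights) plus torch.float16 entries (quantization statistics)
```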