---
license: apache-2.0
tags:
- text2text-generation
pipeline_tag: text2text-generation
language:
- zh
- en
---
|
# GPTQ-for-Bloom |
|
4-bit quantization of [Bloom](https://arxiv.org/pdf/2211.05100.pdf) using [GPTQ](https://arxiv.org/abs/2210.17323).
|
|
|
GPTQ is a state-of-the-art one-shot weight quantization method.
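To illustrate what the "4bit-128g" naming in the checkpoints below refers to, here is a minimal NumPy sketch of group-wise low-bit quantization: weights are split into groups of 128, and each group stores its own scale and zero-point. This is only the round-to-nearest storage format; GPTQ itself additionally compensates rounding error using second-order information, which this sketch omits. All function names here are illustrative, not taken from the BELLE repository.

```python
import numpy as np

def quantize_groups(weights, bits=4, group_size=128):
    # Split the weights into groups of `group_size`; each group gets
    # its own scale and zero-point (the "128g" in the checkpoint names
    # refers to this group size).
    w = weights.reshape(-1, group_size)
    qmax = 2 ** bits - 1
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = np.maximum(wmax - wmin, 1e-8) / qmax
    zero = np.round(-wmin / scale)
    # Round to the nearest integer level and clip into [0, qmax].
    q = np.clip(np.round(w / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize_groups(q, scale, zero):
    # Recover approximate float weights from the integer codes.
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 256)).astype(np.float32)
q, scale, zero = quantize_groups(w, bits=4, group_size=128)
w_hat = dequantize_groups(q, scale, zero).reshape(w.shape)
err = np.abs(w - w_hat).max()
print(f"max abs reconstruction error: {err:.4f}")
```

At 4 bits each value is one of 16 levels per group, so the reconstruction error is bounded by roughly the per-group scale; this is the trade-off behind the smaller file sizes in the table below.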
|
|
|
The inference code can be found in our GitHub repository: https://github.com/LianjiaTech/BELLE/gptq.
|
|
|
**This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).**
|
|
|
## Model list |
|
|
|
| model name              | file size | GPU memory |
| ----------------------- | --------- | ---------- |
| bloom7b-2m-8bit-128g.pt | 9.7 GB    | 11 GB      |
| bloom7b-2m-4bit-128g.pt | 6.9 GB    | 8 GB       |
| bloom7b-2m-3bit-128g.pt | 6.2 GB    | 7.7 GB     |
|
|
|