@@ -7,8 +7,80 @@ tasks:
   - text-generation
 ---
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
 <div align="center">
 <h1>
   Baichuan 2

   - text-generation
 ---
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
+# Baichuan 2 7B Chat - Int4
+<!-- description start -->
+## Description
+This repo contains the Int4 GPTQ model files for [Baichuan 2 7B Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat).
+<!-- description end -->
+<!-- README_GPTQ.md-provided-files start -->
+## GPTQ parameters
+All GPTQ files were generated with AutoGPTQ.
+- Bits: 4/8
+- GS: 32/128
+- Act Order: True
+- Damp %: 0.1
+- GPTQ dataset: mixed Chinese and English dataset
+- Sequence Length: 4096
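
As a rough illustration, the parameters listed above can be written down as one quantization config per published branch. This is a minimal sketch, not the script used to produce these files: the key names (`bits`, `group_size`, `desc_act`, `damp_percent`) follow the fields of AutoGPTQ's `BaseQuantizeConfig`, the mapping from the list (GS, Act Order, Damp %) to those fields is an assumption, and `quantize_config` is a hypothetical helper. Only the 4-bit case is shown because that is what this repo ships.

```python
import json

# Settings shared by both branches, taken from the parameter list above.
# Assumed mapping: "Act Order" -> desc_act, "Damp %" -> damp_percent.
SHARED = {"bits": 4, "desc_act": True, "damp_percent": 0.1}


def quantize_config(group_size: int) -> dict:
    """Return a config dict for one group size (hypothetical helper)."""
    if group_size not in (32, 128):
        raise ValueError(f"unsupported group size: {group_size}")
    return {**SHARED, "group_size": group_size}


# One entry per branch named in the comparison table.
configs = {f"4bit-{gs}g": quantize_config(gs) for gs in (32, 128)}
print(json.dumps(configs, indent=2))
```

The calibration dataset and the sequence length of 4096 are inputs to the quantization run rather than config fields, so they are left out of the dict.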
+
+| Model version | agieval | ceval | cmmlu | Size (GB) | Inference speed on A100-40G (tokens/s) |
+|---|---|---|---|---|---|
+| [Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) | ~ | ~ | ~ | 27.79 | 31.55 |
+| [Baichuan2-13B-Chat-4bits](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat-4bits) | ~ | ~ | ~ | 9.08 | 18.45 |
+| [GPTQ-4bit-32g](https://huggingface.co/csdc-atl/Baichuan2-13B-Chat-GPTQ-Int4/tree/4bit-32g) | ~ | ~ | ~ | 9.87 | 27.35 (hf) / 38.28 (autogptq) |
+| [GPTQ-4bit-128g](https://huggingface.co/csdc-atl/Baichuan2-13B-Chat-GPTQ-Int4/tree/main) | 38.78 | 56.42 | 57.78 | 9.14 | 28.74 (hf) / 39.24 (autogptq) |
+<!-- README_GPTQ.md-provided-files end -->
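
To make the comparison easier to read, the size and speed columns of the table above can be expressed relative to the unquantized model. The sketch below only redoes arithmetic on numbers copied from the table. For the GPTQ rows it uses the AutoGPTQ-backend speed, which is a choice made here for illustration, and `relative_to_baseline` is a hypothetical helper name.

```python
# (size, tokens/s) pairs copied from the table above.
# GPTQ rows use the speed measured with the AutoGPTQ backend.
ROWS = {
    "Baichuan2-13B-Chat": (27.79, 31.55),
    "Baichuan2-13B-Chat-4bits": (9.08, 18.45),
    "GPTQ-4bit-32g": (9.87, 38.28),
    "GPTQ-4bit-128g": (9.14, 39.24),
}
BASELINE = "Baichuan2-13B-Chat"


def relative_to_baseline(rows, baseline=BASELINE):
    """Return {name: (size reduction factor, speed ratio)} versus the baseline."""
    base_size, base_speed = rows[baseline]
    return {
        name: (round(base_size / size, 2), round(speed / base_speed, 2))
        for name, (size, speed) in rows.items()
    }


for name, (shrink, speed_ratio) in relative_to_baseline(ROWS).items():
    print(f"{name}: {shrink}x smaller, {speed_ratio}x baseline speed")
```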
+## How to use this GPTQ model from Python code
+### Install the required dependencies
+Required: Transformers 4.32.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
+```shell
+pip3 install "transformers>=4.32.0" "optimum>=1.12.0"
+pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
+```
+If you run into problems installing AutoGPTQ from the prebuilt pip wheels, install it from source instead:
+```shell
+pip3 uninstall -y auto-gptq
+git clone https://github.com/PanQiWei/AutoGPTQ
+cd AutoGPTQ
+pip3 install .
+```
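
After installing, it can help to confirm that the environment actually meets the minimum versions stated above. The following is a minimal sketch using only the standard library. The helper names (`release`, `meets_minimum`, `check_environment`) are hypothetical, the distribution name `auto-gptq` is assumed to match the pip package, and the version comparison looks only at the leading numeric components, so it is cruder than a full PEP 440 comparison.

```python
import re
from importlib import metadata

# Minimum versions stated in the requirements above.
MINIMUMS = {"transformers": "4.32.0", "optimum": "1.12.0", "auto-gptq": "0.4.2"}


def release(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '0.5.0.dev0+cu118' -> (0, 5, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    parts += [0] * (3 - len(parts))  # pad short versions such as '4.32'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    return release(installed) >= release(minimum)


def check_environment(minimums=MINIMUMS) -> dict:
    """Return a status string per package: 'ok', 'missing', or 'too old (...)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```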
+### Then use the following code
+```python
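+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from transformers.generation.utils import GenerationConfig
+
+model_name_or_path = "csdc-atl/Baichuan2-7B-Chat-Int4"
+# Load the quantized weights; trust_remote_code is required for Baichuan's custom model code
+model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
+                                             torch_dtype=torch.float16,
+                                             device_map="auto",
+                                             trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True, trust_remote_code=True)
+# Reuse the generation settings published with the original chat model
+model.generation_config = GenerationConfig.from_pretrained("baichuan-inc/Baichuan2-7B-Chat")
+messages = []
+# Prompt: explain the saying "温故而知新" (reviewing the old to learn the new)
+messages.append({"role": "user", "content": "解释一下“温故而知新”"})
+response = model.chat(tokenizer, messages)
+print(response)
+# Example output (translated from Chinese):
+# "温故而知新" is an ancient Chinese idiom from the "Wei Zheng" chapter of the Analects. It means that by reviewing the past we can discover new knowledge and understanding. In other words, studying history and experience helps us better understand the present and the future.
+# This saying encourages us to keep reviewing and reflecting on past experience in study and in life, and so gain new insight and growth. By revisiting old knowledge and experiences, we can find new perspectives and understanding, and so better cope with a constantly changing world and its challenges.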
+```
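
The example above sends a single user message. For a multi-turn conversation, one plausible approach is to keep appending to the same `messages` list, alternating user and assistant entries. The sketch below assumes the chat interface accepts an `"assistant"` role alongside `"user"`, which the README does not state. `chat_turn` and `echo_model` are hypothetical names, and `echo_model` stands in for `lambda msgs: model.chat(tokenizer, msgs)` so the snippet runs without a GPU.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(chat_fn: Callable[[List[Message]], str],
              history: List[Message],
              user_text: str) -> str:
    """Append a user message, call the model, and record its reply in history."""
    history.append({"role": "user", "content": user_text})
    reply = chat_fn(history)
    # Assumption: replies are stored under the "assistant" role.
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_model(messages: List[Message]) -> str:
    """Stand-in for the real model call; returns a placeholder reply."""
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
chat_turn(echo_model, history, "first question")
chat_turn(echo_model, history, "follow-up question")
print([m["role"] for m in history])
```

With the real model, the only change would be passing a callable that wraps `model.chat(tokenizer, ...)` in place of `echo_model`.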
+<!-- README_GPTQ.md-use-from-python end -->
+<!-- README_GPTQ.md-compatibility start -->
+---
 <div align="center">
 <h1>
   Baichuan 2