yi-01-ai
committed on
Commit • 3e8c475
1 Parent(s): 9f4e006
Auto Sync from git://github.com/01-ai/Yi.git/commit/704d5c148e087e9d1c83fb51e02790b197ce1aba
README.md CHANGED
@@ -276,11 +276,11 @@ Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K)
 
 - For chat and base models
 
-Model | Intro | Default context window | Pretrained tokens | Training Data Date
-|---|---|---|---|---
-6B series models |They are suitable for personal and academic use. | 4K | 3T | Up to June 2023
-9B model| It is the best at coding and math in the Yi series models.|4K | Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens. | Up to June 2023
-34B series models | They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It's a cost-effective solution that's affordable and equipped with emergent ability.|4K | 3T | Up to June 2023
+Model | Intro | Default context window | Pretrained tokens | Training Data Date
+|---|---|---|---|---
+6B series models |They are suitable for personal and academic use. | 4K | 3T | Up to June 2023
+9B model| It is the best at coding and math in the Yi series models.|4K | Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens. | Up to June 2023
+34B series models | They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It's a cost-effective solution that's affordable and equipped with emergent ability.|4K | 3T | Up to June 2023
 
 - For chat models
 
@@ -773,11 +773,11 @@ pip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sente
 
 #### Hardware Setup
 
-For the Yi-6B model, a node with 4 GPUs, each
+For the Yi-6B model, a node with 4 GPUs, each with GPU memory larger than 60GB, is recommended.
 
-For the Yi-34B model, because the usage of zero-offload technique
+For the Yi-34B model, because the usage of the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs in the 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh).
 
-A typical hardware setup for finetuning 34B model is a node with
+A typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each with GPU memory larger than 80GB, and total CPU memory larger than 900GB.
 
 #### Quick Start
 
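The added hardware notes hinge on restricting which GPUs the finetune job can see. As a minimal sketch of that mechanism (not part of the diff; the device list mirrors the README's CUDA_VISIBLE_DEVICES=0,1,2,3, everything else is illustrative), the variable must be set before CUDA is initialized:

```python
# Minimal sketch: limit the GPUs visible to the process, as the added lines describe
# for the 34B finetune. Only the device list "0,1,2,3" comes from the README.
import os

# Must be set before torch initializes CUDA; the launch scripts export it in the shell.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch

# torch now sees only the four listed devices, re-indexed as cuda:0 .. cuda:3.
print(torch.cuda.device_count())
```

In the repo's own workflow this is done on the command line, i.e. prefixing the launch of scripts/run_sft_Yi_34b.sh with CUDA_VISIBLE_DEVICES=0,1,2,3.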
@@ -864,8 +864,8 @@ python quantization/gptq/eval_quantized_model.py \
 
 #### GPT-Q quantization
 
-[GPT-Q](https://github.com/IST-DASLab/gptq) is a PTQ(Post-Training Quantization)
-method. It
+[GPT-Q](https://github.com/IST-DASLab/gptq) is a PTQ (Post-Training Quantization)
+method. It saves memory and provides potential speedups while retaining the accuracy
 of the model.
 
 Yi models can be GPT-Q quantized without a lot of efforts.
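For context on the GPT-Q paragraph above: the repo's supported route is its own scripts under quantization/gptq/, but a rough sketch of post-training GPT-Q quantization with the AutoGPTQ library looks like the following; the model id, calibration sentence, and output path are assumptions, not taken from the diff.

```python
# Hypothetical sketch of GPT-Q (post-training) quantization via AutoGPTQ;
# the repo's quantization/gptq/ scripts are the supported route.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "01-ai/Yi-6B"           # assumed model id
quantized_path = "/quantized_model"  # assumed output dir, matching the eval command

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Post-training quantization only needs a small set of calibration samples.
examples = [tokenizer("Yi is a series of large language models trained from scratch.")]

# 4-bit weights with group size 128 is a common GPT-Q setting.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)
model.quantize(examples)

model.save_quantized(quantized_path)
tokenizer.save_pretrained(quantized_path)
```

The saved directory can then be evaluated with quantization/gptq/eval_quantized_model.py as shown in the hunk header above.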
@@ -911,11 +911,11 @@ python quantization/awq/eval_quantized_model.py \
 --model /quantized_model \
 --trust_remote_code
 ```
-<details style="display: inline;"><summary>For
+<details style="display: inline;"><summary>For details, see the explanations below. ⬇️</summary> <ul>
 
 #### AWQ quantization
 
-[AWQ](https://github.com/mit-han-lab/llm-awq) is a PTQ(Post-Training Quantization)
+[AWQ](https://github.com/mit-han-lab/llm-awq) is a PTQ (Post-Training Quantization)
 method. It's an efficient and accurate low-bit weight quantization (INT3/4) for LLMs.
 
 Yi models can be AWQ quantized without a lot of efforts.
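Similarly, a rough sketch of AWQ quantization with the AutoAWQ library is shown below; the repo ships its own scripts under quantization/awq/, and the model id and paths here are assumptions.

```python
# Hypothetical sketch of AWQ (post-training, low-bit weight) quantization via AutoAWQ;
# the repo's quantization/awq/ scripts are the supported route.
from transformers import AutoTokenizer
from awq import AutoAWQForCausalLM

model_path = "01-ai/Yi-6B"           # assumed model id
quantized_path = "/quantized_model"  # assumed output dir, matching the eval command

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# INT4 weight-only quantization, the low-bit setting the README refers to.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quantized_path)
tokenizer.save_pretrained(quantized_path)
```

As with GPT-Q, the quantized directory can be checked with quantization/awq/eval_quantized_model.py using the --model and --trust_remote_code flags shown in the hunk above.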