Commit 5eabb93 by itsliupeng · 2 Parent(s): d6d03ad f5ced3c

Merge branch 'main' of https://huggingface.co/01-ai/Yi-9B-200K into v20240321

Files changed (2)
  1. CHANGELOG.md +2 -2
  2. README.md +30 -15
CHANGELOG.md CHANGED
@@ -5,8 +5,8 @@
5
 
6
  ## version: v20240318
7
  - train 12B tokens with 256k context window
8
- - recall of "Needle in A HayStack": 97.5% [](./images/v20240318.png)
9
 
10
  ## version: initial
11
  - train 6B tokens with 256k context window
12
- - recall of "Needle in A HayStack": 87.1% [](./images/initail.png)
 
5
 
6
  ## version: v20240318
7
  - train 12B tokens with 256k context window
8
+ - recall of "Needle in A HayStack": 97.5% ![](./images/v20240318.png)
9
 
10
  ## version: initial
11
  - train 6B tokens with 256k context window
12
+ - recall of "Needle in A HayStack": 87.1% ![](./images/initail.png)
README.md CHANGED
@@ -60,9 +60,20 @@ pipeline_tag: text-generation
60
  </p>
61
 
62
  <p align="center">
63
- 👋 Join us 💬 <a href="https://github.com/01-ai/Yi/issues/43#issuecomment-1827285245" target="_blank"> WeChat (Chinese) </a>!
64
  </p>
65
 
66
 
67
  <!-- DO NOT REMOVE ME -->
68
 
@@ -153,6 +164,10 @@ pipeline_tag: text-generation
153
 
154
  ## News
155
 
156
  <details open>
157
  <summary>🎯 <b>2024-03-08</b>: <a href="https://arxiv.org/abs/2403.04652">Yi Tech Report</a> is published! </summary>
158
  </details>
@@ -250,8 +265,8 @@ Yi-6B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-C
250
  |---|---|
251
  Yi-34B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B/summary)
252
  Yi-34B-200K|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-200K/summary)
253
- Yi-9B|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-9B)
254
- Yi-9B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-9B-200K)
255
  Yi-6B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B/summary)
256
  Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-200K/summary)
257
 
@@ -261,11 +276,11 @@ Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K)
261
 
262
  - For chat and base models
263
 
264
- Model | Intro | Default context window | Pretrained tokens | Training Data Date
265
- |---|---|---|---|---
266
- 6B series models |They are suitable for personal and academic use. | 4K | 3T | Up to June 2023
267
- 9B model| It is the best at coding and math in the Yi series models.|4K | Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens. | Up to June 2023
268
- 34B series models | They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It's a cost-effective solution that's affordable and equipped with emergent ability.|4K | 3T | Up to June 2023
269
 
270
  - For chat models
271
 
@@ -758,11 +773,11 @@ pip install torch==2.0.1 deepspeed==0.10 tensorboard transformers datasets sente
758
 
759
  #### Hardware Setup
760
 
761
- For the Yi-6B model, a node with 4 GPUs, each has GPU mem larger than 60GB is recommended.
762
 
763
- For the Yi-34B model, because the usage of zero-offload technique takes a lot CPU memory, please be careful to limit the GPU numbers in 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the GPU number (as shown in scripts/run_sft_Yi_34b.sh).
764
 
765
- A typical hardware setup for finetuning 34B model is a node with 8GPUS (limit to 4 in running by CUDA_VISIBLE_DEVICES=0,1,2,3), each has GPU mem larger than 80GB, with total CPU mem larger than 900GB.
766
 
767
  #### Quick Start
768
 
@@ -849,8 +864,8 @@ python quantization/gptq/eval_quantized_model.py \
849
 
850
  #### GPT-Q quantization
851
 
852
- [GPT-Q](https://github.com/IST-DASLab/gptq) is a PTQ(Post-Training Quantization)
853
- method. It's memory saving and provides potential speedups while retaining the accuracy
854
  of the model.
855
 
856
  Yi models can be GPT-Q quantized without much effort.
@@ -896,11 +911,11 @@ python quantization/awq/eval_quantized_model.py \
896
  --model /quantized_model \
897
  --trust_remote_code
898
  ```
899
- <details style="display: inline;"><summary>For detailed explanations, see the explanations below. ⬇️</summary> <ul>
900
 
901
  #### AWQ quantization
902
 
903
- [AWQ](https://github.com/mit-han-lab/llm-awq) is a PTQ(Post-Training Quantization)
904
  method. It's an efficient and accurate low-bit weight quantization (INT3/4) for LLMs.
905
 
906
  Yi models can be AWQ quantized without much effort.
 
60
  </p>
61
 
62
  <p align="center">
63
+ 👩‍🚀 Ask questions or discuss ideas on <a href="https://github.com/01-ai/Yi/discussions" target="_blank"> GitHub </a>
64
  </p>
65
 
66
+ <p align="center">
67
+ 👋 Join us on <a href="https://discord.gg/hYUwWddeAu" target="_blank"> 👾 Discord </a> or <a href="https://github.com/01-ai/Yi/issues/43#issuecomment-1827285245" target="_blank"> 💬 WeChat </a>
68
+ </p>
69
+
70
+ <p align="center">
71
+ 📝 Check out <a href="https://arxiv.org/abs/2403.04652"> Yi Tech Report </a>
72
+ </p>
73
+
74
+ <p align="center">
75
+ 📚 Grow at <a href="#learning-hub"> Yi Learning Hub </a>
76
+ </p>
77
 
78
  <!-- DO NOT REMOVE ME -->
79
 
 
164
 
165
  ## News
166
 
167
+ <details>
168
+ <summary>🎯 <b>2024-03-16</b>: The <code>Yi-9B-200K</code> is open-sourced and available to the public.</summary>
169
+ </details>
170
+
171
  <details open>
172
  <summary>🎯 <b>2024-03-08</b>: <a href="https://arxiv.org/abs/2403.04652">Yi Tech Report</a> is published! </summary>
173
  </details>
 
265
  |---|---|
266
  Yi-34B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B/summary)
267
  Yi-34B-200K|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-200K/summary)
268
+ Yi-9B|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-9B) • [🟣 wisemodel](https://wisemodel.cn/models/01.AI/Yi-9B)
269
+ Yi-9B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-9B-200K) • [🟣 wisemodel](https://wisemodel.cn/models/01.AI/Yi-9B-200K)
270
  Yi-6B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B/summary)
271
  Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-200K/summary)
272
 
 
276
 
277
  - For chat and base models
278
 
279
+ Model | Intro | Default context window | Pretrained tokens | Training Data Date
280
+ |---|---|---|---|---
281
+ 6B series models |They are suitable for personal and academic use. | 4K | 3T | Up to June 2023
282
+ 9B model| It is the best at coding and math in the Yi series models.|4K | Yi-9B is continuously trained based on Yi-6B, using 0.8T tokens. | Up to June 2023
283
+ 34B series models | They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It's a cost-effective solution that's affordable and equipped with emergent ability.|4K | 3T | Up to June 2023
284
 
285
  - For chat models
286
 
 
773
 
774
  #### Hardware Setup
775
 
776
+ For the Yi-6B model, a node with 4 GPUs, each with GPU memory larger than 60GB, is recommended.
777
 
778
+ For the Yi-34B model, because the usage of the zero-offload technique consumes a lot of CPU memory, please be careful to limit the number of GPUs in the 34B finetune training. Please use CUDA_VISIBLE_DEVICES to limit the number of GPUs (as shown in scripts/run_sft_Yi_34b.sh).
779
 
780
+ A typical hardware setup for finetuning the 34B model is a node with 8 GPUs (limited to 4 at run time via CUDA_VISIBLE_DEVICES=0,1,2,3), each with more than 80GB of GPU memory, and more than 900GB of total CPU memory.
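
As a rough illustration of the GPU limit described in the hardware note, the sketch below builds an environment for a child process in which only four GPUs are visible. The helper name `limited_gpu_env` is hypothetical, and the commented launch command assumes the `scripts/run_sft_Yi_34b.sh` script referenced in the README exists in your checkout.

```python
import os


def limited_gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES restricted.

    `limited_gpu_env` is a hypothetical helper used only for illustration.
    """
    # Copy so the caller's (or the process's) environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads a comma-separated list of device indices from this variable.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


env = limited_gpu_env([0, 1, 2, 3], base_env={})
print(env["CUDA_VISIBLE_DEVICES"])  # prints "0,1,2,3"

# Launching the finetuning script with the restricted environment
# (assumption: the script path matches the one named in the README):
# import subprocess
# subprocess.run(["bash", "scripts/run_sft_Yi_34b.sh"],
#                env=limited_gpu_env([0, 1, 2, 3]), check=True)
```

Setting the variable on a copy of the environment keeps the restriction scoped to the training process, so other jobs on the same node still see all 8 GPUs.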
781
 
782
  #### Quick Start
783
 
 
864
 
865
  #### GPT-Q quantization
866
 
867
+ [GPT-Q](https://github.com/IST-DASLab/gptq) is a PTQ (Post-Training Quantization)
868
+ method. It saves memory and provides potential speedups while retaining the accuracy
869
  of the model.
870
 
871
  Yi models can be GPT-Q quantized without much effort.
 
911
  --model /quantized_model \
912
  --trust_remote_code
913
  ```
914
+ <details style="display: inline;"><summary>For details, see the explanations below. ⬇️</summary> <ul>
915
 
916
  #### AWQ quantization
917
 
918
+ [AWQ](https://github.com/mit-han-lab/llm-awq) is a PTQ (Post-Training Quantization)
919
  method. It's an efficient and accurate low-bit weight quantization (INT3/4) for LLMs.
920
 
921
  Yi models can be AWQ quantized without much effort.
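
To give a sense of the memory savings that low-bit weight quantization aims for, here is a back-of-the-envelope sketch. It assumes nominal parameter counts taken from the model names (6B, 9B, 34B), a 16-bit baseline, and it ignores quantization metadata (scales, zero points), activations, and the KV cache, so real footprints will be somewhat higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10**9 bytes), ignoring metadata."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter counts implied by the model names.
NOMINAL_PARAMS = {"Yi-6B": 6e9, "Yi-9B": 9e9, "Yi-34B": 34e9}

for name, n in NOMINAL_PARAMS.items():
    fp16 = weight_memory_gb(n, 16)  # 16-bit baseline (assumed)
    int4 = weight_memory_gb(n, 4)   # 4-bit weights, as in INT4 quantization
    print(f"{name}: 16-bit ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB")
```

Under these assumptions, 4-bit weights take about a quarter of the 16-bit storage, for example roughly 17 GB instead of 68 GB for a 34B-parameter model.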