yi-01-ai committed on
Commit
ba4e790
·
1 Parent(s): bb8d50e

Auto Sync from git://github.com/01-ai/Yi.git/commit/10e269d0da1f29d937d3930b1fe1a10c08adf575

Files changed (1)
  1. README.md +92 -28
README.md CHANGED
@@ -151,6 +151,12 @@ pipeline_tag: text-generation
151
 
152
  ## 🎉 News
153
 
154
  <details open>
155
  <summary>🎯 <b>2024/01/23</b>: The Yi-VL models, <code><a href="https://huggingface.co/01-ai/Yi-VL-34B">Yi-VL-34B</a></code> and <code><a href="https://huggingface.co/01-ai/Yi-VL-6B">Yi-VL-6B</a></code>, are open-sourced and available to the public.</summary>
156
  <br>
@@ -231,26 +237,23 @@ Yi-6B-Chat-8bits | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-C
231
  |---|---|
232
  Yi-34B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B/summary)
233
  Yi-34B-200K|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-200K/summary)
 
234
  Yi-6B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B/summary)
235
  Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-200K/summary)
236
 
237
  <sub><sup> - 200k is roughly equivalent to 400,000 Chinese characters. </sup></sub>
238
 
239
- ### Other info
240
 
241
- - For chat and base models:
242
 
243
- - 6B series models are suitable for personal and academic use.
244
-
245
- - 34B series models suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. It's a cost-effective solution that's affordable and equipped with emergent ability.
246
-
247
- - The **default context window** is **4k tokens**.
248
-
249
- - The pretrained tokens are 3T.
250
-
251
- - The training data are up to June 2023.
252
 
253
- - For chat models:
254
 
255
  <details style="display: inline;"><summary>For chat model limitations, see the explanations below. ⬇️</summary>
256
  <ul>
@@ -340,7 +343,7 @@ If you want to chat with Yi with more customizable options (e.g., system prompt,
340
  <a href="#top">Back to top ⬆️ </a> ]
341
  </p>
342
 
343
- ### Quick start - pip
344
 
345
  This tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference.
346
 
@@ -417,31 +420,67 @@ You can perform inference with Yi chat or base models as below.
417
 
418
  ##### Perform inference with Yi base model
419
 
420
- The steps are similar to [pip - Perform inference with Yi chat model](#perform-inference-with-yi-chat-model).
421
 
422
- You can use the existing file [`text_generation.py`](https://github.com/01-ai/Yi/tree/main/demo).
423
 
424
- ```bash
425
- python demo/text_generation.py --model <your-model-path>
426
- ```
427
 
428
- Then you can see an output similar to the one below. 🥳
429
 
430
- <details>
431
 
432
- <summary>Output. ⬇️ </summary>
433
 
434
- <br>
435
 
436
- **Prompt**: Let me tell you an interesting story about cat Tom and mouse Jerry,
437
 
438
- **Generation**: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn’t get up because there were too many people around him! He kept trying for several minutes before finally giving up...
439
 
440
- </details>
441
 
442
- <p align="right"> [
443
- <a href="#top">Back to top ⬆️ </a> ]
444
- </p>
445
 
446
  ### Quick start - Docker
447
  <details>
@@ -924,6 +963,7 @@ Below are detailed minimum VRAM requirements under different batch use cases.
924
  |----------------------|--------------|:-------------------------------------:|
925
  | Yi-6B | 15 GB | RTX3090 <br> RTX4090 <br> A10 <br> A30 |
926
  | Yi-6B-200K | 50 GB | A800 (80 GB) |
 
927
  | Yi-34B | 72 GB | 4 x RTX 4090 <br> A800 (80 GB) |
928
  | Yi-34B-200K | 200 GB | 4 x A800 (80 GB) |
929
 
@@ -1094,6 +1134,8 @@ Yi-34B-Chat model demonstrates exceptional performance, ranking first among all
1094
 
1095
  ### 📊 Base model performance
1096
 
1097
  The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more.
1098
 
1099
  ![Base model performance](https://github.com/01-ai/Yi/blob/main/assets/img/benchmark_base.png?raw=true)
@@ -1110,6 +1152,28 @@ The Yi-34B and Yi-34B-200K models stand out as the top performers among open-sou
1110
  - **Falcon-180B caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated.
1111
  </details>
1112
 
1113
  <p align="right"> [
1114
  <a href="#top">Back to top ⬆️ </a> ]
1115
  </p>
 
151
 
152
  ## 🎉 News
153
 
154
+ <details open>
155
+ <summary>🎯 <b>2024/03/06</b>: The Yi-9B is open-sourced and available to the public.</summary>
156
+ <br>
157
+ Yi-9B stands out as the top performer among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5 and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.
158
+ </details>
159
+
160
  <details open>
161
  <summary>🎯 <b>2024/01/23</b>: The Yi-VL models, <code><a href="https://huggingface.co/01-ai/Yi-VL-34B">Yi-VL-34B</a></code> and <code><a href="https://huggingface.co/01-ai/Yi-VL-6B">Yi-VL-6B</a></code>, are open-sourced and available to the public.</summary>
162
  <br>
 
237
  |---|---|
238
  Yi-34B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B/summary)
239
  Yi-34B-200K|• [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-34B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-34B-200K/summary)
240
+ Yi-9B|• [🤗 Hugging Face](TBD)
241
  Yi-6B| • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B/summary)
242
  Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K) • [🤖 ModelScope](https://www.modelscope.cn/models/01ai/Yi-6B-200K/summary)
243
 
244
  <sub><sup> - 200k is roughly equivalent to 400,000 Chinese characters. </sup></sub>
245
 
246
+ ### Model info
247
 
248
+ - For chat and base models
249
 
250
+ Model | Intro | Default context window | Pretrained tokens | Training Data Date
251
+ |---|---|---|---|---
252
+ 6B series models |They are suitable for personal and academic use. | 4K | 3T | Up to June 2023
253
+ 9B model | It is the strongest model in the Yi series at coding and math. | 4K | Yi-9B is continually trained from Yi-6B, using 0.8T tokens. | Up to June 2023
254
+ 34B series models | They are suitable for personal, academic, and commercial (particularly for small and medium-sized enterprises) purposes. They are a cost-effective option equipped with emergent ability. | 4K | 3T | Up to June 2023
255
 
256
+ - For chat models
257
 
258
  <details style="display: inline;"><summary>For chat model limitations, see the explanations below. ⬇️</summary>
259
  <ul>
 
343
  <a href="#top">Back to top ⬆️ </a> ]
344
  </p>
345
 
346
+ ### Quick start - pip
347
 
348
  This tutorial guides you through every step of running **Yi-34B-Chat locally on an A800 (80G)** and then performing inference.
349
 
 
420
 
421
  ##### Perform inference with Yi base model
422
 
423
+ - Yi-34B
424
 
425
+ The steps are similar to [pip - Perform inference with Yi chat model](#perform-inference-with-yi-chat-model).
426
 
427
+ You can use the existing file [`text_generation.py`](https://github.com/01-ai/Yi/tree/main/demo).
428
 
429
+ ```bash
430
+ python demo/text_generation.py --model <your-model-path>
431
+ ```
432
 
433
+ Then you can see an output similar to the one below. 🥳
434
 
435
+ <details>
436
 
437
+ <summary>Output. ⬇️ </summary>
438
 
439
+ <br>
440
 
441
+ **Prompt**: Let me tell you an interesting story about cat Tom and mouse Jerry,
442
 
443
+ **Generation**: Let me tell you an interesting story about cat Tom and mouse Jerry, which happened in my childhood. My father had a big house with two cats living inside it to kill mice. One day when I was playing at home alone, I found one of the tomcats lying on his back near our kitchen door, looking very much like he wanted something from us but couldn’t get up because there were too many people around him! He kept trying for several minutes before finally giving up...
444
 
445
+ </details>
446
+
447
+ - Yi-9B
448
+
449
+ Input
450
+
451
+ ```python
452
+ from transformers import AutoModelForCausalLM, AutoTokenizer
453
+
454
+ MODEL_DIR = "01-ai/Yi-9B"
455
+ model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype="auto")
456
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, use_fast=False)
457
+
458
+ input_text = "# write the quick sort algorithm"
459
+ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
460
+ outputs = model.generate(**inputs, max_length=256)
461
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
462
+ ```
463
+
464
+ Output
465
+
466
+ ```python
467
+ # write the quick sort algorithm
468
+ def quick_sort(arr):
469
+     if len(arr) <= 1:
470
+         return arr
471
+     pivot = arr[len(arr) // 2]
472
+     left = [x for x in arr if x < pivot]
473
+     middle = [x for x in arr if x == pivot]
474
+     right = [x for x in arr if x > pivot]
475
+     return quick_sort(left) + middle + quick_sort(right)
476
+
477
+ # test the quick sort algorithm
478
+ print(quick_sort([3, 6, 8, 10, 1, 2, 1]))
479
+ ```
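Generated code like the output above is worth a quick sanity check before it is reused. The following is a minimal sketch, not part of the Yi repository: it copies the generated function and compares it against Python's built-in `sorted` on random inputs. The seed, value range, and number of trials are arbitrary choices made for illustration.

```python
import random


def quick_sort(arr):
    # Same logic as the generated output shown above.
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)


# Compare against the built-in sort on random lists (seeded for repeatability).
rng = random.Random(0)
for _ in range(100):
    data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
    assert quick_sort(data) == sorted(data)

print(quick_sort([3, 6, 8, 10, 1, 2, 1]))  # [1, 1, 2, 3, 6, 8, 10]
```

Note that this version returns a new list rather than sorting in place, so the input list is left unchanged.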
480
+
481
+ <p align="right"> [
482
+ <a href="#top">Back to top ⬆️ </a> ]
483
+ </p>
484
 
485
  ### Quick start - Docker
486
  <details>
 
963
  |----------------------|--------------|:-------------------------------------:|
964
  | Yi-6B | 15 GB | RTX3090 <br> RTX4090 <br> A10 <br> A30 |
965
  | Yi-6B-200K | 50 GB | A800 (80 GB) |
966
+ | Yi-9B | 20 GB | 1 x RTX 4090 (24 GB) |
967
  | Yi-34B | 72 GB | 4 x RTX 4090 <br> A800 (80 GB) |
968
  | Yi-34B-200K | 200 GB | 4 x A800 (80 GB) |
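As a rough cross-check of the base-model rows in this table, the memory taken by the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes 2 bytes per parameter (bf16/fp16 weights) and uses the nominal sizes from the model names (6B, 9B, 34B); both are assumptions, not figures from the Yi documentation. Activations and the KV cache are not modeled, so the estimate is only a lower bound and does not account for the much larger 200K-context figures.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: 2 bytes per parameter (bf16/fp16). Activations and the
# KV cache are not included, so real usage is higher.

BYTES_PER_PARAM = 2  # assumed bf16/fp16 weights


def weight_memory_gb(params_billion: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9


# Nominal sizes taken from the model names (an approximation).
for name, size_b in [("Yi-6B", 6), ("Yi-9B", 9), ("Yi-34B", 34)]:
    print(f"{name}: ~{weight_memory_gb(size_b):.0f} GB for weights alone")
```

Under these assumptions the estimates (about 12, 18, and 68 GB) sit just below the table's minimum VRAM values of 15, 20, and 72 GB, which is what one would expect if the remainder goes to runtime overhead.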
969
 
 
1134
 
1135
  ### 📊 Base model performance
1136
 
1137
+ #### Yi-34B and Yi-34B-200K
1138
+
1139
  The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more.
1140
 
1141
  ![Base model performance](https://github.com/01-ai/Yi/blob/main/assets/img/benchmark_base.png?raw=true)
 
1152
  - **Falcon-180B caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated.
1153
  </details>
1154
 
1155
+ #### Yi-9B
1156
+
1157
+ Yi-9B ranks near the top among a range of similar-sized open-source models (including Mistral-7B, SOLAR-10.7B, Gemma-7B, DeepSeek-Coder-7B-Base-v1.5, and more), particularly excelling in code, math, common-sense reasoning, and reading comprehension.
1158
+
1159
+ ![Yi-9B benchmark - details](TBD)
1160
+
1161
+ - In terms of **overall** ability (Mean-All), Yi-9B performs the best among similarly sized open-source models, surpassing DeepSeek-Coder, DeepSeek-Math, Mistral-7B, SOLAR-10.7B, and Gemma-7B.
1162
+
1163
+ ![Yi-9B benchmark - overall](TBD)
1164
+
1165
+ - In terms of **coding** ability (Mean-Code), Yi-9B's performance is second only to DeepSeek-Coder-7B, surpassing Yi-34B, SOLAR-10.7B, Mistral-7B, and Gemma-7B.
1166
+
1167
+ ![Yi-9B benchmark - code](TBD)
1168
+
1169
+ - In terms of **math** ability (Mean-Math), Yi-9B's performance is second only to DeepSeek-Math-7B, surpassing SOLAR-10.7B, Mistral-7B, and Gemma-7B.
1170
+
1171
+ ![Yi-9B benchmark - math](TBD)
1172
+
1173
+ - In terms of **common sense and reasoning** ability (Mean-Text), Yi-9B's performance is on par with Mistral-7B, SOLAR-10.7B, and Gemma-7B.
1174
+
1175
+ ![Yi-9B benchmark - text](TBD)
1176
+
1177
  <p align="right"> [
1178
  <a href="#top">Back to top ⬆️ </a> ]
1179
  </p>