yi-01-ai committed on
Commit 764eb52 · 1 Parent(s): de636e3

Auto Sync from git://github.com/01-ai/Yi.git/commit/ed84c741ee48d104263fda3de1a71c15b0f12d53

Files changed (1)
  1. README.md +22 -22
README.md CHANGED
@@ -122,7 +122,7 @@ pipeline_tag: text-generation
122
  - For Chinese language capability, the Yi series models landed in 2nd place (following GPT-4), surpassing other LLMs (such as Baidu ERNIE, Qwen, and Baichuan) on the [SuperCLUE](https://www.superclueai.com/) in Oct 2023.
123
 
124
  - 🙏 (Credits to LLaMA) Thanks to the Transformer and LLaMA open-source communities, as they reduce the effort required to build from scratch and enable the use of the same tools within the AI ecosystem.
125
- <details style="display: inline;"><summary> If you're interested in Yi's adoption of LLaMA architecture and license usage policy, see <span style="color: green;">Yi's relation with LLaMA</span> ⬇️</summary> <ul> <br>
126
  > 💡 TL;DR
127
  >
128
  > The Yi series models adopt the same model architecture as LLaMA but are **NOT** derivatives of LLaMA.
@@ -193,7 +193,7 @@ Yi-6B-200K | • [🤗 Hugging Face](https://huggingface.co/01-ai/Yi-6B-200K)
193
 
194
  - For chat models:
195
 
196
- <details style="display: inline;"><summary>For chat model limitations, see ⬇️</summary>
197
  <ul>
198
  <br>The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.
199
 
@@ -414,7 +414,7 @@ Then you can see an output similar to the one below. 🥳
414
 
415
  <details>
416
 
417
- <summary>Output ⬇️ </summary>
418
 
419
  <br>
420
 
@@ -426,7 +426,7 @@ Then you can see an output similar to the one below. 🥳
426
 
427
  ### Quick start - Docker
428
  <details>
429
- <summary> Run Yi-34B-chat locally with Docker: a step-by-step guide ⬇️</summary>
430
  <br>This tutorial guides you through every step of running <strong>Yi-34B-Chat on an A800 GPU</strong> locally and then performing inference.
431
  <h4>Step 0: Prerequisites</h4>
432
  <p>Make sure you've installed <a href="https://docs.docker.com/engine/install/?open_in_browser=true">Docker</a> and <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html">nvidia-container-toolkit</a>.</p>
@@ -453,7 +453,7 @@ ghcr.io/01-ai/yi:latest
453
 
454
  ### Quick start - llama.cpp
455
  <details>
456
- <summary> Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide ⬇️</summary>
457
  <br>This tutorial guides you through every step of running a quantized model (<a href="https://huggingface.co/XeIaso/yi-chat-6B-GGUF/tree/main">Yi-chat-6B-2bits</a>) locally and then performing inference.</p>
458
 
459
  - [Step 0: Prerequisites](#step-0-prerequisites)
@@ -602,7 +602,7 @@ python demo/web_demo.py -c <your-model-path>
602
 
603
  You can access the web UI by entering the address provided in the console into your browser.
604
 
605
- ![Quick start - web demo](./assets/img/yi_34b_chat_web_demo.gif)
606
 
607
  ### Finetuning
608
 
@@ -615,7 +615,7 @@ Once finished, you can compare the finetuned model and the base model with the f
615
  ```bash
616
  bash finetune/scripts/run_eval.sh
617
  ```
618
- <details style="display: inline;"><summary>For advanced usage (like fine-tuning based on your custom data), see ⬇️</summary> <ul>
619
 
620
  ### Finetune code for Yi 6B and 34B
621
 
@@ -747,7 +747,7 @@ python quantization/gptq/eval_quantized_model.py \
747
  --trust_remote_code
748
  ```
749
 
750
- <details style="display: inline;"><summary>For a more detailed explanation, see ⬇️</summary> <ul>
751
 
752
  #### GPT-Q quantization
753
 
@@ -798,7 +798,7 @@ python quantization/awq/eval_quantized_model.py \
798
  --model /quantized_model \
799
  --trust_remote_code
800
  ```
801
- <details style="display: inline;"><summary>For detailed explanations, see ⬇️</summary> <ul>
802
 
803
  #### AWQ quantization
804
 
@@ -883,7 +883,7 @@ Below are detailed minimum VRAM requirements under different batch use cases.
883
  ### Learning hub
884
 
885
  <details>
886
- <summary> If you want to learn Yi, you can find a wealth of helpful educational resources here ⬇️</summary>
887
  <br>
888
 
889
  Welcome to the Yi learning hub!
@@ -1010,10 +1010,10 @@ If you're seeking to explore the diverse capabilities within Yi's thriving famil
1010
 
1011
  Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models in the benchmarks including MMLU, CMMLU, BBH, GSM8k, and more.
1012
 
1013
- ![Chat model performance](./assets/img/benchmark_chat.png)
1014
 
1015
  <details>
1016
- <summary> Evaluation methods and challenges ⬇️ </summary>
1017
 
1018
  - **Evaluation methods**: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA.
1019
  - **Zero-shot vs. few-shot**: in chat models, the zero-shot approach is more commonly employed.
@@ -1027,18 +1027,18 @@ Yi-34B-Chat model demonstrates exceptional performance, ranking first among all
1027
 
1028
  The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more.
1029
 
1030
- ![Base model performance](./assets/img/benchmark_base.png)
1031
 
1032
  <details>
1033
- <summary> Evaluation methods ⬇️</summary>
1034
-
1035
- - **Disparity in Results**: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass.
1036
- - **Investigation Findings**: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences.
1037
- - **Uniform Benchmarking Process**: our methodology aligns with the original benchmarks—consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing for the generated content.
1038
- - **Efforts to Retrieve Unreported Scores**: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline.
1039
- - **Extensive Model Evaluation**: to evaluate the model’s capability extensively, we adopted the methodology outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SquAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension.
1040
- - **Special Configurations**: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code".
1041
- - **Falcon-180B Caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated.
1042
  </details>
1043
 
1044
  # 🟢 Who can use Yi?
 
122
  - For Chinese language capability, the Yi series models landed in 2nd place (following GPT-4), surpassing other LLMs (such as Baidu ERNIE, Qwen, and Baichuan) on the [SuperCLUE](https://www.superclueai.com/) in Oct 2023.
123
 
124
  - 🙏 (Credits to LLaMA) Thanks to the Transformer and LLaMA open-source communities, as they reduce the effort required to build from scratch and enable the use of the same tools within the AI ecosystem.
125
+ <details style="display: inline;"><summary> If you're interested in Yi's adoption of LLaMA architecture and license usage policy, see <span style="color: green;">Yi's relation with LLaMA.</span> ⬇️</summary> <ul> <br>
126
  > 💡 TL;DR
127
  >
128
  > The Yi series models adopt the same model architecture as LLaMA but are **NOT** derivatives of LLaMA.
 
193
 
194
  - For chat models:
195
 
196
+ <details style="display: inline;"><summary>For chat model limitations, see the explanations below. ⬇️</summary>
197
  <ul>
198
  <br>The released chat model has undergone exclusive training using Supervised Fine-Tuning (SFT). Compared to other standard chat models, our model produces more diverse responses, making it suitable for various downstream tasks, such as creative scenarios. Furthermore, this diversity is expected to enhance the likelihood of generating higher quality responses, which will be advantageous for subsequent Reinforcement Learning (RL) training.
199
 
 
414
 
415
  <details>
416
 
417
+ <summary>Output. ⬇️ </summary>
418
 
419
  <br>
420
 
 
426
 
427
  ### Quick start - Docker
428
  <details>
429
+ <summary> Run Yi-34B-chat locally with Docker: a step-by-step guide. ⬇️</summary>
430
  <br>This tutorial guides you through every step of running <strong>Yi-34B-Chat on an A800 GPU</strong> locally and then performing inference.
431
  <h4>Step 0: Prerequisites</h4>
432
  <p>Make sure you've installed <a href="https://docs.docker.com/engine/install/?open_in_browser=true">Docker</a> and <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html">nvidia-container-toolkit</a>.</p>
 
453
 
454
  ### Quick start - llama.cpp
455
  <details>
456
+ <summary> Run Yi-chat-6B-2bits locally with llama.cpp: a step-by-step guide. ⬇️</summary>
457
  <br>This tutorial guides you through every step of running a quantized model (<a href="https://huggingface.co/XeIaso/yi-chat-6B-GGUF/tree/main">Yi-chat-6B-2bits</a>) locally and then performing inference.</p>
458
 
459
  - [Step 0: Prerequisites](#step-0-prerequisites)
 
602
 
603
  You can access the web UI by entering the address provided in the console into your browser.
604
 
605
+ ![Quick start - web demo](https://github.com/01-ai/Yi/blob/main/assets/img/yi_34b_chat_web_demo.gif)
606
 
607
  ### Finetuning
608
 
 
615
  ```bash
616
  bash finetune/scripts/run_eval.sh
617
  ```
618
+ <details style="display: inline;"><summary>For advanced usage (like fine-tuning based on your custom data), see the explanations below. ⬇️ </summary> <ul>
619
 
620
  ### Finetune code for Yi 6B and 34B
621
 
 
747
  --trust_remote_code
748
  ```
749
 
750
+ <details style="display: inline;"><summary>For a more detailed explanation, see the explanations below. ⬇️</summary> <ul>
751
 
752
  #### GPT-Q quantization
753
 
 
798
  --model /quantized_model \
799
  --trust_remote_code
800
  ```
801
+ <details style="display: inline;"><summary>For detailed explanations, see the explanations below. ⬇️</summary> <ul>
802
 
803
  #### AWQ quantization
804
 
 
883
  ### Learning hub
884
 
885
  <details>
886
+ <summary> If you want to learn Yi, you can find a wealth of helpful educational resources here. ⬇️</summary>
887
  <br>
888
 
889
  Welcome to the Yi learning hub!
 
1010
 
1011
  Yi-34B-Chat model demonstrates exceptional performance, ranking first among all existing open-source models in the benchmarks including MMLU, CMMLU, BBH, GSM8k, and more.
1012
 
1013
+ ![Chat model performance](https://github.com/01-ai/Yi/blob/main/assets/img/benchmark_base.png)
1014
 
1015
  <details>
1016
+ <summary> Evaluation methods and challenges. ⬇️ </summary>
1017
 
1018
  - **Evaluation methods**: we evaluated various benchmarks using both zero-shot and few-shot methods, except for TruthfulQA.
1019
  - **Zero-shot vs. few-shot**: in chat models, the zero-shot approach is more commonly employed.
 
1027
 
1028
  The Yi-34B and Yi-34B-200K models stand out as the top performers among open-source models, especially excelling in MMLU, CMMLU, common-sense reasoning, reading comprehension, and more.
1029
 
1030
+ ![Base model performance](https://github.com/01-ai/Yi/blob/main/assets/img/benchmark_base.png)
1031
 
1032
  <details>
1033
+ <summary> Evaluation methods. ⬇️</summary>
1034
+
1035
+ - **Disparity in results**: while benchmarking open-source models, a disparity has been noted between results from our pipeline and those reported by public sources like OpenCompass.
1036
+ - **Investigation findings**: a deeper investigation reveals that variations in prompts, post-processing strategies, and sampling techniques across models may lead to significant outcome differences.
1037
+ - **Uniform benchmarking process**: our methodology aligns with the original benchmarks—consistent prompts and post-processing strategies are used, and greedy decoding is applied during evaluations without any post-processing for the generated content.
1038
+ - **Efforts to retrieve unreported scores**: for scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline.
1039
+ - **Extensive model evaluation**: to evaluate the model’s capability extensively, we adopted the methodology outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SquAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension.
1040
+ - **Special configurations**: CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code".
1041
+ - **Falcon-180B caveat**: Falcon-180B was not tested on QuAC and OBQA due to technical constraints. Its performance score is an average from other tasks, and considering the generally lower scores of these two tasks, Falcon-180B's capabilities are likely not underestimated.
1042
  </details>
1043
 
1044
  # 🟢 Who can use Yi?
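
Several hunks in this commit replace relative image paths (`./assets/img/...`) with absolute `https://github.com/01-ai/Yi/blob/main/...` URLs. The following is a minimal sketch of how such a rewrite could be automated when syncing a README to another host. It is not the sync tool used for this commit: the function name `absolutize_image_links`, the `BASE_URL` constant, and the regular expression are illustrative assumptions, and the base URL is only inferred from the links shown in the diff.

```python
import re

# Assumption: base URL inferred from the absolute links that appear in the diff.
BASE_URL = "https://github.com/01-ai/Yi/blob/main/"

# Matches Markdown image links whose target starts with "./",
# for example ![alt](./assets/img/x.png). Plain links without "!" are not matched.
RELATIVE_IMAGE = re.compile(r"(!\[[^\]]*\]\()\./([^)\s]+)(\))")


def absolutize_image_links(markdown: str, base_url: str = BASE_URL) -> str:
    """Rewrite ./relative image targets to absolute URLs under base_url."""
    return RELATIVE_IMAGE.sub(
        lambda m: f"{m.group(1)}{base_url}{m.group(2)}{m.group(3)}",
        markdown,
    )


if __name__ == "__main__":
    line = "![Base model performance](./assets/img/benchmark_base.png)"
    print(absolutize_image_links(line))
```

Only image links that begin with `./` are touched in this sketch, so links that are already absolute and ordinary (non-image) relative links pass through unchanged.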