---
license: other
license_name: yi-license
license_link: LICENSE
---

<div align="center">

<img src="./Yi.svg" width="200px">

</div>

## Introduction

The **Yi** series models are large language models trained from scratch by
developers at [01.AI](https://01.ai/). The first public release contains two
bilingual (English/Chinese) base models, with 6B and 34B parameters. Both were
trained with a 4K sequence length, which can be extended to 32K at inference
time.

## News

- 🎯 **2023/11/02**: Released the base models `Yi-6B` and `Yi-34B`.

## Model Performance

| Model         |   MMLU   |  CMMLU   |  C-Eval  |  GAOKAO  |   BBH    | Common-sense Reasoning | Reading Comprehension | Math & Code |
| :------------ | :------: | :------: | :------: | :------: | :------: | :--------------------: | :-------------------: | :---------: |
|               |  5-shot  |  5-shot  |  5-shot  |  0-shot  | 3-shot@1 |           -            |           -           |      -      |
| LLaMA2-34B    |   62.6   |    -     |    -     |    -     |   44.1   |          69.9          |         68.0          |    26.0     |
| LLaMA2-70B    |   68.9   |   53.3   |    -     |   49.8   |   51.2   |          71.9          |         69.4          |    36.8     |
| Baichuan2-13B |   59.2   |   62.0   |   58.1   |   54.3   |   48.8   |          64.3          |         62.4          |    23.0     |
| Qwen-14B      |   66.3   |   71.0   |   72.1   |   62.5   |   53.4   |          73.3          |         72.5          |  **39.8**   |
| Skywork-13B   |   62.1   |   61.8   |   60.6   |   68.1   |   41.7   |          72.4          |         61.4          |    24.9     |
| InternLM-20B  |   62.1   |   59.0   |   58.8   |   45.5   |   52.5   |          78.3          |           -           |    30.4     |
| Aquila-34B    |   67.8   |   71.4   |   63.1   |    -     |    -     |           -            |           -           |      -      |
| Falcon-180B   |   70.4   |   58.0   |   57.8   |   59.0   |   54.0   |          77.3          |         68.8          |    34.0     |
| Yi-6B         |   63.2   |   75.5   |   72.0   |   72.2   |   42.8   |          72.3          |         68.7          |    19.8     |
| **Yi-34B**    | **76.3** | **83.7** | **81.4** | **82.8** | **54.3** |        **80.1**        |       **76.4**        |    37.1     |

While benchmarking open-source models, we observed a disparity between the
results generated by our pipeline and those reported in public sources (e.g.
OpenCompass). On closer investigation, we found that different models may be
evaluated with different prompts, post-processing strategies, and sampling
techniques, which can lead to significant variations in the results. Our
prompts and post-processing strategy remain consistent with the original
benchmarks, and we use greedy decoding during evaluation, with no further
post-processing of the generated content. For scores that were not reported by
the original authors (including scores reported under different settings), we
try to obtain results with our own pipeline.
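
To make the decoding setting concrete, here is a minimal sketch of greedy decoding on a toy scorer. It is not the evaluation pipeline itself: `greedy_decode`, `toy_score_fn`, the token strings, and the scores are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical toy "model": maps a token prefix to scores for candidate next tokens.
ScoreFn = Callable[[List[str]], Dict[str, float]]


def greedy_decode(
    score_fn: ScoreFn,
    prompt: List[str],
    eos: str = "<eos>",
    max_new_tokens: int = 16,
) -> List[str]:
    """Return the tokens generated after `prompt`, always picking the top-scoring token."""
    tokens = list(prompt)
    generated: List[str] = []
    for _ in range(max_new_tokens):
        scores = score_fn(tokens)
        # Greedy step: take the argmax with no sampling, so the output is deterministic.
        next_token = max(scores, key=scores.get)
        if next_token == eos:
            break
        generated.append(next_token)
        tokens.append(next_token)
    return generated


def toy_score_fn(tokens: List[str]) -> Dict[str, float]:
    """Toy scorer for illustration only: prefers 'B' after 'Answer:', then end-of-sequence."""
    if tokens[-1] == "Answer:":
        return {"A": 0.1, "B": 0.7, "<eos>": 0.2}
    return {"A": 0.05, "B": 0.05, "<eos>": 0.9}


print(greedy_decode(toy_score_fn, ["Question", "Answer:"]))  # ['B']
```

Because every step takes the highest-scoring token, repeated runs on the same prompt give the same output, which removes sampling as a source of variation between reported scores.
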

To evaluate the models' capabilities extensively, we adopted the methodology
outlined in Llama 2. Specifically, we included PIQA, SIQA, HellaSwag,
WinoGrande, ARC, OBQA, and CSQA to assess common-sense reasoning, and SQuAD,
QuAC, and BoolQ to evaluate reading comprehension. CSQA was tested exclusively
in a 7-shot setup, while all other tests used a 0-shot configuration.
Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval
(0-shot@1), and MBPP (3-shot@1) under the category "Math & Code". Due to
technical constraints, we did not test Falcon-180B on QuAC and OBQA; its scores
for the affected categories are derived by averaging the scores on the
remaining tasks. Since scores on these two tasks are generally lower than the
average, we believe that Falcon-180B's performance was not underestimated.
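
The averaging argument can be illustrated with a small sketch. The function name `category_average`, the task names, and the numbers below are placeholders made up for illustration, not measured results.

```python
from typing import Dict, Optional


def category_average(task_scores: Dict[str, Optional[float]]) -> float:
    """Mean over the tasks that were actually run; None marks a task that was not tested."""
    available = [score for score in task_scores.values() if score is not None]
    if not available:
        raise ValueError("no task in this category was evaluated")
    return sum(available) / len(available)


# Placeholder numbers for illustration only, not measured results.
full = {"task_a": 80.0, "task_b": 70.0, "task_c": 30.0}
partial = {"task_a": 80.0, "task_b": 70.0, "task_c": None}  # task_c was not run

print(category_average(full))     # 60.0
print(category_average(partial))  # 75.0
```

When the skipped task scores below the category mean, as `task_c` does here, averaging over the remaining tasks gives a higher number than the full average would, which is why omitting such tasks does not understate a model's category score.
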

## Usage

Please visit our [GitHub repository](https://github.com/01-ai/) for general
guidance on how to use this model.
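
As a rough starting point, the sketch below loads a checkpoint with the Hugging Face `transformers` library and completes a prompt with greedy decoding. It rests on several assumptions that the repository guidance should override: the model id `01-ai/Yi-34B` is inferred from the license link in this card, `trust_remote_code=True` may or may not be required depending on the checkpoint revision, and the helper name `generate_text` is hypothetical.

```python
# Assumed model id, inferred from the license link in this card.
DEFAULT_MODEL_ID = "01-ai/Yi-34B"


def generate_text(prompt: str, model_id: str = DEFAULT_MODEL_ID, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and greedily complete `prompt` (downloads weights on first use)."""
    # Imported lazily so that defining this function does not require the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True is an assumption; it may be unnecessary for some revisions.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", trust_remote_code=True
    )

    inputs = tokenizer(prompt, return_tensors="pt")
    # do_sample=False selects greedy decoding.
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

These are base models, so a call such as `generate_text("The capital of France is")` continues the text rather than following an instruction. The 34B checkpoint needs substantial memory, so a smaller checkpoint may be easier to try first.
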

## Disclaimer

We use data compliance checking algorithms during the training process to
ensure, to the best of our ability, the compliance of the trained model.
However, due to the complexity of the data and the diversity of language model
usage scenarios, we cannot guarantee that the model will generate correct and
reasonable output in all scenarios. Please be aware that there is still a risk
of the model producing problematic outputs. We will not be responsible for any
risks and issues resulting from misuse, misguidance, illegal usage, and related
misinformation, as well as any associated data security concerns.

## License

The **Yi** series models are fully open for academic research and free
commercial usage. All usage must adhere to the [Model License
Agreement](https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE). To apply for
the official commercial license, please contact us
([yi@01.ai](mailto:yi@01.ai)).