Update README.md
> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to assess a language model's knowledge and reasoning ability in Chinese contexts. We evaluated the model directly with the official [evaluation script](https://github.com/haonan-li/CMMLU). In the zero-shot table, the score for [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) was obtained by running the official CMMLU evaluation script ourselves; the scores for the other models come from the official [CMMLU](https://github.com/haonan-li/CMMLU/tree/master) results.
### English Leaderboard

In addition to Chinese, we also tested the model's performance in English.

#### MMLU

[MMLU](https://arxiv.org/abs/2009.03300) is an English evaluation dataset comprising 57 multiple-choice tasks covering elementary mathematics, American history, computer science, law, and more. Its difficulty ranges from high-school to expert level, making it a mainstream LLM evaluation benchmark.

We adopted the [open-source](https://github.com/hendrycks/test) evaluation scheme; the final 5-shot results are as follows:

| Model                                  | Humanities | Social Sciences | STEM | Other | Average |
|----------------------------------------|:----------:|:---------------:|:----:|:-----:|:-------:|
| LLaMA-7B<sup>2</sup>                   | 34.0       | 38.3            | 30.5 | 38.1  | 35.1    |
| Falcon-7B<sup>1</sup>                  | -          | -               | -    | -     | 35.0    |
| mpt-7B<sup>1</sup>                     | -          | -               | -    | -     | 35.6    |
| ChatGLM-6B<sup>0</sup>                 | 35.4       | 41.0            | 31.3 | 40.5  | 36.9    |
| BLOOM 7B<sup>0</sup>                   | 25.0       | 24.4            | 26.5 | 26.4  | 25.5    |
| BLOOMZ 7B<sup>0</sup>                  | 31.3       | 42.1            | 34.4 | 39.0  | 36.1    |
| moss-moon-003-base (16B)<sup>0</sup>   | 24.2       | 22.8            | 22.4 | 24.4  | 23.6    |
| moss-moon-003-sft (16B)<sup>0</sup>    | 30.5       | 33.8            | 29.3 | 34.4  | 31.9    |
| Baichuan-7B<sup>0</sup>                | 38.4       | 48.9            | 35.6 | 48.1  | 42.3    |
| **Baichuan-7B<sup>0</sup>**            | **38.9**   | **49.0**        | **35.3** | **48.8** | **42.6** |
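The 5-shot protocol used above can be sketched as follows. This is a minimal illustration of the hendrycks/test-style evaluation, not the official script: `choice_logits` is a hypothetical stand-in for a real model call that returns one score per answer letter, and examples are assumed to be `(question, options, answer_letter)` tuples.

```python
# Sketch of a 5-shot MMLU evaluation loop (hendrycks/test style).
# Assumption: `choice_logits(prompt)` is a caller-supplied function that
# returns four scores, one per answer letter A-D, for the given prompt.

CHOICES = ["A", "B", "C", "D"]

def format_example(question, options, answer=None):
    """Render one question in the MMLU prompt format."""
    text = question.strip() + "\n"
    for letter, option in zip(CHOICES, options):
        text += f"{letter}. {option}\n"
    text += "Answer:"
    if answer is not None:  # few-shot demonstrations include the gold answer
        text += f" {answer}\n\n"
    return text

def build_prompt(subject, dev_examples, test_example):
    """5-shot prompt: header + five solved dev examples + the unsolved item."""
    header = (f"The following are multiple choice questions "
              f"(with answers) about {subject}.\n\n")
    shots = "".join(format_example(q, opts, ans)
                    for q, opts, ans in dev_examples[:5])
    question, options, _ = test_example
    return header + shots + format_example(question, options)

def evaluate(subject, dev_examples, test_examples, choice_logits):
    """Accuracy: the prediction is the argmax over the four choice scores."""
    correct = 0
    for example in test_examples:
        prompt = build_prompt(subject, dev_examples, example)
        scores = choice_logits(prompt)  # e.g. [s_A, s_B, s_C, s_D]
        pred = CHOICES[max(range(4), key=lambda i: scores[i])]
        correct += pred == example[2]
    return correct / len(test_examples)
```

Per-category averages (Humanities, STEM, etc.) are then simple means of the per-subject accuracies within each category.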