Russal committed on
Commit
829a3db
·
1 Parent(s): 28d8d00

Update README.md

  1. README.md +21 -0
README.md CHANGED
@@ -119,3 +119,24 @@ model = model.quantize(4).cuda()
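 
 > Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese-language context. We evaluated the model directly with its official [evaluation script](https://github.com/haonan-li/CMMLU). In the Model zero-shot table, the [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) score comes from our own run of the official CMMLU evaluation script; the scores of the other models are taken from the official [CMMLU](https://github.com/haonan-li/CMMLU/tree/master) results.
 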
+ ### English Leaderboard
+ In addition to Chinese, we also tested the model's performance in English.
+
+ #### MMLU
+
+ [MMLU](https://arxiv.org/abs/2009.03300) is an English evaluation dataset comprising 57 multiple-choice tasks that cover elementary mathematics, American history, computer science, law, and other subjects. Its difficulty ranges from high-school level to expert level, and it is a mainstream LLM evaluation benchmark.
+
+ We adopted the [open-source](https://github.com/hendrycks/test) evaluation scheme, and the final 5-shot results are as follows:
+
+ | Model | Humanities | Social Sciences | STEM | Other | Average |
+ |----------------------------------------|-----------:|:---------------:|:----:|:-----:|:-------:|
+ | LLaMA-7B<sup>2</sup> | 34.0 | 38.3 | 30.5 | 38.1 | 35.1 |
+ | Falcon-7B<sup>1</sup> | - | - | - | - | 35.0 |
+ | mpt-7B<sup>1</sup> | - | - | - | - | 35.6 |
+ | ChatGLM-6B<sup>0</sup> | 35.4 | 41.0 | 31.3 | 40.5 | 36.9 |
+ | BLOOM 7B<sup>0</sup> | 25.0 | 24.4 | 26.5 | 26.4 | 25.5 |
+ | BLOOMZ 7B<sup>0</sup> | 31.3 | 42.1 | 34.4 | 39.0 | 36.1 |
+ | moss-moon-003-base (16B)<sup>0</sup> | 24.2 | 22.8 | 22.4 | 24.4 | 23.6 |
+ | moss-moon-003-sft (16B)<sup>0</sup> | 30.5 | 33.8 | 29.3 | 34.4 | 31.9 |
+ | Baichuan-7B<sup>0</sup> | 38.4 | 48.9 | 35.6 | 48.1 | 42.3 |
+ | **Baichuan-7B<sup>0</sup>** | **38.9** | **49.0** | **35.3** | **48.8** | **42.6** |
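
The 5-shot setting used for the results above can be sketched as follows. This is a minimal illustration, not part of the commit: it assumes the prompt layout commonly used for MMLU-style few-shot evaluation (a subject header, k solved examples, then the unanswered test question). The function names and the toy questions are hypothetical and are not taken from the linked evaluation repository.

```python
# Minimal sketch of k-shot prompt construction for a multiple-choice benchmark.
# Assumption: each example is a (question, options, answer_letter) tuple.
CHOICES = ["A", "B", "C", "D"]


def format_example(question, options, answer=None):
    """Render one question; leave the answer blank when answer is None."""
    lines = [question]
    for label, option in zip(CHOICES, options):
        lines.append(f"{label}. {option}")
    lines.append("Answer:" if answer is None else f"Answer: {answer}")
    return "\n".join(lines)


def build_few_shot_prompt(subject, dev_examples, test_example, k=5):
    """Prepend up to k solved dev examples to the unanswered test question."""
    header = (
        "The following are multiple choice questions (with answers) "
        f"about {subject}."
    )
    shots = [format_example(q, opts, ans) for q, opts, ans in dev_examples[:k]]
    question, options = test_example
    query = format_example(question, options)
    return "\n\n".join([header, *shots, query])


# Toy data, invented purely for illustration.
dev = [(f"What is {i} + 1?", [str(i), str(i + 1), str(i + 2), str(i + 3)], "B")
       for i in range(1, 6)]
test = ("What is 7 + 1?", ["8", "9", "10", "11"])

print(build_few_shot_prompt("elementary mathematics", dev, test))
```

The model's predicted letter is then compared against the reference answer for each test question; how the per-category and average scores in the table were aggregated is not stated in the source.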