pom committed
Commit • 3c8b1bc
1 Parent(s): 0194585
update readme
README.md
CHANGED
@@ -55,17 +55,14 @@ In order to validate the various abilities of the model, we have chosen several
| :----------------: | :--------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
| Baichuan-7B | pretrained | 42.3<sup>2</sup> | 42.8<sup>2</sup> | 34.4<sup>2</sup> | 36.3<sup>2</sup> | 44.3 |
| Baichuan2-7B-Base | pretrained | 54.2<sup>2</sup> | 54.0<sup>2</sup> | 42.7<sup>2</sup> | 47.5<sup>2</sup> | 53.1 |
-| Baichuan2-7B-Chat | fine-tuned | 53.2 | 52.2 | 41.3 | 49.7 | 66.6 |
| ChatGLM2-6B | fine-tuned | 45.5<sup>2</sup> | 50.1<sup>2</sup> | 42.6 | 54.2 | 59.7 |
| Falcon-7B | pretrained | 27.8<sup>2</sup> | 25.8 | 26.2 | 26.3 | 29.9 |
| InternLM-7B | pretrained | 51.0<sup>2</sup> | 52.4 | 34.1 | 53.6 | 32.3 |
-| InternLM-7B-Chat | fine-tuned | 50.8<sup>2</sup> | 52.8 | 39.0 | 67.4 | 43.9 |
| Llama-7B | pretrained | 35.1<sup>2</sup> | 27.0 | 27.4 | 26.0 | 30.1 |
| Llama-2-7B | pretrained | 45.3<sup>2</sup> | 28.9 | 27.0 | 27.8 | 47.8 |
| MPT-7B | pretrained | 29.6<sup>2</sup> | 27.8 | 24.2 | 25.3 | 28.1 |
| Vicuna-7B-v1.5 | fine-tuned | 49.8<sup>2</sup> | 22.9 | 26.7 | 24.4 | 61.1 |
| **XVERSE-7B** | pretrained | 56.6 | **57.1** | 46.9 | **61.7** | 71.1 |
-| **XVERSE-7B-Chat** | fine-tuned | **63.7** | 55.4 | **48.9** | 57.5 | **78.2** |

> <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
> <sup>2: Results taken from the official reports of each model.</sup>
@@ -76,18 +73,14 @@ In order to validate the various abilities of the model, we have chosen several

MMLU Category Results

-| Models | Type |
-| :----------------: | :--------: |
-| Baichuan-7B | pretrained |
-
-
-
-
-
-| Llama-2-7B | pretrained | 45.3<sup>2</sup> | 28.9 | 27.0 | 27.8 | 47.8 |
-| MPT-7B | pretrained | 29.6<sup>2</sup> | 27.8 | 24.2 | 25.3 | 28.1 |
-| Vicuna-7B-v1.5 | fine-tuned | 49.8<sup>2</sup> | 22.9 | 26.7 | 24.4 | 61.1 |
-| **XVERSE-7B** | pretrained | 56.6 | **57.1** | 46.9 | **61.7** | 71.1 |
+| Models | Type | Average | STEM | Social Science | Humanities | Others |
+| :----------------: | :--------: | :------: | :------: | :------------: | :--------: | :------: |
+| Baichuan-7B | pretrained | 42.3 | 35.6 | 48.9 | 38.4 | 48.1 |
+| ChatGLM2-6B | pretrained | 45.5 | 40.1 | 51.6 | 41.2 | 51.2 |
+| InternLM-7B | pretrained | 51.0 | **58.7** | 43.5 | **52.7** | 53.2 |
+| LLaMA-7B | pretrained | 35.1 | 30.5 | 38.3 | 34.0 | 38.1 |
+| LLaMA2-7B | pretrained | 45.3 | 36.4 | 51.2 | 42.9 | 52.2 |
+| **XVERSE-7B** | pretrained | **56.6** | 45.6 | **65.3** | 50.4 | **65.5** |

### C-Eval Category Results