pom committed
Commit
3c8b1bc
1 Parent(s): 0194585

update readme

Files changed (1)
  1. README.md +8 -15
README.md CHANGED
@@ -55,17 +55,14 @@ In order to validate the various abilities of the model, we have chosen several
  | :----------------: | :--------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
  | Baichuan-7B | pretrained | 42.3<sup>2</sup> | 42.8<sup>2</sup> | 34.4<sup>2</sup> | 36.3<sup>2</sup> | 44.3 |
  | Baichuan2-7B-Base | pretrained | 54.2<sup>2</sup> | 54.0<sup>2</sup> | 42.7<sup>2</sup> | 47.5<sup>2</sup> | 53.1 |
- | Baichuan2-7B-Chat | fine-tuned | 53.2 | 52.2 | 41.3 | 49.7 | 66.6 |
  | ChatGLM2-6B | fine-tuned | 45.5<sup>2</sup> | 50.1<sup>2</sup> | 42.6 | 54.2 | 59.7 |
  | Falcon-7B | pretrained | 27.8<sup>2</sup> | 25.8 | 26.2 | 26.3 | 29.9 |
  | InternLM-7B | pretrained | 51.0<sup>2</sup> | 52.4 | 34.1 | 53.6 | 32.3 |
- | InternLM-7B-Chat | fine-tuned | 50.8<sup>2</sup> | 52.8 | 39.0 | 67.4 | 43.9 |
  | Llama-7B | pretrained | 35.1<sup>2</sup> | 27.0 | 27.4 | 26.0 | 30.1 |
  | Llama-2-7B | pretrained | 45.3<sup>2</sup> | 28.9 | 27.0 | 27.8 | 47.8 |
  | MPT-7B | pretrained | 29.6<sup>2</sup> | 27.8 | 24.2 | 25.3 | 28.1 |
  | Vicuna-7B-v1.5 | fine-tuned | 49.8<sup>2</sup> | 22.9 | 26.7 | 24.4 | 61.1 |
  | **XVERSE-7B** | pretrained | 56.6 | **57.1** | 46.9 | **61.7** | 71.1 |
- | **XVERSE-7B-Chat** | fine-tuned | **63.7** | 55.4 | **48.9** | 57.5 | **78.2** |
 
  > <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
  > <sup>2: Reporting results from official results of each model.</sup>
@@ -76,18 +73,14 @@ In order to validate the various abilities of the model, we have chosen several
 
  MMLU Category Results
 
- | Models | Type | MMLU | C-Eval | AGIEval<sup>1</sup> | GAOKAO-Bench<sup>1</sup> | GAOKAO-English<sup>1</sup> |
- | :----------------: | :--------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
- | Baichuan-7B | pretrained | 42.3<sup>2</sup> | 42.8<sup>2</sup> | 34.4<sup>2</sup> | 36.3 | 44.3 |
- | Baichuan2-7B-Base | pretrained | 54.2<sup>2</sup> | 54.0<sup>2</sup> | 42.7<sup>2</sup> | 47.5 | 53.1 |
- | ChatGLM2-6B | fine-tuned | 45.5<sup>2</sup> | 39.9 | 42.6 | 54.2 | 59.7 |
- | Falcon-7B | pretrained | 27.8<sup>2</sup> | 25.8 | 26.2 | 26.3 | 29.9 |
- | InternLM-7B | pretrained | 51.0<sup>2</sup> | 52.4 | 34.1 | 53.6 | 32.3 |
- | Llama-7B | pretrained | 35.1<sup>2</sup> | 27.0 | 27.4 | 26.0 | 30.1 |
- | Llama-2-7B | pretrained | 45.3<sup>2</sup> | 28.9 | 27.0 | 27.8 | 47.8 |
- | MPT-7B | pretrained | 29.6<sup>2</sup> | 27.8 | 24.2 | 25.3 | 28.1 |
- | Vicuna-7B-v1.5 | fine-tuned | 49.8<sup>2</sup> | 22.9 | 26.7 | 24.4 | 61.1 |
- | **XVERSE-7B** | pretrained | 56.6 | **57.1** | 46.9 | **61.7** | 71.1 |
+ | Models | Type | Average | STEM | Social Science | Humanities | Others |
+ | :----------------: | :--------: | :------: | :------: | :------------: | :--------: | :------: |
+ | Baichuan-7B | pretrained | 42.3 | 35.6 | 48.9 | 38.4 | 48.1 |
+ | ChatGLM2-6B | pretrained | 45.5 | 40.1 | 51.6 | 41.2 | 51.2 |
+ | InternLM-7B | pretrained | 51.0 | **58.7** | 43.5 | **52.7** | 53.2 |
+ | LLaMA-7B | pretrained | 35.1 | 30.5 | 38.3 | 34.0 | 38.1 |
+ | LLaMA2-7B | pretrained | 45.3 | 36.4 | 51.2 | 42.9 | 52.2 |
+ | **XVERSE-7B** | pretrained | **56.6** | 45.6 | **65.3** | 50.4 | **65.5** |
 
  ### C-Eval Category Results
 