pom committed
Commit • 3c8b1bc
1 Parent(s): 0194585
update readme
README.md
CHANGED
@@ -55,17 +55,14 @@ In order to validate the various abilities of the model, we have chosen several
| :----------------: | :--------: | :--------------: | :--------------: | :-----------------: | :----------------------: | :------------------------: |
| Baichuan-7B | pretrained | 42.3<sup>2</sup> | 42.8<sup>2</sup> | 34.4<sup>2</sup> | 36.3<sup>2</sup> | 44.3 |
| Baichuan2-7B-Base | pretrained | 54.2<sup>2</sup> | 54.0<sup>2</sup> | 42.7<sup>2</sup> | 47.5<sup>2</sup> | 53.1 |
-| Baichuan2-7B-Chat | fine-tuned | 53.2 | 52.2 | 41.3 | 49.7 | 66.6 |
| ChatGLM2-6B | fine-tuned | 45.5<sup>2</sup> | 50.1<sup>2</sup> | 42.6 | 54.2 | 59.7 |
| Falcon-7B | pretrained | 27.8<sup>2</sup> | 25.8 | 26.2 | 26.3 | 29.9 |
| InternLM-7B | pretrained | 51.0<sup>2</sup> | 52.4 | 34.1 | 53.6 | 32.3 |
-| InternLM-7B-Chat | fine-tuned | 50.8<sup>2</sup> | 52.8 | 39.0 | 67.4 | 43.9 |
| Llama-7B | pretrained | 35.1<sup>2</sup> | 27.0 | 27.4 | 26.0 | 30.1 |
| Llama-2-7B | pretrained | 45.3<sup>2</sup> | 28.9 | 27.0 | 27.8 | 47.8 |
| MPT-7B | pretrained | 29.6<sup>2</sup> | 27.8 | 24.2 | 25.3 | 28.1 |
| Vicuna-7B-v1.5 | fine-tuned | 49.8<sup>2</sup> | 22.9 | 26.7 | 24.4 | 61.1 |
| **XVERSE-7B** | pretrained | 56.6 | **57.1** | 46.9 | **61.7** | 71.1 |
-| **XVERSE-7B-Chat** | fine-tuned | **63.7** | 55.4 | **48.9** | 57.5 | **78.2** |

> <sup>1: Tests are conducted only on single-answer multiple-choice questions, thus excluding fill-in-the-blanks, open-ended questions, and multiple-answer multiple-choice questions.</sup>
> <sup>2: Results taken from the official reports of each model.</sup>
@@ -76,18 +73,14 @@ In order to validate the various abilities of the model, we have chosen several

MMLU Category Results

-| Models | Type |
-| :----------------: | :--------: |
-| Baichuan-7B | pretrained |
-
-
-
-
-
-| Llama-2-7B | pretrained | 45.3<sup>2</sup> | 28.9 | 27.0 | 27.8 | 47.8 |
-| MPT-7B | pretrained | 29.6<sup>2</sup> | 27.8 | 24.2 | 25.3 | 28.1 |
-| Vicuna-7B-v1.5 | fine-tuned | 49.8<sup>2</sup> | 22.9 | 26.7 | 24.4 | 61.1 |
-| **XVERSE-7B** | pretrained | 56.6 | **57.1** | 46.9 | **61.7** | 71.1 |
+| Models | Type | Average | STEM | Social Science | Humanities | Others |
+| :----------------: | :--------: | :------: | :------: | :------------: | :--------: | :------: |
+| Baichuan-7B | pretrained | 42.3 | 35.6 | 48.9 | 38.4 | 48.1 |
+| ChatGLM2-6B | pretrained | 45.5 | 40.1 | 51.6 | 41.2 | 51.2 |
+| InternLM-7B | pretrained | 51.0 | **58.7** | 43.5 | **52.7** | 53.2 |
+| LLaMA-7B | pretrained | 35.1 | 30.5 | 38.3 | 34.0 | 38.1 |
+| LLaMA2-7B | pretrained | 45.3 | 36.4 | 51.2 | 42.9 | 52.2 |
+| **XVERSE-7B** | pretrained | **56.6** | 45.6 | **65.3** | 50.4 | **65.5** |

### C-Eval Category Results