llm-blender/PairRM

Dongfu Jiang committed on Nov 25, 2023

Commit 240e3c1 • 1 Parent(s): c1046ee

Update README.md

Files changed (1)

README.md +3 -3

@@ -224,9 +224,9 @@ We test the pairwise comparison on
 |    Vicuna -13B-v1.5   |    30.6   |    23.6   |     35    |    28.3   |    36.1   |    37.5   |  45.5 |   39.8   |    37.3   |
 |   WizardLM -13B-v1.2  |    22.2   |    20.8   |    32.5   |    19.2   |    28.7   |    25.4   |  29.2 |    33    |    27.8   |
 |   LLAMA -2-chat -70B  |    34.7   |    33.3   |    36.7   |    35.8   |    51.4   |    54.2   |  47.2 |   47.7   |    45.9   |
-|       AUTO -J (13b)       |    45.8   |    38.9   |    **59.2**   |    47.5   |    54.6   |    57.1   |   **58**  |   57.6   |    54.8   |
-|       UltraRM (13b)       |    56.94  |    43.06  |    55.0   |    53.33  |    67.13  |   **64.17**   |   56.25  |   59.85   |    **59.85**   |
-|         **PairRM (0.4b)**       | **56.94** | **52.78** | 58.33 | **55.83** | **61.57** | 59.17 | 57.64 | **62.5** | 59.05 |
+|       AUTO -J (13b)       |    45.8   |    38.9   |  **59.2** |    47.5   |    54.6   |    57.1   |  **58**  |   57.6    |    54.8   |
+|       UltraRM (13b)       |    56.94  |    43.06  |    55.0   |    53.33  | **67.13** | **64.17** |   56.25  |   59.85   |    **59.85**   |
+|         **PairRM (0.4b)**       | **56.94** | **52.78** | 58.33 | **55.83** |   61.57   | 59.17 | 57.64 | **62.5** | 59.05 |
 #### HHH-Alignment and MT-bench human judgements