Update README.md
Dongfu Jiang committed
Commit 240e3c1
1 Parent(s): c1046ee
README.md CHANGED
@@ -224,9 +224,9 @@ We test the pairwise comparison on
 | Vicuna-13B-v1.5 | 30.6 | 23.6 | 35 | 28.3 | 36.1 | 37.5 | 45.5 | 39.8 | 37.3 |
 | WizardLM-13B-v1.2 | 22.2 | 20.8 | 32.5 | 19.2 | 28.7 | 25.4 | 29.2 | 33 | 27.8 |
 | LLAMA-2-chat-70B | 34.7 | 33.3 | 36.7 | 35.8 | 51.4 | 54.2 | 47.2 | 47.7 | 45.9 |
-| AUTO-J (13b) | 45.8 | 38.9 |
-| UltraRM (13b) | 56.94 | 43.06 | 55.0 | 53.33 |
-| **PairRM (0.4b)** | **56.94** | **52.78** | 58.33 | **55.83** |
+| AUTO-J (13b) | 45.8 | 38.9 | **59.2** | 47.5 | 54.6 | 57.1 | **58** | 57.6 | 54.8 |
+| UltraRM (13b) | 56.94 | 43.06 | 55.0 | 53.33 | **67.13** | **64.17** | 56.25 | 59.85 | **59.85** |
+| **PairRM (0.4b)** | **56.94** | **52.78** | 58.33 | **55.83** | 61.57 | 59.17 | 57.64 | **62.5** | 59.05 |
 
 #### HHH-Alignment and MT-bench human judgements
 