Dongfu Jiang
commited on
Commit
•
a2f8211
1
Parent(s):
edac579
Update README.md
Browse files
README.md
CHANGED
@@ -48,11 +48,11 @@ We test the pairwise comparison on
|
|
48 |
|
49 |
| Model | Summ | Exam | Code | Rewriting | Crea W | Func W | Comm | NLP | Overall |
|
50 |
|:---------------------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:-----:|:--------:|:---------:|
|
51 |
-
| Closed -source Models |
|
52 |
| ChatGPT | 33.3 | 40.3 | 36.6 | 31.6 | 48.2 | 40.4 | 47.6 | 45.8 | 42.7 |
|
53 |
| Claude -2 | 30.6 | 36.1 | 41.7 | 34.2 | 48.1 | 42.5 | 40.6 | 48.5 | 42.4 |
|
54 |
| GPT -4 | 59.7 | 51.4 | 69.2 | 58.3 | 66.7 | 60.4 | 58.3 | 65.2 | 61.9 |
|
55 |
-
| Open -source Models |
|
56 |
| SteamSHP | 33.3 | 29.2 | 26.7 | 33.3 | 40.7 | 31.3 | 51.4 | 51.9 | 40.6 |
|
57 |
| PandaLM | 29.2 | 33.3 | 31.7 | 23.3 | 43.5 | 32.9 | 44.8 | 48.9 | 38.9 |
|
58 |
| LLaMA -2-Chat -13B | 20.8 | 27.8 | 19.2 | 20 | 31.5 | 27.5 | 35.8 | 31.8 | 29 |
|
|
|
48 |
|
49 |
| Model | Summ | Exam | Code | Rewriting | Crea W | Func W | Comm | NLP | Overall |
|
50 |
|:---------------------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:-----:|:--------:|:---------:|
|
51 |
+
| Closed -source Models |
|
52 |
| ChatGPT | 33.3 | 40.3 | 36.6 | 31.6 | 48.2 | 40.4 | 47.6 | 45.8 | 42.7 |
|
53 |
| Claude -2 | 30.6 | 36.1 | 41.7 | 34.2 | 48.1 | 42.5 | 40.6 | 48.5 | 42.4 |
|
54 |
| GPT -4 | 59.7 | 51.4 | 69.2 | 58.3 | 66.7 | 60.4 | 58.3 | 65.2 | 61.9 |
|
55 |
+
| Open -source Models |
|
56 |
| SteamSHP | 33.3 | 29.2 | 26.7 | 33.3 | 40.7 | 31.3 | 51.4 | 51.9 | 40.6 |
|
57 |
| PandaLM | 29.2 | 33.3 | 31.7 | 23.3 | 43.5 | 32.9 | 44.8 | 48.9 | 38.9 |
|
58 |
| LLaMA -2-Chat -13B | 20.8 | 27.8 | 19.2 | 20 | 31.5 | 27.5 | 35.8 | 31.8 | 29 |
|