Window context = 4k tokens
### OpenLLM Leaderboard

TBD.

### MT-Bench-French

Chocolatine-14B-Instruct-DPO-v1.2 outperforms its previous versions and its base model Phi-3-medium-4k-instruct on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), used with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench) and GPT-4-Turbo as LLM-judge.

```
########## First turn ##########
                                          score
model                                turn
gpt-4o-mini                          1    9.2875
Chocolatine-14B-Instruct-4k-DPO      1    8.6375
Chocolatine-14B-Instruct-DPO-v1.2    1    8.6125
Phi-3.5-mini-instruct                1    8.5250
Chocolatine-3B-Instruct-DPO-v1.2     1    8.3750
Phi-3-medium-4k-instruct             1    8.2250
gpt-3.5-turbo                        1    8.1375
Chocolatine-3B-Instruct-DPO-Revised  1    7.9875
Daredevil-8B                         1    7.8875
Meta-Llama-3.1-8B-Instruct           1    7.0500
vigostral-7b-chat                    1    6.7875
Mistral-7B-Instruct-v0.3             1    6.7500
gemma-2-2b-it                        1    6.4500
French-Alpaca-7B-Instruct_beta       1    5.6875
vigogne-2-7b-chat                    1    5.6625

########## Second turn ##########
                                          score
model                                turn
gpt-4o-mini                          2    8.912500
Chocolatine-14B-Instruct-DPO-v1.2    2    8.337500
Chocolatine-3B-Instruct-DPO-Revised  2    7.937500
Chocolatine-3B-Instruct-DPO-v1.2     2    7.862500
Phi-3-medium-4k-instruct             2    7.750000
Chocolatine-14B-Instruct-4k-DPO      2    7.737500
gpt-3.5-turbo                        2    7.679167
Phi-3.5-mini-instruct                2    7.575000
Daredevil-8B                         2    7.087500
Meta-Llama-3.1-8B-Instruct           2    6.787500
Mistral-7B-Instruct-v0.3             2    6.500000
vigostral-7b-chat                    2    6.162500
gemma-2-2b-it                        2    6.100000
French-Alpaca-7B-Instruct_beta       2    5.487395
vigogne-2-7b-chat                    2    2.775000

########## Average ##########
                                          score
model
gpt-4o-mini                          9.100000
Chocolatine-14B-Instruct-DPO-v1.2    8.475000
Chocolatine-14B-Instruct-4k-DPO      8.187500
Chocolatine-3B-Instruct-DPO-v1.2     8.118750
Phi-3.5-mini-instruct                8.050000
Phi-3-medium-4k-instruct             7.987500
Chocolatine-3B-Instruct-DPO-Revised  7.962500
gpt-3.5-turbo                        7.908333
Daredevil-8B                         7.487500
Meta-Llama-3.1-8B-Instruct           6.918750
Mistral-7B-Instruct-v0.3             6.625000
vigostral-7b-chat                    6.475000
gemma-2-2b-it                        6.275000
French-Alpaca-7B-Instruct_beta       5.587866
vigogne-2-7b-chat                    4.218750
```
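The Average table above is simply the per-model mean of the turn-1 and turn-2 scores. A quick sanity check in Python (scores copied from the tables; the dictionary and variable names are illustrative, not part of the benchmark tooling):

```python
# Per-turn MT-Bench-French scores (turn 1, turn 2), copied from the tables above.
turn_scores = {
    "gpt-4o-mini": (9.2875, 8.9125),
    "Chocolatine-14B-Instruct-DPO-v1.2": (8.6125, 8.3375),
}

# The "Average" column is the mean of the two turn scores, rounded for display.
averages = {model: round((t1 + t2) / 2, 6) for model, (t1, t2) in turn_scores.items()}

print(averages["Chocolatine-14B-Instruct-DPO-v1.2"])  # -> 8.475, matching the Average table
```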
### Usage