LidongBing commited on
Commit
fc6b1b1
1 Parent(s): bcebb02

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +8 -8
README.md CHANGED
@@ -40,7 +40,7 @@ We introduce **SeaLLMs-v3**, the latest series of the SeaLLMs (Large Language Mo
40
 
41
  SeaLLMs is tailored for handling a wide range of languages spoken in the SEA region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese.
42
 
43
- This page introduces the SeaLLM3-7B-Chat model, specifically fine-tuned to follow human instructions effectively for task completion, making it directly applicable to your applications.
44
 
45
 
46
  ### Get started with `Transformers`
@@ -172,8 +172,8 @@ We conduct our evaluation along two dimensions:
172
  | Qwen2-7B-Instruct| 0.809 | 0.88 | 0.558 | 0.555 | 0.624 | 0.685 | 0.579 |
173
  | Sailor-14B | 0.748 | 0.84 | 0.536 | 0.528 | 0.621 | 0.655 | 0.562 |
174
  | Sailor-14B-Chat | 0.749 | 0.843 | 0.553 | 0.566 | 0.637 | 0.67 | 0.585 |
175
- | SeaLLM3-7B | 0.814 | 0.866 | 0.549 | 0.52 | 0.628 | 0.675 | 0.566 |
176
- | SeaLLM3-7B-Chat | 0.809 | 0.874 | 0.558 | 0.569 | 0.649 | 0.692 | 0.592 |
177
 
178
 
179
  #### Multilingual Instruction-following Capability - SeaBench
@@ -185,7 +185,7 @@ SeaBench consists of multi-turn human instructions spanning various task types.
185
  | SeaLLM-7B-v2.5 | 6.27 | 4.96 | 5.62 | 5.79 | 3.82 | 4.81 | 6.02 | 4.02 | 5.02 | 5.15 |
186
  | Sailor-14B-Chat | 5.26 | 5.53 | 5.40 | 4.62 | 4.36 | 4.49 | 5.31 | 4.74 | 5.03 | 4.97 |
187
  | Sailor-7B-Chat | 4.60 | 4.04 | 4.32 | 3.94 | 3.17 | 3.56 | 4.82 | 3.62 | 4.22 | 4.03 |
188
- | SeaLLM3-7B-Chat | 6.73 | 6.59 | 6.66 | 6.48 | 5.90 | 6.19 | 6.34 | 5.79 | 6.07 | 6.31 |
189
 
190
 
191
  #### Multilingual Math
@@ -201,7 +201,7 @@ We evaluate the multilingual math capability using the MGSM dataset. MGSM origin
201
  | aya-23-8B | 28.8 | 16.4 | 14.4 | 2 | 16 | 12.8 | 15.1 |
202
  | gemma-1.1-7b-it | 58.8 | 32.4 | 34.8 | 31.2 | 39.6 | 35.2 | 38.7 |
203
  | SeaLLM-7B-v2.5 | 79.6 | 69.2 | 70.8 | 61.2 | 66.8 | 62.4 | 68.3 |
204
- | SeaLLM3-7B-Chat | 74.8 | 71.2 | 70.8 | 71.2 | 71.2 | 79.6 | 73.1 |
205
 
206
 
207
  #### Translation
@@ -213,7 +213,7 @@ We use the test sets from Flores-200 for evaluation and report the zero-shot chr
213
  |Qwen2-7B-Instruct | 50.36 | 47.55 | 29.36 | 19.26 | 11.06 | 42.43 | 19.33 | 20.04 | 36.07 | 37.91 | 39.63 | 22.87 | 31.32 |
214
  |Sailor-7B-Chat | 49.4 | 49.78 | 28.33 | 2.68 | 6.85 | 47.75 | 5.35 | 18.23 | 38.92 | 29 | 41.76 | 20.87 | 28.24 |
215
  |SeaLLM-7B-v2.5 | 55.09 | 53.71 | 18.13 | 18.09 | 15.53 | 51.33 | 19.71 | 26.1 | 40.55 | 45.58 | 44.56 | 24.18 | 34.38 |
216
- |SeaLLM3-7B-Chat | 54.68 | 52.52 | 29.86 | 27.3 | 26.34 | 45.04 | 21.54 | 31.93 | 41.52 | 38.51 | 43.78 | 26.1 | 36.52 |
217
 
218
 
219
  ### Model Trustworthiness
@@ -231,7 +231,7 @@ Performance of whether a model can refuse questions about the non-existing entit
231
  | aya-23-8B | 6.38 | 0.79 | 2.83 | 1.98 | 14.80 | 5.36 |
232
  | Llama-3-8B-Instruct | 72.08 | 0.00 | 1.23 | 0.80 | 3.91 | 15.60 |
233
  | gemma-1.1-7b-it | 52.39 | 27.74 | 23.96 | 22.97 | 31.72 | 31.76 |
234
- | SeaLLM3-7B-Chat | 71.36 | 78.39 | 77.93 | 61.31 | 68.95 | 71.588 |
235
 
236
  #### Safety
237
  Multijaildataset consists of harmful prompts in multiple languages. We take those relevant prompts in SEA languages here and report their safe rate (the higher the better).
@@ -243,7 +243,7 @@ Multijaildataset consists of harmful prompts in multiple languages. We take thos
243
  | Meta-Llama-3-8B-Instruct| 0.8825 | 0.2635 | 0.7111 | 0.6984 | 0.7714 | 0.6654 |
244
  | Sailor-14B-Chat | 0.8698 | 0.3048 | 0.5365 | 0.6095 | 0.727 | 0.6095 |
245
  | glm-4-9b-chat | 0.7714 | 0.2127 | 0.3016 | 0.6063 | 0.7492 | 0.52824|
246
- | SeaLLM3-7B-Chat | 0.8889 | 0.6000 | 0.7333 | 0.8381 | 0.927 | 0.7975 |
247
 
248
 
249
  ## Acknowledgement to Our Linguists
 
40
 
41
  SeaLLMs is tailored for handling a wide range of languages spoken in the SEA region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese.
42
 
43
+ This page introduces the SeaLLMs-v3-7B-Chat model, specifically fine-tuned to follow human instructions effectively for task completion, making it directly applicable to your applications.
44
 
45
 
46
  ### Get started with `Transformers`
 
172
  | Qwen2-7B-Instruct| 0.809 | 0.88 | 0.558 | 0.555 | 0.624 | 0.685 | 0.579 |
173
  | Sailor-14B | 0.748 | 0.84 | 0.536 | 0.528 | 0.621 | 0.655 | 0.562 |
174
  | Sailor-14B-Chat | 0.749 | 0.843 | 0.553 | 0.566 | 0.637 | 0.67 | 0.585 |
175
+ | SeaLLMs-v3-7B | 0.814 | 0.866 | 0.549 | 0.52 | 0.628 | 0.675 | 0.566 |
176
+ | SeaLLMs-v3-7B-Chat | 0.809 | 0.874 | 0.558 | 0.569 | 0.649 | 0.692 | 0.592 |
177
 
178
 
179
  #### Multilingual Instruction-following Capability - SeaBench
 
185
  | SeaLLM-7B-v2.5 | 6.27 | 4.96 | 5.62 | 5.79 | 3.82 | 4.81 | 6.02 | 4.02 | 5.02 | 5.15 |
186
  | Sailor-14B-Chat | 5.26 | 5.53 | 5.40 | 4.62 | 4.36 | 4.49 | 5.31 | 4.74 | 5.03 | 4.97 |
187
  | Sailor-7B-Chat | 4.60 | 4.04 | 4.32 | 3.94 | 3.17 | 3.56 | 4.82 | 3.62 | 4.22 | 4.03 |
188
+ | SeaLLMs-v3-7B-Chat | 6.73 | 6.59 | 6.66 | 6.48 | 5.90 | 6.19 | 6.34 | 5.79 | 6.07 | 6.31 |
189
 
190
 
191
  #### Multilingual Math
 
201
  | aya-23-8B | 28.8 | 16.4 | 14.4 | 2 | 16 | 12.8 | 15.1 |
202
  | gemma-1.1-7b-it | 58.8 | 32.4 | 34.8 | 31.2 | 39.6 | 35.2 | 38.7 |
203
  | SeaLLM-7B-v2.5 | 79.6 | 69.2 | 70.8 | 61.2 | 66.8 | 62.4 | 68.3 |
204
+ | SeaLLMs-v3-7B-Chat | 74.8 | 71.2 | 70.8 | 71.2 | 71.2 | 79.6 | 73.1 |
205
 
206
 
207
  #### Translation
 
213
  |Qwen2-7B-Instruct | 50.36 | 47.55 | 29.36 | 19.26 | 11.06 | 42.43 | 19.33 | 20.04 | 36.07 | 37.91 | 39.63 | 22.87 | 31.32 |
214
  |Sailor-7B-Chat | 49.4 | 49.78 | 28.33 | 2.68 | 6.85 | 47.75 | 5.35 | 18.23 | 38.92 | 29 | 41.76 | 20.87 | 28.24 |
215
  |SeaLLM-7B-v2.5 | 55.09 | 53.71 | 18.13 | 18.09 | 15.53 | 51.33 | 19.71 | 26.1 | 40.55 | 45.58 | 44.56 | 24.18 | 34.38 |
216
+ |SeaLLMs-v3-7B-Chat | 54.68 | 52.52 | 29.86 | 27.3 | 26.34 | 45.04 | 21.54 | 31.93 | 41.52 | 38.51 | 43.78 | 26.1 | 36.52 |
217
 
218
 
219
  ### Model Trustworthiness
 
231
  | aya-23-8B | 6.38 | 0.79 | 2.83 | 1.98 | 14.80 | 5.36 |
232
  | Llama-3-8B-Instruct | 72.08 | 0.00 | 1.23 | 0.80 | 3.91 | 15.60 |
233
  | gemma-1.1-7b-it | 52.39 | 27.74 | 23.96 | 22.97 | 31.72 | 31.76 |
234
+ | SeaLLMs-v3-7B-Chat | 71.36 | 78.39 | 77.93 | 61.31 | 68.95 | 71.588 |
235
 
236
  #### Safety
237
  Multijaildataset consists of harmful prompts in multiple languages. We take those relevant prompts in SEA languages here and report their safe rate (the higher the better).
 
243
  | Meta-Llama-3-8B-Instruct| 0.8825 | 0.2635 | 0.7111 | 0.6984 | 0.7714 | 0.6654 |
244
  | Sailor-14B-Chat | 0.8698 | 0.3048 | 0.5365 | 0.6095 | 0.727 | 0.6095 |
245
  | glm-4-9b-chat | 0.7714 | 0.2127 | 0.3016 | 0.6063 | 0.7492 | 0.52824|
246
+ | SeaLLMs-v3-7B-Chat | 0.8889 | 0.6000 | 0.7333 | 0.8381 | 0.927 | 0.7975 |
247
 
248
 
249
  ## Acknowledgement to Our Linguists