SeaLLMs/SeaLLM-7B-v2.5


Zhiqiang007 committed on Apr 4

Commit 8d22163 (parent: dbfd674): Update README.md

Files changed (1): README.md +37 -0

@@ -75,6 +75,43 @@ By using our released weights, codes, and demos, you agree to and comply with th
 ## Evaluation
+### Zero-shot CoT Multilingual Math Reasoning
+<!--
+[SeaLLM-7B-v2.5](https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5) achieves a score of **78.5** on GSM8K with zero-shot CoT reasoning, making it the **state of the art** among 7B models. It also outperforms GPT-3.5 on the GSM8K benchmark translated into SEA languages (🇨🇳 🇻🇳 🇮🇩 🇹🇭). [SeaLLM-7B-v2.5](https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5) also surpasses GPT-3.5 on the Thai-translated MATH benchmark, scoring **28.4** vs 18.1.
+![fig_sea_math_side_by_side.png](fig_sea_math_side_by_side.png)
+-->
+<details>
+<summary>See details on English and translated GSM8K and MATH with zero-shot reasoning</summary>
+<br>
+| Model | GSM8K<br>en | MATH<br>en | GSM8K<br>zh | MATH<br>zh | GSM8K<br>vi | MATH<br>vi | GSM8K<br>id | MATH<br>id | GSM8K<br>th | MATH<br>th |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| GPT-3.5 | 80.8 | 34.1 | 48.2 | 21.5 | 55 | 26.5 | 64.3 | 26.4 | 35.8 | 18.1 |
+| Qwen-14B-chat | 61.4 | 18.4 | 41.6 | 11.8 | 33.6 | 3.6 | 44.7 | 8.6 | 22 | 6 |
+| Vistral-7b-chat | 48.2 | 12.5 |  |  | 48.7 | 3.1 |  |  |  |  |
+| Qwen1.5-7B-chat | 56.8 | 15.3 | 40 | 2.7 | 37.7 | 9 | 36.9 | 7.7 | 21.9 |  |
+| SeaLLM-7B-v2 | 78.2 | 27.5 | 53.7 | 17.6 | 69.9 | 23.8 | 71.5 | 24.4 | 59.6 | 22.4 |
+| SeaLLM-7B-v2.5 | 78.5 | 34.9 | 51.3 | 22.1 | 72.3 | 30.2 | 71.5 | 30.1 | 62.0 | 28.4 |
+</details>
+Baselines were evaluated using their respective chat templates and system prompts ([Qwen1.5-7B-chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat/blob/main/tokenizer_config.json), [Vistral](https://huggingface.co/Viet-Mistral/Vistral-7B-Chat)).
+#### Zero-shot MGSM
+[SeaLLM-7B-v2](https://huggingface.co/SeaLLMs/SeaLLM-7B-v2) also outperforms GPT-3.5 and Qwen-14B-chat on the Chinese (Zh) and Thai (Th) subsets of the multilingual MGSM benchmark.
+| Model | MGSM-Zh | MGSM-Th |
+| --- | --- | --- |
+| ChatGPT (reported) | 61.2 | 47.2 |
+| Qwen-14B-chat | 59.6 | 28 |
+| SeaLLM-7B-v2 | **64.8** | 62.4 |
+| SeaLLM-7B-v2.5 | 58.0 | **64.8** |
 ### Multilingual World Knowledge
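The README excerpt above reports zero-shot CoT scores on GSM8K and MGSM but does not say how the final answer is pulled out of each chain-of-thought response. As a minimal sketch, assuming the common convention of taking the last number in the response and comparing it with the gold numeric answer, scoring could look like the following. `extract_final_number` and `exact_match_accuracy` are hypothetical helpers, not part of the SeaLLMs evaluation code, and the sketch only covers numeric answers (many MATH answers are symbolic expressions and need a different checker).

```python
import re
from typing import List, Optional

# Assumption: the final answer is the last number in the response.
# Matches integers, comma-grouped numbers and decimals, e.g. "-3", "2,501", "1,250.50".
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(response: str) -> Optional[float]:
    """Return the last number mentioned in a CoT response, or None if there is none."""
    matches = _NUMBER.findall(response)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma the regex picked up).
    return float(matches[-1].replace(",", ""))


def exact_match_accuracy(responses: List[str], gold: List[float]) -> float:
    """Percentage of responses whose extracted final number equals the gold answer."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold answers must have the same length")
    if not gold:
        return 0.0
    correct = 0
    for response, answer in zip(responses, gold):
        predicted = extract_final_number(response)
        # Unparseable responses count as wrong; compare with a small float tolerance.
        if predicted is not None and abs(predicted - answer) < 1e-6:
            correct += 1
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Toy, made-up responses for illustration only.
    responses = [
        "Janet has 16 eggs, eats 3 and bakes with 4, so 16 - 3 - 4 = 9. The answer is 9.",
        "Two boxes at $1,250.50 each cost about 2,501.",
        "I am not sure.",
    ]
    gold = [9, 2501, 7]
    print(exact_match_accuracy(responses, gold))
```

Taking the last number is a simple heuristic; evaluation harnesses often instead ask the model to end with a fixed phrase such as "The answer is" and parse only what follows it, which is more robust when the reasoning mentions numbers after the final answer.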