waiyiaisg committed
Commit 9f28f9b
1 Parent(s): e261300

update english base results

Files changed (1)
  1. README.md +22 -5
README.md CHANGED
@@ -33,13 +33,30 @@ The continued pre-training data for LLaMA3 8B SEA-LIONv2 base model encompasses
  - **Languages:** English, Indonesian, Thai, Vietnamese, Tamil
  - **License:** [LLaMA3 Community License](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE)

- ### Performance Benchmarks
-
- LLaMA3 8B SEA-LIONv has a similar English performance with LLaMA3-8B-Base model:
-
- | Model                | ARC   | BBH   | HellaSwag | MMLU  | GSM8k  | Average |
- |----------------------|:-----:|:-----:|:---------:|:-----:|:------:|:-------:|
- | LLaMA3 8B SEA-LIONv2 | 58.87 | 47.70 | 81.14     | 63.11 | 50.49  | 60.26   |
-
+ ### Benchmark Performance
+ We evaluated the LLaMA3 8B SEA-LIONv2 base model on general language capabilities.
+
+ #### General Language Capabilities
+ For the evaluation of general language capabilities, we employed the [BHASA evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
+ These tasks include Question Answering (QA), Sentiment Analysis (Sentiment), Toxicity Detection (Toxicity), Translation in both directions (Eng>Lang & Lang>Eng), Abstractive Summarization (Summ), Causal Reasoning (Causal) and Natural Language Inference (NLI).
+
+ The evaluation was done **five-shot** with native prompts, and only a sample of 100-1000 instances per dataset was used, as per the setting described in the paper.
+
+ **BHASA**
+
+ **English**
+
+ | Model                                    | ARC   | BBH   | HellaSwag | MMLU  | GSM8k | Average |
+ | ---------------------------------------- | ----- | ----- | --------- | ----- | ----- | ------- |
+ | aisingapore/llama3-8b-cpt-sealionv2-base | 58.87 | 47.70 | 81.14     | 63.11 | 50.49 | 60.26   |
+ | google/gemma-2-9b                        | 68.00 | 53.53 | 82.73     | 70.26 | 63.53 | 67.61   |
+ | meta-llama/Meta-Llama-3-8B               | 57.85 | 46.09 | 81.89     | 65.10 | 45.34 | 59.25   |
+ | Qwen/Qwen2-7B                            | 61.86 | 53.10 | 80.63     | 70.45 | 78.09 | 68.83   |
+ | Sail/Sailor-7B                           | 50.34 | 35.65 | 76.11     | 52.80 | 33.81 | 49.74   |
+ | mistralai/Mistral-7B-v0.3                | 59.56 | 44.89 | 82.97     | 62.36 | 33.36 | 56.63   |

  ## Training Details
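The five-shot, sampled protocol described in the added lines can be sketched as follows. This is a minimal illustration only: the Q/A prompt format, the toy exemplar data, and the 1000-instance cap are assumptions, not the actual BHASA evaluation harness.

```python
import random

def build_five_shot_prompt(exemplars, query, n_shots=5):
    """Prepend n_shots solved Q/A exemplars to the test query.

    The Q:/A: format is an illustrative stand-in for BHASA's native prompts.
    """
    lines = [f"Q: {q}\nA: {a}" for q, a in exemplars[:n_shots]]
    lines.append(f"Q: {query}\nA:")  # unanswered query for the model to complete
    return "\n\n".join(lines)

def sample_eval_instances(dataset, cap=1000, seed=0):
    """Evaluate on a fixed-seed sample of each dataset.

    The paper's setting uses only 100-1000 instances per dataset; `cap` here
    is a hypothetical upper bound, not a value taken from the harness.
    """
    if len(dataset) <= cap:
        return list(dataset)
    return random.Random(seed).sample(dataset, cap)

# Toy stand-in data (hypothetical):
exemplars = [(f"question {i}", f"answer {i}") for i in range(5)]
prompt = build_five_shot_prompt(exemplars, "What is the capital of Vietnam?")
print(prompt.count("A:"))  # 6: five exemplar answers plus the unanswered query
```

The fixed seed keeps the sampled subset identical across models, so per-dataset scores stay comparable between the rows of the table.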
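The Average column of the new English table is the arithmetic mean of the five benchmark scores, rounded to two decimals. A quick check, with the values copied from the diff above:

```python
# Verify the Average column: mean of ARC, BBH, HellaSwag, MMLU and GSM8k.
scores = {
    "aisingapore/llama3-8b-cpt-sealionv2-base": [58.87, 47.70, 81.14, 63.11, 50.49],
    "google/gemma-2-9b":                        [68.00, 53.53, 82.73, 70.26, 63.53],
    "meta-llama/Meta-Llama-3-8B":               [57.85, 46.09, 81.89, 65.10, 45.34],
    "Qwen/Qwen2-7B":                            [61.86, 53.10, 80.63, 70.45, 78.09],
    "Sail/Sailor-7B":                           [50.34, 35.65, 76.11, 52.80, 33.81],
    "mistralai/Mistral-7B-v0.3":                [59.56, 44.89, 82.97, 62.36, 33.36],
}
averages = {model: round(sum(v) / len(v), 2) for model, v in scores.items()}
# Matches the table's Average column, e.g. 60.26 for the SEA-LIONv2 base model.
```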