weiqipedia committed on
Commit
46cf2da
1 Parent(s): e1f1385

Update metrics in README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -65,13 +65,16 @@ For Natural Language Generation (NLG) tasks, we tested the model on Machine Tran
 
 For Natural Language Reasoning (NLR) tasks, we tested the model on Natural Language Inference (NLI) using the IndoNLI lay dataset and on Causal Reasoning (Causal) using the XCOPA dataset. The metrics are accuracy for both tasks.
 
-| Model Name               | Sent (F1) | QA (F1)   | Tox (F1)  | MT-EN-ID (ChrF++) | (COMET22) | MT-ID-EN (ChrF++) | (COMET22) | AbsSum (ROUGE-L) | NLI (Acc) | Causal (Acc) |
-|--------------------------|-----------|-----------|-----------|-------------------|-----------|-------------------|-----------|------------------|-----------|--------------|
-| sealion7b-instruct-nc    | **76.13** | 24.86     | **24.45** | **52.50**         | **86.97** | 46.82             | 81.34     | **15.44**        | **33.20** | **23.80**    |
-| Mistral-7B-Instruct-v0.1 | 73.66     | **26.08** | 18.60     | 31.08             | 55.29     | 51.20             | 82.38     | 14.41            | 29.20     | 11.00        |
-| Llama-2-7b-chat-hf       | 41.92     | 4.23      | 0.00      | 47.96             | 77.86     | **55.76**         | **86.08** | 4.59             | 0.00      | 0.00         |
-| falcon-7b-instruct       | 0.00      | 8.47      | 7.21      | 1.66              | 30.07     | 16.82             | 46.32     | 1.55             | 0.00      | 2.20         |
-
+| Model                         | QA (F1)   | Sentiment (F1) | Toxicity (F1) | Eng>Indo (ChrF++) | Indo>Eng (ChrF++) | Summary (ROUGE-L) | NLI (Acc) | Causal (Acc) |
+|-------------------------------|-----------|----------------|---------------|-------------------|-------------------|-------------------|-----------|--------------|
+| SEA-LION-7B-Instruct-Research | 24.86     | 76.13          | 24.45         | 52.50             | 46.82             | 15.44             | 33.20     | 23.80        |
+| SEA-LION-7B-Instruct          | **68.41** | **91.45**      | 17.98         | 57.48             | 58.04             | **17.54**         | **53.10** | 60.80        |
+| SeaLLM 7B v1                  | 30.96     | 56.29          | 22.60         | 62.23             | 41.55             | 14.03             | 26.50     | 56.60        |
+| SeaLLM 7B v2                  | 44.40     | 80.13          | **55.24**     | 64.01             | **63.28**         | 17.31             | 43.60     | **82.00**    |
+| Sailor-7B (Base)              | 65.43     | 59.48          | 20.48         | **64.27**         | 60.68             | 8.69              | 15.10     | 38.40        |
+| Llama 2 7B Chat               | 11.12     | 52.32          | 0.00          | 44.09             | 57.58             | 9.24              | 0.00      | 0.00         |
+| Mistral 7B Instruct v0.1      | 38.85     | 74.38          | 20.83         | 30.60             | 51.43             | 15.63             | 28.60     | 50.80        |
+| GPT-4                         | 73.60     | 74.14          | 63.96         | 69.38             | 67.53             | 18.71             | 83.20     | 96.00        |
 
 ## Technical Specifications
 
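
For context on the metric types in the updated table, below is a minimal sketch of how each family of scores is conventionally computed with common libraries (scikit-learn for accuracy and F1, sacrebleu for ChrF++, rouge-score for ROUGE-L). This is an illustration only, not the project's actual evaluation harness; the toy predictions and references are placeholders, and details such as the F1 averaging mode are assumptions.

```python
# Sketch of the metric types used in the table above. Assumptions: toy data,
# macro-averaged F1; this is not the repository's evaluation pipeline.
from sklearn.metrics import accuracy_score, f1_score  # NLI / Causal (Acc), classification (F1)
import sacrebleu                                      # machine translation (ChrF++)
from rouge_score import rouge_scorer                  # abstractive summarization (ROUGE-L)

# Accuracy, e.g. NLI labels (0=entailment, 1=neutral, 2=contradiction).
gold = [0, 1, 2, 0]
pred = [0, 1, 1, 0]
print("Acc:", 100 * accuracy_score(gold, pred))

# F1 for classification tasks such as sentiment or toxicity detection.
print("F1:", 100 * f1_score(gold, pred, average="macro"))

# ChrF++: character n-gram F-score plus word bigrams (word_order=2).
hyps = ["the cat sits on the mat"]
refs = [["the cat sat on the mat"]]  # one inner list per reference stream
print("ChrF++:", sacrebleu.corpus_chrf(hyps, refs, word_order=2).score)

# ROUGE-L: longest-common-subsequence F-measure between summary and reference.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
result = scorer.score("the cat sat on the mat",   # reference
                      "the cat sits on the mat")  # generated summary
print("ROUGE-L:", 100 * result["rougeL"].fmeasure)
```

The COMET22 columns that appear only in the old table are neural MT scores; they come from a learned model (e.g. the Unbabel wmt22-comet-da checkpoint via the `unbabel-comet` package) rather than from string overlap like the metrics sketched here.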