weiqipedia committed on
Commit
46cf2da
1 Parent(s): e1f1385

Update metrics in README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -65,13 +65,16 @@ For Natural Language Generation (NLG) tasks, we tested the model on Machine Tran
 
 For Natural Language Reasoning (NLR) tasks, we tested the model on Natural Language Inference (NLI) using the IndoNLI lay dataset and on Causal Reasoning (Causal) using the XCOPA dataset. The metrics are accuracy for both tasks.
 
-| Model Name               | Sent (F1) | QA (F1)   | Tox (F1)  | MT-EN-ID (ChrF++) | (COMET22) | MT-ID-EN (ChrF++) | (COMET22) | AbsSum (ROUGE-L) | NLI (Acc) | Causal (Acc) |
-|--------------------------|-----------|-----------|-----------|-------------------|-----------|-------------------|-----------|------------------|-----------|--------------|
-| sealion7b-instruct-nc    | **76.13** | 24.86     | **24.45** | **52.50**         | **86.97** | 46.82             | 81.34     | **15.44**        | **33.20** | **23.80**    |
-| Mistral-7B-Instruct-v0.1 | 73.66     | **26.08** | 18.60     | 31.08             | 55.29     | 51.20             | 82.38     | 14.41            | 29.20     | 11.00        |
-| Llama-2-7b-chat-hf       | 41.92     | 4.23      | 0.00      | 47.96             | 77.86     | **55.76**         | **86.08** | 4.59             | 0.00      | 0.00         |
-| falcon-7b-instruct       | 0.00      | 8.47      | 7.21      | 1.66              | 30.07     | 16.82             | 46.32     | 1.55             | 0.00      | 2.20         |
-
+| Model                         | QA (F1)   | Sentiment (F1) | Toxicity (F1) | Eng>Indo (ChrF++) | Indo>Eng (ChrF++) | Summary (ROUGE-L) | NLI (Acc) | Causal (Acc) |
+|-------------------------------|-----------|----------------|---------------|-------------------|-------------------|-------------------|-----------|--------------|
+| SEA-LION-7B-Instruct-Research | 24.86     | 76.13          | 24.45         | 52.50             | 46.82             | 15.44             | 33.20     | 23.80        |
+| SEA-LION-7B-Instruct          | **68.41** | **91.45**      | 17.98         | 57.48             | 58.04             | **17.54**         | **53.10** | 60.80        |
+| SeaLLM 7B v1                  | 30.96     | 56.29          | 22.60         | 62.23             | 41.55             | 14.03             | 26.50     | 56.60        |
+| SeaLLM 7B v2                  | 44.40     | 80.13          | **55.24**     | 64.01             | **63.28**         | 17.31             | 43.60     | **82.00**    |
+| Sailor-7B (Base)              | 65.43     | 59.48          | 20.48         | **64.27**         | 60.68             | 8.69              | 15.10     | 38.40        |
+| Llama 2 7B Chat               | 11.12     | 52.32          | 0.00          | 44.09             | 57.58             | 9.24              | 0.00      | 0.00         |
+| Mistral 7B Instruct v0.1      | 38.85     | 74.38          | 20.83         | 30.60             | 51.43             | 15.63             | 28.60     | 50.80        |
+| GPT-4                         | 73.60     | 74.14          | 63.96         | 69.38             | 67.53             | 18.71             | 83.20     | 96.00        |
 
 ## Technical Specifications
 
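
For context on the metric types in the updated table, below is a minimal sketch of how each family of scores is conventionally computed with common libraries (scikit-learn for accuracy and F1, sacrebleu for ChrF++, rouge-score for ROUGE-L). This is an illustration only, not the project's actual evaluation harness; the toy predictions and references are placeholders, and details such as the F1 averaging mode are assumptions.

```python
# Sketch of the metric types used in the table above. Assumptions: toy data,
# macro-averaged F1; this is not the repository's evaluation pipeline.
from sklearn.metrics import accuracy_score, f1_score  # NLI / Causal (Acc), classification (F1)
import sacrebleu                                      # machine translation (ChrF++)
from rouge_score import rouge_scorer                  # abstractive summarization (ROUGE-L)

# Accuracy, e.g. NLI labels (0=entailment, 1=neutral, 2=contradiction).
gold = [0, 1, 2, 0]
pred = [0, 1, 1, 0]
print("Acc:", 100 * accuracy_score(gold, pred))

# F1 for classification tasks such as sentiment or toxicity detection.
print("F1:", 100 * f1_score(gold, pred, average="macro"))

# ChrF++: character n-gram F-score plus word bigrams (word_order=2).
hyps = ["the cat sits on the mat"]
refs = [["the cat sat on the mat"]]  # one inner list per reference stream
print("ChrF++:", sacrebleu.corpus_chrf(hyps, refs, word_order=2).score)

# ROUGE-L: longest-common-subsequence F-measure between summary and reference.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
result = scorer.score("the cat sat on the mat",   # reference
                      "the cat sits on the mat")  # generated summary
print("ROUGE-L:", 100 * result["rougeL"].fmeasure)
```

The COMET22 columns that appear only in the old table are neural MT scores; they come from a learned model (e.g. the Unbabel wmt22-comet-da checkpoint via the `unbabel-comet` package) rather than from string overlap like the metrics sketched here.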