Update README.md
README.md
CHANGED
@@ -7,10 +7,10 @@ language:
 - vi
 license: llama3
 ---
-# Llama3 8B SEA-LIONv2
+# Llama3 8B CPT SEA-LIONv2

 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
-This is the card for the Llama3 8B SEA-LIONv2 base model which has undergone continued pre-training from the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model.
+This is the card for the Llama3 8B CPT SEA-LIONv2 base model which has undergone continued pre-training from the [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model.

 SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.

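The retitled intro describes a base model continued-pre-trained from Meta-Llama-3-8B-Instruct. A minimal usage sketch follows; the repository id is an assumption (this diff never states it), and the call pattern is the standard transformers API for a base, non-chat checkpoint:

```python
# Minimal sketch: load the base model with Hugging Face transformers.
# The repository id below is an assumption; substitute the actual id
# from the model card if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisingapore/llama3-8b-cpt-sea-lionv2-base"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the card's bfloat16 training precision
    device_map="auto",
)

# Base (non-chat) model: plain text completion, no chat template.
inputs = tokenizer("Singapore is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```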
@@ -19,7 +19,7 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.

 ### Model Description

-The continued pre-training data for Llama3 8B SEA-LIONv2 base model encompasses approximately 48B tokens.
+The continued pre-training data for Llama3 8B CPT SEA-LIONv2 base model encompasses approximately 48B tokens.

 - **Developed by:** Products Pillar, AI Singapore
 - **Funded by:** Singapore NRF
@@ -30,7 +30,7 @@ The continued pre-training data for Llama3 8B SEA-LIONv2 base model encompasses
 For tokenization, the model employs the default tokenizer used in Meta-Llama-3-8B-Instruct.

 ### Benchmark Performance
-We evaluated Llama3 8B SEA-LIONv2 base model on general language capabilities.
+We evaluated Llama3 8B CPT SEA-LIONv2 base model on general language capabilities.

 #### General Language Capabilities
 For the evaluation of general language capabilities in SEA languages, we employed the [BHASA evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
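The card states that the model keeps the default Meta-Llama-3-8B-Instruct tokenizer. A short sketch of loading and inspecting it, again assuming the repository id used above:

```python
# Sketch: the card says the default Meta-Llama-3-8B-Instruct tokenizer is
# reused, so loading it from the continued-pre-trained repo should yield
# the same vocabulary. The repo id is an assumption, not stated in this diff.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("aisingapore/llama3-8b-cpt-sea-lionv2-base")

# Tokenize a Vietnamese sentence ("vi" appears in the card's language tags).
ids = tok("Xin chào thế giới")["input_ids"]
print(ids)
print(tok.convert_ids_to_tokens(ids))
print(tok.vocab_size)  # Llama 3 uses a roughly 128k-entry vocabulary
```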
@@ -60,7 +60,7 @@ We also evaluated the model on English capabilities using tasks from the Open LL

 ### Data

-Llama3 8B SEA-LIONv2 base model was continued pre-trained on 48B tokens of the following data:
+Llama3 8B CPT SEA-LIONv2 base model was continued pre-trained on 48B tokens of the following data:

 | Data Source                | Unique Tokens (B) | Multiplier | Total Tokens (B) | Percentage (%) |
 |----------------------------|:-----------------:|:----------:|:----------------:|:--------------:|
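The table header implies a simple relationship: each source's total token count is its unique tokens times its sampling multiplier, and the percentages are shares of the roughly 48B-token sum. A sketch of that arithmetic with hypothetical rows (the real rows fall outside this hunk):

```python
# Sketch of the data-mixture arithmetic implied by the table header:
# Total Tokens = Unique Tokens x Multiplier; Percentage = share of the sum.
# The rows below are hypothetical placeholders, not values from the card.
sources = {
    "source_a": (10.0, 2),  # (unique tokens in B, multiplier)
    "source_b": (28.0, 1),
}

totals = {name: unique * mult for name, (unique, mult) in sources.items()}
grand_total = sum(totals.values())

for name, total in totals.items():
    print(f"{name}: {total:.1f}B tokens, {100 * total / grand_total:.1f}%")
```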
@@ -87,10 +87,10 @@ Note:

 ### Infrastructure

-Llama3 8B SEA-LIONv2 was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
+Llama3 8B CPT SEA-LIONv2 was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
 on the following hardware:

-| Training Details     | Llama3 8B SEA-LIONv2 |
+| Training Details     | Llama3 8B CPT SEA-LIONv2 |
 |----------------------|:--------------------:|
 | AWS EC2 p5d.24xlarge | 8 instances          |
 | Nvidia H100 80GB GPU | 64                   |
@@ -99,7 +99,7 @@ on the following hardware:

 ### Configuration

-| HyperParameter    | Llama3 8B SEA-LIONv2 |
+| HyperParameter    | Llama3 8B CPT SEA-LIONv2 |
 |-------------------|:--------------------:|
 | Precision         | bfloat16             |
 | Optimizer         | decoupled_adamw      |
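Taken together, the Infrastructure and Configuration sections describe a MosaicML Composer run in bfloat16 with Composer's decoupled AdamW optimizer. A hedged sketch of how such a run could be wired up; the dataloader, learning rate, and duration are placeholders, not values from the card:

```python
# Hedged sketch of a Composer continued-pre-training setup matching the
# card's stated choices (bfloat16 precision, decoupled AdamW). Learning
# rate, duration, and data are placeholders: none appear in this hunk.
import torch
from torch.utils.data import DataLoader
from composer import Trainer
from composer.models import HuggingFaceModel
from composer.optim import DecoupledAdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # stated starting checkpoint
hf_model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

model = HuggingFaceModel(hf_model, tokenizer=tokenizer)
optimizer = DecoupledAdamW(model.parameters(), lr=1e-5)  # lr is a placeholder

# Tiny stand-in dataset so the sketch is self-contained; real training
# would stream the 48B-token mixture described in the Data section.
batch = tokenizer(["placeholder text"], return_tensors="pt")
dataset = [
    {**{k: v[0] for k, v in batch.items()}, "labels": batch["input_ids"][0]}
]
train_dataloader = DataLoader(dataset, batch_size=1)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    max_duration="1ep",    # placeholder duration
    precision="amp_bf16",  # bfloat16 mixed precision, as in the card
    device="gpu",
)
trainer.fit()
```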
@@ -111,33 +111,33 @@ on the following hardware:

 ## The Team

-
-
-
-
-Lee Chwan Ren<br>
-Leong Wai Yi<br>
-Leong Wei Qi<br>
-Li Yier<br>
-Liu Bing Jie Darius<br>
-Lovenia Holy<br>
-Montalan Jann Railey<br>
-Ng Boon Cheong Raymond<br>
-Ngui Jian Gang<br>
-Nguyen Thanh Ngan<br>
-
-Ong Tat-Wee David<br>
-Ong Zhi Hao<br>
-Rengarajan Hamsawardhini<br>
-
-
-
-
-Teo
-
-
-
-Yeo Yeow Tong<br>
+Choa Esther<br>
+Cheng Nicholas<br>
+Huang Yuli<br>
+Lau Wayne<br>
+Lee Chwan Ren<br>
+Leong Wai Yi<br>
+Leong Wei Qi<br>
+Li Yier<br>
+Liu Bing Jie Darius<br>
+Lovenia Holy<br>
+Montalan Jann Railey<br>
+Ng Boon Cheong Raymond<br>
+Ngui Jian Gang<br>
+Nguyen Thanh Ngan<br>
+Ong Brandon<br>
+Ong Tat-Wee David<br>
+Ong Zhi Hao<br>
+Rengarajan Hamsawardhini<br>
+Siow Bryan<br>
+Susanto Yosephine<br>
+Tai Ngee Chia<br>
+Tan Choon Meng<br>
+Teo Eng Sipp Leslie<br>
+Teo Wei Yi<br>
+Tjhi William<br>
+Teng Walter<br>
+Yeo Yeow Tong<br>
 Yong Xianbin<br>
