RaymondAISG committed on
Commit
5c5fb7c
1 Parent(s): 20b39c9

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -49,11 +49,11 @@ Llama3 8B CPT SEA-LIONv2 base model was continued pre-trained on 48B tokens of t
  | Data Source | Unique Tokens (B) | Multiplier | Total Tokens (B) | Percentage (%) |
  |---------------------------|:-----------------:|:----------:|:----------------:|:--------------:|
  | Dolma RefinedWeb - English| 7.650 | 1 | 7.650 | 15.90 |
- | Dolma C4 - English | 1.160 | 1 | 1 | 9.21 |
- | Dolma Reddit - English | 1.339 | 1 | 14.7 | 2.42 |
- | Dolma Semantic Scholar | 0.959 | 1 | 2.9 | 2.79 |
- | Dolma arXiv | 0.469 | 1 | 5.3 | 1.99 |
- | Dolma StarCoder | 4.422 | 1 | 4.9 | 0.98 |
+ | Dolma C4 - English | 1.160 | 1 | 1.16 | 9.21 |
+ | Dolma Reddit - English | 1.339 | 1 | 1.339 | 2.42 |
+ | Dolma Semantic Scholar | 0.959 | 1 | 0.959 | 2.79 |
+ | Dolma arXiv | 0.469 | 1 | 0.469 | 1.99 |
+ | Dolma StarCoder | 4.422 | 1 | 4.422 | 0.98 |
  | SEA-LION Pile - Indonesian| 3.4 | 2 | 6.8 | 14.17 |
  | Wiki* - Indonesian | 0.3 | 4 | 1.2 | 2.50 |
  | SEA-LION Pile - Tamil | 5.6 | 1 | 5.6 | 11.67 |
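
The five changed rows suggest that this commit brings the Total Tokens (B) column in line with Unique Tokens (B) × Multiplier. A minimal Python sketch of that consistency check is given here; the relationship is inferred from the table values rather than stated in the README, and the names `OLD_ROWS`, `NEW_ROWS`, and `inconsistent_rows` are hypothetical, introduced only for illustration.

```python
import math

# Each tuple: (data source, unique tokens in B, multiplier, total tokens in B).
# Values are copied from the diff; the removed ("-") rows come first.
OLD_ROWS = [
    ("Dolma C4 - English", 1.160, 1, 1),
    ("Dolma Reddit - English", 1.339, 1, 14.7),
    ("Dolma Semantic Scholar", 0.959, 1, 2.9),
    ("Dolma arXiv", 0.469, 1, 5.3),
    ("Dolma StarCoder", 4.422, 1, 4.9),
]

# The added ("+") rows plus the unchanged context rows visible in the hunk.
NEW_ROWS = [
    ("Dolma RefinedWeb - English", 7.650, 1, 7.650),
    ("Dolma C4 - English", 1.160, 1, 1.16),
    ("Dolma Reddit - English", 1.339, 1, 1.339),
    ("Dolma Semantic Scholar", 0.959, 1, 0.959),
    ("Dolma arXiv", 0.469, 1, 0.469),
    ("Dolma StarCoder", 4.422, 1, 4.422),
    ("SEA-LION Pile - Indonesian", 3.4, 2, 6.8),
    ("Wiki* - Indonesian", 0.3, 4, 1.2),
    ("SEA-LION Pile - Tamil", 5.6, 1, 5.6),
]


def inconsistent_rows(rows, tol=1e-9):
    """Return the names of rows whose total differs from unique * multiplier.

    Assumption (not stated in the README): total = unique * multiplier.
    """
    return [
        name
        for name, unique, multiplier, total in rows
        if not math.isclose(unique * multiplier, total, abs_tol=tol)
    ]


if __name__ == "__main__":
    print("before:", inconsistent_rows(OLD_ROWS))
    print("after: ", inconsistent_rows(NEW_ROWS))
```

Under that assumption, every removed row fails the check and every row in the updated table passes it. The check covers only the Total Tokens (B) column; the Percentage (%) column is left as the commit has it.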