Qi Wang
committed on
Commit • b7ac396
1 Parent(s): 983603d
Update readme_en.md
readme_en.md  +9 -9  CHANGED
@@ -22,10 +22,10 @@ The tokenizer for the model was also retrained, without relying on any existing

Training Parameters:

- 1. Maximum Sentence Length:
- 2. Vocabulary Size:
- 3. Normalization Rule:
- 4. Character Coverage:
+ 1. Maximum Sentence Length: 2657
+ 2. Vocabulary Size: 32000
+ 3. Normalization Rule: identity
+ 4. Character Coverage: 0.9995

| | Llama2 | Baby Llama2 |
| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
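The four values filled in by this hunk correspond to standard SentencePiece trainer options (vocab_size, max_sentence_length, normalization_rule_name, character_coverage). Below is a minimal sketch of how a tokenizer could be retrained with these settings; the corpus path, model prefix, and model type are illustrative assumptions and are not taken from the repository.

```python
# Hypothetical sketch only: retrain a SentencePiece tokenizer with the values
# from the diff above. The corpus path, model_prefix, and model_type are
# placeholders; only the four numeric/string options come from the README.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",                    # assumed training corpus
    model_prefix="baby_llama2_tokenizer",  # assumed output prefix (.model/.vocab)
    model_type="bpe",                      # assumption; not stated in the diff
    vocab_size=32000,                      # "Vocabulary Size"
    max_sentence_length=2657,              # "Maximum Sentence Length"
    normalization_rule_name="identity",    # "Normalization Rule"
    character_coverage=0.9995,             # "Character Coverage"
)

# Quick check that the trained model loads and tokenizes text.
sp = spm.SentencePieceProcessor(model_file="baby_llama2_tokenizer.model")
print(sp.encode("hello baby llama2", out_type=str))
```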
@@ -48,11 +48,11 @@ Before full training, the corpus is processed for vectorization. Using the recen

Pre-training is done on a single 3090 machine. The model uses the architecture of llama2, and the training parameters are as follows:

- 1. max_seq_len =
- 2. dim =
- 3. n_headers =
- 4. n_layers =
- 5. n_kv_headers =
+ 1. max_seq_len = 1024
+ 2. dim = 768
+ 3. n_headers = 12
+ 4. n_layers = 12
+ 5. n_kv_headers = 12

## Demonstration
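For reference, the values added in the second hunk describe a small llama2-style configuration (roughly GPT-2-small sized). The self-contained sketch below simply collects them in a config object; the field names n_heads / n_kv_heads follow the common llama2.c naming and are assumed to correspond to the README's n_headers / n_kv_headers, and vocab_size is carried over from the tokenizer section above.

```python
# Hypothetical config sketch for the pre-training parameters listed above.
# n_heads / n_kv_heads are assumed to be what the README calls n_headers /
# n_kv_headers; vocab_size comes from the tokenizer section of the README.
from dataclasses import dataclass


@dataclass
class BabyLlama2Config:
    dim: int = 768            # hidden size
    n_layers: int = 12        # transformer blocks
    n_heads: int = 12         # attention heads
    n_kv_heads: int = 12      # equal to n_heads here
    vocab_size: int = 32000   # size of the retrained SentencePiece vocabulary
    max_seq_len: int = 1024   # context length used during pre-training


cfg = BabyLlama2Config()
print("head_dim =", cfg.dim // cfg.n_heads)  # 768 // 12 = 64
```

With n_kv_heads equal to n_heads this is ordinary multi-head attention; choosing a smaller n_kv_heads would turn the same architecture into grouped-query attention.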