michaelfeil committed
Commit 3d44796 • 1 parent: 990790f
Update README.md
README.md CHANGED
@@ -35,7 +35,7 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | Initialize From | LLaMA-3-8B-Inst| 65K |
 | Sequence Length | 2^16 | 2^18 |
 | RoPE theta | 15.3 M | 207.1 M |
-| Batch Size (Tokens / Step) |
+| Batch Size (Tokens / Step) | 2.097 M | 4.192 M |
 | Steps | 30 | 24 |
 | Total Tokens | 63 M | 101 M |
 | Learning Rate | 2.00E-05 | 2.00E-05 |
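
As a rough sanity check on the batch-size row added in this commit, the Total Tokens figures in the table should be approximately Batch Size times Steps. The sketch below only copies numbers from the table. The names `stages`, `total_tokens_m`, and `sequences_per_step` are illustrative assumptions, and the sequences-per-step figure is derived here, not stated in the README.

```python
# Values copied from the README table; keys are the Sequence Length column entries.
# All names here are illustrative, not taken from the repository.
stages = {
    "2^16": {"seq_len": 2**16, "batch_tokens_m": 2.097, "steps": 30, "reported_total_m": 63},
    "2^18": {"seq_len": 2**18, "batch_tokens_m": 4.192, "steps": 24, "reported_total_m": 101},
}


def total_tokens_m(batch_tokens_m: float, steps: int) -> float:
    """Total training tokens in millions: tokens per step times number of steps."""
    return batch_tokens_m * steps


def sequences_per_step(batch_tokens_m: float, seq_len: int) -> float:
    """Approximate number of full-length sequences in one step (derived, not in the table)."""
    return batch_tokens_m * 1e6 / seq_len


for name, s in stages.items():
    total = total_tokens_m(s["batch_tokens_m"], s["steps"])
    seqs = sequences_per_step(s["batch_tokens_m"], s["seq_len"])
    print(
        f"{name}: computed total = {total:.1f} M "
        f"(reported {s['reported_total_m']} M), "
        f"about {seqs:.1f} sequences per step"
    )
```

The computed totals agree with the reported Total Tokens row after rounding to the nearest million, which suggests the new batch-size values are consistent with the rest of the table.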