forrest-gradient
committed on
Commit e77cde0
1 Parent(s): 2f636d9
Update README.md
README.md
CHANGED
@@ -42,13 +42,12 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | Initialize From | Llama-3-70B-Instruct | 65K |
 | Sequence Length 2^N | 16 | 18 |
 | RoPE theta | 15,296,098 | 207,112,184 |
-| Batch Size |
+| Batch Size | 64 | 16 |
 | Gradient Accumulation Steps | 1 | 1 |
 | Steps | 20 | 25 |
 | Total Tokens | 83,886,080 | 104,857,600 |
 | Learning rate | 0.00002 | 0.00002 |
 | # GPUs | 512 | 512 |
-| Ring parallelism | 64 | 16 |
 | GPU Type | NVIDIA L40S | NVIDIA L40S |
 | Minutes to Train (Wall) | 100 | 170 |
 
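A quick sanity check on the Batch Size values this commit fills in: they agree with the Total Tokens row if total tokens = batch size × sequence length × gradient accumulation steps × steps. That relation is an assumption (the README does not state it), and the function name and stage labels below are hypothetical. The sketch only checks the arithmetic against the numbers in the table.

```python
# Assumed relation (not stated in the README):
#   total_tokens = batch_size * 2**seq_len_log2 * grad_accum_steps * steps

def total_tokens(batch_size: int, seq_len_log2: int, steps: int, grad_accum_steps: int = 1) -> int:
    """Tokens seen during training, treating batch size as sequences per step."""
    return batch_size * (2 ** seq_len_log2) * grad_accum_steps * steps


# Values copied from the two table columns; the "stage" labels are hypothetical.
stages = [
    {"stage": "from Llama-3-70B-Instruct", "batch_size": 64, "seq_len_log2": 16,
     "steps": 20, "grad_accum_steps": 1, "reported": 83_886_080},
    {"stage": "from 65K", "batch_size": 16, "seq_len_log2": 18,
     "steps": 25, "grad_accum_steps": 1, "reported": 104_857_600},
]

for s in stages:
    computed = total_tokens(s["batch_size"], s["seq_len_log2"], s["steps"], s["grad_accum_steps"])
    status = "matches" if computed == s["reported"] else "differs from"
    print(f'{s["stage"]}: computed {computed:,} {status} reported {s["reported"]:,}')
```

Both columns work out to 4,194,304 tokens per step (64 × 2^16 and 16 × 2^18), so the larger sequence length in the second column is offset by the smaller batch size.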