forrest-gradient committed on
Commit
e77cde0
1 Parent(s): 2f636d9

Update README.md

  1. README.md +1 -2
README.md CHANGED
@@ -42,13 +42,12 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
  | Initialize From | Llama-3-70B-Instruct | 65K |
  | Sequence Length 2^N | 16 | 18 |
  | RoPE theta | 15,296,098 | 207,112,184 |
- | Batch Size | 1 | 1 |
+ | Batch Size | 64 | 16 |
  | Gradient Accumulation Steps | 1 | 1 |
  | Steps | 20 | 25 |
  | Total Tokens | 83,886,080 | 104,857,600 |
  | Learning rate | 0.00002 | 0.00002 |
  | # GPUs | 512 | 512 |
- | Ring parallelism | 64 | 16 |
  | GPU Type | NVIDIA L40S | NVIDIA L40S |
  | Minutes to Train (Wall) | 100 | 170 |
 
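
The new Batch Size values can be sanity-checked against the Total Tokens row. The sketch below assumes that total tokens equals sequence length times batch size times gradient accumulation steps times steps. The README does not state that formula, so treat it as an inference. The helper name `total_tokens` and the labels `first_column` and `second_column` are hypothetical, since the table's header row is not part of this hunk.

```python
def total_tokens(seq_len_log2: int, batch_size: int, grad_accum_steps: int, steps: int) -> int:
    """Tokens seen in training, under the assumed formula
    sequence_length * batch_size * grad_accum_steps * steps."""
    return (2 ** seq_len_log2) * batch_size * grad_accum_steps * steps


# Values copied from the two data columns of the table after this commit.
# The column labels are placeholders, not names taken from the README.
configs = {
    "first_column": dict(seq_len_log2=16, batch_size=64, grad_accum_steps=1, steps=20),
    "second_column": dict(seq_len_log2=18, batch_size=16, grad_accum_steps=1, steps=25),
}

# Total Tokens row as listed in the table.
reported = {"first_column": 83_886_080, "second_column": 104_857_600}

for name, cfg in configs.items():
    computed = total_tokens(**cfg)
    status = "matches" if computed == reported[name] else "differs from"
    print(f"{name}: computed {computed:,} {status} reported {reported[name]:,}")
```

Under that assumption, both columns reproduce the listed totals with the new batch sizes (64 and 16). The previous value of 1 would give only 1,310,720 tokens for the first column, which suggests the old row was inconsistent with the rest of the table.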