Merge branch 'main' of hf.co:togethercomputer/RedPajama-Base-INCITE-2.8B-v1 into main
README.md
CHANGED
@@ -181,11 +181,12 @@ Please refer to [togethercomputer/RedPajama-Data-1T](https://huggingface.co/data
 
 **Training Procedure**
 
-- **Hardware:**
-- **Optimizer:**
-- **
+- **Hardware:** 256 nodes of 6xV100 (IBM Power9), on the OLCF Summit cluster
+- **Optimizer:** Apex FusedAdam
+- **Parallelism:** Pipeline parallel 6, tensor parallel 2
+- **Gradient Accumulations**: 8 (global batch size 4M tokens)
 - **Num of Tokens:** 800B Tokens
-- **Learning rate:**
+- **Learning rate:** 0.00016
 
 ## Community
 
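
The figures added in this change can be cross-checked with a little arithmetic. The sketch below is illustrative only and rests on assumptions the README does not state: that every GPU belongs to exactly one model replica (so the data-parallel degree is the total GPU count divided by pipeline parallel times tensor parallel), and that "4M tokens" means 4 * 2**20. The function name `derive_layout` is hypothetical.

```python
# Values taken from the Training Procedure entries added in this change.
NODES = 256
GPUS_PER_NODE = 6
PIPELINE_PARALLEL = 6
TENSOR_PARALLEL = 2
GRAD_ACCUM_STEPS = 8
TOTAL_TOKENS = 800 * 10**9

# Assumption: "4M tokens" is read as 4 * 2**20; the README does not say.
GLOBAL_BATCH_TOKENS = 4 * 2**20


def derive_layout():
    """Derive implied parallel layout figures (hypothetical helper)."""
    total_gpus = NODES * GPUS_PER_NODE
    gpus_per_replica = PIPELINE_PARALLEL * TENSOR_PARALLEL
    # Assumption: all GPUs are used and each sits in exactly one replica.
    data_parallel = total_gpus // gpus_per_replica
    # Each optimizer step consumes one micro-batch per replica per
    # accumulation step.
    micro_batches_per_step = data_parallel * GRAD_ACCUM_STEPS
    tokens_per_micro_batch = GLOBAL_BATCH_TOKENS // micro_batches_per_step
    # Whole optimizer steps needed to see TOTAL_TOKENS once.
    optimizer_steps = TOTAL_TOKENS // GLOBAL_BATCH_TOKENS
    return {
        "total_gpus": total_gpus,
        "data_parallel": data_parallel,
        "micro_batches_per_step": micro_batches_per_step,
        "tokens_per_micro_batch": tokens_per_micro_batch,
        "optimizer_steps": optimizer_steps,
    }


if __name__ == "__main__":
    for name, value in derive_layout().items():
        print(f"{name}: {value}")
```

Under these assumptions the listed values are mutually consistent: the global batch divides evenly across the implied replicas and accumulation steps. If the actual run used a different data-parallel degree or a decimal reading of "4M", the derived numbers would change accordingly.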