danfu09 committed on
Commit c852771, 1 Parent(s): 8cf442c

Add training details

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -119,11 +119,12 @@ Please refer to [togethercomputer/RedPajama-Data-1T](https://huggingface.co/data
 
  **Training Procedure**
 
- - **Hardware:** TODO @Dan
- - **Optimizer:**
- - **Gradient Accumulations**:
+ - **Hardware:** 256 nodes of 6xV100 (IBM Power9), on the OLCF Summit cluster
+ - **Optimizer:** Apex FusedAdam
+ - **Parallelism:** Pipeline parallel 6, tensor parallel 2
+ - **Gradient Accumulations**: 8 (global batch size 4M tokens)
  - **Num of Tokens:** 800B Tokens
- - **Learning rate:** 
+ - **Learning rate:** 0.00016
 
  ## Community
 
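
As a rough sanity check on the values added in this commit, the Python sketch below derives the total GPU count, the implied data-parallel degree, and an approximate number of optimizer steps. It is a back-of-the-envelope calculation, not the actual training configuration: it assumes that every GPU on the 256 nodes takes part in training, that "4M tokens" means about 4,000,000 tokens per optimizer step, and that the global batch size stays constant for the whole run. None of these points is stated in the README, and all names in the code are illustrative.

```python
# Back-of-the-envelope check of the training settings listed in the README diff.
# Values marked "assumption" are NOT stated in the README.

NODES = 256                      # from the README
GPUS_PER_NODE = 6                # 6xV100 per node, from the README
PIPELINE_PARALLEL = 6            # from the README
TENSOR_PARALLEL = 2              # from the README
TOTAL_TOKENS = 800_000_000_000   # "800B Tokens", from the README
GLOBAL_BATCH_TOKENS = 4_000_000  # assumption: "4M" read as 4 * 10**6 tokens


def training_layout():
    """Derive GPU counts and an approximate optimizer step count."""
    total_gpus = NODES * GPUS_PER_NODE
    # One model replica spans pipeline stages times tensor-parallel ranks.
    gpus_per_replica = PIPELINE_PARALLEL * TENSOR_PARALLEL
    # Assumption: all GPUs are used, so the rest of the factorization
    # is the data-parallel degree.
    data_parallel, leftover = divmod(total_gpus, gpus_per_replica)
    if leftover:
        raise ValueError("GPU count is not divisible by the replica size")
    # Assumption: constant global batch size for the whole run.
    optimizer_steps = TOTAL_TOKENS // GLOBAL_BATCH_TOKENS
    return {
        "total_gpus": total_gpus,
        "gpus_per_replica": gpus_per_replica,
        "data_parallel": data_parallel,
        "optimizer_steps": optimizer_steps,
    }


if __name__ == "__main__":
    for name, value in training_layout().items():
        print(f"{name}: {value}")
```

Under these assumptions the 1536 GPUs split into 128 data-parallel replicas of 12 GPUs each, and 800B tokens correspond to roughly 200,000 optimizer steps. If "4M" instead means 4 × 2^20 tokens, the step count is somewhat lower.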