achal-tri committed on
Commit 23b5c8b
1 Parent(s): 8ceb934

Update README.md

Files changed (1)
  1. README.md +8 -3
README.md CHANGED
@@ -48,9 +48,14 @@ The model was trained using the following setup:
 - **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
 
-For more detailed training information, please refer to Appendix P.3 of the
-paper.
-To ensure our trained model is broadly useful, including for math and coding tasks, we combine our 3.8T [DCLM-BASELINE](https://huggingface.co/datasets/mlfoundations/dclm-baseline-1.0) with the [StarCoder](https://huggingface.co/datasets/bigcode/starcoderdata) and [ProofPile2](https://huggingface.co/datasets/EleutherAI/proof-pile-2) data to arrive at a 4.1T token dataset.
+
+We train our 1.4B model for 2.6T tokens on DCLM-Baseline.
+Similar to the 7B model training recipe described in Appendix P of our paper,
+we train for 2.3T tokens on DCLM-Baseline combined with the StarCoder and ProofPile2 datasets,
+with the hyper-parameters described above.
+Note that we use a schedule set for the full dataset, and stop training early at 2.3T tokens.
+Then, we cool down the model on the same dataset to the cooldown LR over 200B tokens.
+We will update our paper soon with more training details.
 
 ## Evaluation
 
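The added lines describe a token-based schedule: a decay set for the full combined dataset, truncated early at 2.3T tokens, followed by a cooldown to a lower LR over 200B tokens. The sketch below illustrates that shape only; the cosine form, the use of the 4.1T combined-dataset size (mentioned in the removed lines) as the full schedule budget, and every numeric learning-rate value are assumptions for illustration, not the released hyper-parameters.

```python
# Minimal sketch of a truncate-then-cooldown LR schedule, keyed to tokens seen.
# Assumed: cosine decay shaped for the FULL token budget, cut off at 2.3T,
# then a linear cooldown over 200B tokens to a final "cooldown LR".
# All constants are illustrative placeholders.
import math

FULL_BUDGET_TOKENS = 4.1e12   # schedule is set for the full dataset (assumed 4.1T)
STOP_TOKENS        = 2.3e12   # training is stopped early here
COOLDOWN_TOKENS    = 2.0e11   # 200B-token cooldown phase
PEAK_LR            = 3e-4     # placeholder peak learning rate
COOLDOWN_LR        = 3e-5     # placeholder final (cooldown) learning rate


def lr_at(tokens_seen: float) -> float:
    """Learning rate as a function of tokens seen (warmup omitted for brevity)."""
    if tokens_seen <= STOP_TOKENS:
        # Cosine decay parameterized by the full budget, so the curve never
        # reaches its minimum before training is cut off at 2.3T tokens.
        progress = tokens_seen / FULL_BUDGET_TOKENS
        return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))
    # Linear cooldown from wherever the truncated schedule stopped
    # down to COOLDOWN_LR over the next 200B tokens.
    lr_at_stop = 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * STOP_TOKENS / FULL_BUDGET_TOKENS))
    frac = min((tokens_seen - STOP_TOKENS) / COOLDOWN_TOKENS, 1.0)
    return lr_at_stop + frac * (COOLDOWN_LR - lr_at_stop)
```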