Update README.md
README.md CHANGED
@@ -48,9 +48,14 @@ The model was trained using the following setup:
 - **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
 
-
-
-
+
+We train our 1.4B model for 2.6T tokens on DCLM-Baseline.
+Similar to the 7B model training recipe described in Appendix P of our paper,
+we train for 2.3T tokens on DCLM-baseline combined with the StarCoder and ProofPile2 datasets,
+with the hyper-parameters described above.
+Note that we use a schedule set for the full dataset, and stop training early at 2.3T tokens.
+Then, we cool down the model on the same dataset to the cooldown LR over 200B tokens.
+We will update our paper soon with more training details.
 
 ## Evaluation
 
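For readers skimming the diff, the schedule described in the added lines (a schedule configured for the full 2.6T-token budget, stopped early at 2.3T tokens, then cooled down to the cooldown LR over 200B tokens) can be sketched as below. This is only an illustrative sketch: the cosine shape, the warmup length, and the `PEAK_LR` / `COOLDOWN_LR` values are assumptions for illustration, not values taken from the model card or the paper.

```python
import math

# Hypothetical numbers: only the 2.6T budget, the 2.3T early stop, and the
# 200B-token cooldown are stated in the README; everything else is a placeholder.
FULL_BUDGET_TOKENS = 2.6e12   # schedule is set for the full dataset
EARLY_STOP_TOKENS  = 2.3e12   # main phase stops here
COOLDOWN_TOKENS    = 0.2e12   # then cool down over 200B tokens
WARMUP_TOKENS      = 1e10     # assumed warmup length
PEAK_LR            = 3e-3     # assumed peak learning rate
COOLDOWN_LR        = 3e-5     # assumed final (cooldown) learning rate


def main_phase_lr(tokens_seen: float) -> float:
    """Warmup plus cosine decay, configured for the FULL 2.6T budget,
    even though the main phase stops early at 2.3T."""
    if tokens_seen < WARMUP_TOKENS:
        return PEAK_LR * tokens_seen / WARMUP_TOKENS
    progress = (tokens_seen - WARMUP_TOKENS) / (FULL_BUDGET_TOKENS - WARMUP_TOKENS)
    return COOLDOWN_LR + 0.5 * (PEAK_LR - COOLDOWN_LR) * (1 + math.cos(math.pi * progress))


def cooldown_lr(tokens_into_cooldown: float) -> float:
    """Linear decay from the LR reached at the early-stop point down to
    COOLDOWN_LR over the 200B-token cooldown phase."""
    start_lr = main_phase_lr(EARLY_STOP_TOKENS)
    frac = min(tokens_into_cooldown / COOLDOWN_TOKENS, 1.0)
    return start_lr + frac * (COOLDOWN_LR - start_lr)


def lr_at(total_tokens_seen: float) -> float:
    """Learning rate as a function of total tokens seen across both phases."""
    if total_tokens_seen <= EARLY_STOP_TOKENS:
        return main_phase_lr(total_tokens_seen)
    return cooldown_lr(total_tokens_seen - EARLY_STOP_TOKENS)


if __name__ == "__main__":
    for t in (0.5e12, 1.5e12, 2.3e12, 2.4e12, 2.5e12):
        print(f"{t/1e12:.1f}T tokens -> lr {lr_at(t):.2e}")
```

The point of the sketch is the interaction the added lines describe: because the decay is set for the full dataset but training stops early, the learning rate is still relatively high at 2.3T tokens, which is presumably why a separate cooldown phase to a low LR over 200B tokens follows before the checkpoint is released.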