whynlp committed
Commit 28759c8 · verified · 1 Parent(s): 4cb6198

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -69,7 +69,7 @@ print(response[0]["generated_text"])
 
 The model has 10 warmup layers. i.e. 1/2 KV cache of a standard TinyLlama.
 
-This model was first initialized from the [TinyLlama 2.5T checkpoint](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T), then continued pre-training on 100B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
+This model was first initialized from the [TinyLlama 2.5T checkpoint](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T), then continued pre-training on 250B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
 
 Since the model structure has been changed, the initialization cannot inherit the performance of the TinyLlama checkpoint, but it effectively boosts the training process compared to pre-training from scratch.
 
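The hunk context line `print(response[0]["generated_text"])` suggests the README edited by this commit demonstrates generation through a `transformers` text-generation pipeline. Below is a minimal sketch of that kind of call; the model id and the `trust_remote_code=True` flag are assumptions for illustration, not confirmed by this diff.

```python
# Minimal sketch of the usage implied by the hunk context
# (`print(response[0]["generated_text"])`).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="whynlp/tinyllama-lckv-w10",  # hypothetical repo name; replace with the actual model id
    trust_remote_code=True,             # assumption: the modified architecture may need custom code
)

response = pipe("The capital of France is", max_new_tokens=32)
print(response[0]["generated_text"])
```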