whynlp committed
Commit 28759c8 · verified · 1 Parent(s): 4cb6198

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -69,7 +69,7 @@ print(response[0]["generated_text"])
 
 The model has 10 warmup layers. i.e. 1/2 KV cache of a standard TinyLlama.
 
-This model was first initialized from the [TinyLlama 2.5T checkpoint](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T), then continued pre-training on 100B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
+This model was first initialized from the [TinyLlama 2.5T checkpoint](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T), then continued pre-training on 250B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
 
 Since the model structure has been changed, the initialization cannot inherit the performance of the TinyLlama checkpoint, but it effectively boosts the training process compared to pre-training from scratch.
 
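The hunk context line `print(response[0]["generated_text"])` suggests the README edited by this commit demonstrates generation through a `transformers` text-generation pipeline. Below is a minimal sketch of that kind of call; the model id and the `trust_remote_code=True` flag are assumptions for illustration, not confirmed by this diff.

```python
# Minimal sketch of the usage implied by the hunk context
# (`print(response[0]["generated_text"])`).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="whynlp/tinyllama-lckv-w10",  # hypothetical repo name; replace with the actual model id
    trust_remote_code=True,             # assumption: the modified architecture may need custom code
)

response = pipe("The capital of France is", max_new_tokens=32)
print(response[0]["generated_text"])
```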