Update README.md
README.md
CHANGED
@@ -69,7 +69,7 @@ print(response[0]["generated_text"])
 
 The model has 10 warmup layers. i.e. 1/2 KV cache of a standard TinyLlama.
 
-This model was first initialized from the [TinyLlama 2.5T checkpoint](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T), then continued pre-training on
+This model was first initialized from the [TinyLlama 2.5T checkpoint](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T), then continued pre-training on 250B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
 
 Since the model structure has been changed, the initialization cannot inherit the performance of the TinyLlama checkpoint, but it effectively boosts the training process compared to pre-training from scratch.
 
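The hunk header above references `print(response[0]["generated_text"])`, i.e. the README's existing transformers text-generation pipeline snippet. Below is a minimal sketch of that usage for context; the model repository id is a placeholder (assumption), since the actual path is not shown in this diff.

```python
# Minimal sketch of the transformers text-generation pipeline usage implied by
# the hunk header above. The model id is a hypothetical placeholder, not the
# actual repository path of this checkpoint.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="your-org/your-tinyllama-variant",  # hypothetical repo id
    device_map="auto",
)

response = pipe("Tell me about TinyLlama.", max_new_tokens=64)
print(response[0]["generated_text"])
```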