Update README.md
add info about KV cache saving
README.md CHANGED
@@ -66,6 +66,8 @@ print(response[0]["generated_text"])
 
 ## The LCKV Collection
 
+The model has 2 warmup layers, i.e. it keeps 3/22 of the KV cache of a standard TinyLlama.
+
 This model was randomly initialized, then pre-trained on 100B tokens from [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B).
 
 The evaluation follows that of TinyLlama. Refer to [our paper](https://arxiv.org/abs/2405.10637) for more details.
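The 3/22 figure added in this commit can be sanity-checked with a small sketch. This is only illustrative arithmetic, assuming (per the diff) a 22-layer TinyLlama where LCKV retains KVs for just 3 layers (the 2 warmup layers plus one layer all others attend to); the function name is hypothetical, not part of the model's code.

```python
# Assumed figures from the README diff: 22 transformer layers total,
# KVs cached for only 3 of them (2 warmup layers + 1 attended layer).
TOTAL_LAYERS = 22
LAYERS_WITH_KV = 3


def kv_cache_fraction(layers_with_kv: int, total_layers: int) -> float:
    """Fraction of a standard model's KV-cache memory that is retained
    when only `layers_with_kv` of `total_layers` layers keep their KVs."""
    return layers_with_kv / total_layers


fraction = kv_cache_fraction(LAYERS_WITH_KV, TOTAL_LAYERS)
print(f"KV cache retained: {fraction:.1%}")  # 3/22 ~ 13.6% of standard TinyLlama
```

Per-token KV memory scales linearly with the number of layers that cache KVs, so the saving is the complement of this fraction, roughly 86% here under these assumptions.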