Doctor-Shotgun
committed on
Update README.md
README.md CHANGED
@@ -16,6 +16,10 @@ Created using [TinyLlama-1.1B](https://huggingface.co/TinyLlama/tinyLlama-interm
 
 Of note, the base checkpoint used was from commit "final model" fad4f1a5cd0563ac41349b8fec2e6e51156568a0 which was subsequently reverted, and not the current main branch 3T checkpoint of TinyLlama-1.1B.
 
+[EXL2 Quants by turboderp](https://huggingface.co/turboderp/TinyLlama-1B-32k-exl2)
+
+The quantized model fits alongside a 4.25bpw 70B model at 32k sequence length on a single A6000 and provides noticeable speed-up with speculative decoding.
+
 ### Wikitext (wikitext-2-raw-v1_train) Perplexity (64 rows) as evaluated via [exllamav2](https://github.com/turboderp/exllamav2):
 
 | Model | 2048 | 4096 | 8192 | 16384 | 32768 |
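
The added README lines lean on speculative decoding, so a minimal sketch of that pairing with exllamav2's Python API follows. It is an illustration, not part of the commit: the model directories are placeholder paths, the target is assumed to be any Llama-architecture 70B EXL2 quant that shares TinyLlama's tokenizer, and the `draft_model` / `draft_cache` / `num_speculative_tokens` arguments to `ExLlamaV2StreamingGenerator` reflect one version of exllamav2's streaming API and may differ in the version you have installed.

```python
# Sketch: use the TinyLlama-1B-32k EXL2 quant as a draft model for a larger
# target model in exllamav2. Paths are placeholders; generator arguments are
# assumptions based on exllamav2's streaming API and may vary by version.

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler


def load(model_dir: str, max_seq_len: int):
    """Load an EXL2 model with a lazily-allocated cache, splitting across GPUs as needed."""
    config = ExLlamaV2Config()
    config.model_dir = model_dir
    config.prepare()
    config.max_seq_len = max_seq_len
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy = True)
    model.load_autosplit(cache)
    return model, cache


# Target: a 4.25bpw 70B EXL2 quant; draft: this model's EXL2 quant (placeholder paths).
target, target_cache = load("/models/70b-4.25bpw-exl2", 32768)
draft, draft_cache = load("/models/TinyLlama-1B-32k-exl2", 32768)

tokenizer = ExLlamaV2Tokenizer(target.config)

# The draft model proposes a few tokens per step and the target verifies them, so
# the output matches target-only sampling while running fewer target forward passes.
generator = ExLlamaV2StreamingGenerator(
    target, target_cache, tokenizer,
    draft_model = draft,
    draft_cache = draft_cache,
    num_speculative_tokens = 5,  # tune against the observed acceptance rate
)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

input_ids = tokenizer.encode("Explain speculative decoding in one paragraph.")
generator.begin_stream(input_ids, settings)

generated = 0
while generated < 256:
    chunk, eos, _ = generator.stream()
    print(chunk, end = "", flush = True)
    generated += 1
    if eos:
        break
```

Greedy or low-temperature sampling tends to raise the draft acceptance rate, which is where the speed-up mentioned in the README comes from.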