---
language:
- en
tags:
- ggml
- causal-lm
- pythia
license: apache-2.0
datasets:
- EleutherAI/the_pile_deduplicated
---
|
|
|
### This repository contains quantized conversions of EleutherAI's Pythia Deduped checkpoints. |
|
|
|
If you're starting off, I highly recommend getting models from the newest directory [(2023-05-25)](https://huggingface.co/Merry/ggml-pythia-deduped/tree/main/2023-05-25).
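If you'd rather fetch a single file from that directory programmatically, here's a minimal sketch using the `huggingface_hub` library. The subfolder and filename below are assumptions based on the directory link and the RAM table in this card, so check them against the actual file listing.

```python
# Minimal sketch: download one quantized file from this repo with huggingface_hub.
# The subfolder and filename are assumptions based on the 2023-05-25 directory
# and the RAM table in this card; adjust to the file you actually want.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Merry/ggml-pythia-deduped",
    subfolder="2023-05-25",
    filename="ggmlv3-pythia-410m-deduped-q4_0.bin",
)
print(path)  # local path of the cached .bin file
```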
|
|
|
[Click here if you're looking for ggmlv1 and ggmlv2 models.](https://huggingface.co/Merry/ggml-pythia-deduped/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd)
|
|
|
# RAM USAGE |
|
Model | RAM usage
:--:|:--:
Unloaded | 41.3 MiB
| |
ggmlv3-pythia-70m-deduped-q4_0.bin | 95.5 MiB
ggmlv3-pythia-160m-deduped-q4_0.bin | 201.1 MiB
ggmlv3-pythia-410m-deduped-q4_0.bin | 415.1 MiB
ggmlv3-pythia-1b-deduped-q4_0.bin | 762.2 MiB
ggmlv3-pythia-1.4b-deduped-q4_0.bin | 1.0 GiB
ggmlv3-pythia-2.8b-deduped-q4_0.bin | 1.9 GiB
| |
ggmlv3-pythia-70m-deduped-q5_1.bin | 108.7 MiB
ggmlv3-pythia-160m-deduped-q5_1.bin | 226.9 MiB
ggmlv3-pythia-410m-deduped-q5_1.bin | 494.0 MiB
ggmlv3-pythia-1b-deduped-q5_1.bin | 943.9 MiB
ggmlv3-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB
ggmlv3-pythia-2.8b-deduped-q5_1.bin | 2.3 GiB
|
|
|
*Tested on KoboldCpp with OpenBLAS enabled.* |
|
|
|
**Versions:** |
|
|
|
**2023-04-20:** *q4_3. Used [commit 05f3079](https://github.com/ggerganov/ggml/tree/05f307971862b83df12fada0c42ee027ba5a82b5/examples/stablelm)* |
|
|
|
**2023-04-30:** *q5_0, q5_1, and q8_0, up to 2.8B. I can't upload all conversions of 6.9B and 12B due to my internet connection. Used [commit 5dd92f4](https://github.com/ggerganov/ggml/tree/5dd92f421ee44f18b8fde0afbf5ca8fc7bf93841/examples/stablelm)*
|
|
|
**2023-05-06:** *q4_0 and q4_2, up to 2.8B. Used [commit ff6e03c](https://github.com/ggerganov/ggml/tree/ff6e03cbcd9bf6e9fa41d49f2495c042efae4dc6/examples/stablelm)* |
|
|
|
**2023-05-15:** *New quantization format (ggmlv2). q4_0 and q5_1, up to 2.8B. Used [commit 010203f](https://github.com/ggerganov/ggml/tree/010203f94a85df5c86b773dc5acb698c8e7b1e7b/examples/gpt-neox)* |
|
|
|
**2023-05-25:** *New quantization format (ggmlv3). q4_0 and q5_1, up to 2.8B. Used [commit 73ad593](https://github.com/ggerganov/ggml/tree/73ad593cf84f864f0fcfd3a196253575c70d66a2/examples/gpt-neox)* |
|
|
|
**Notes:** |
|
- The models have been converted with ggerganov/ggml's gpt-neox conversion script and tested only on KoboldCpp. Other frontends that support GGML conversions of GPT-NeoX models *should* work (see the sketch after these notes), but I can't promise anything.
|
- The directories are sorted by conversion date, which makes breaking changes easier to track. If you're just starting off, I highly recommend the latest, currently 2023-05-25. Combined with KoboldCpp v1.25.1+, this version improves the tokenizer, which in my testing reduces occurrences of broken words like "Alicae" or "Reimu Hai-ku-rei".
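For frontends other than KoboldCpp, the sketch below shows one plausible way to load these files from Python using the `ctransformers` bindings, which list GGML GPT-NeoX support. I haven't tested this with these conversions, so treat the model type, filename, and call details as assumptions.

```python
# Hedged sketch: loading a ggmlv3 Pythia conversion through the ctransformers bindings.
# Untested with this repo; the local filename and "gpt_neox" model_type are assumptions.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "ggmlv3-pythia-410m-deduped-q4_0.bin",  # path to a file downloaded from this repo
    model_type="gpt_neox",
)

print(llm("The Pile is a dataset", max_new_tokens=32))
```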
|
|
|
# ALTERNATIVES |
|
If you're here because you want a smaller model to run on a device with constrained memory, consider the following, most (if not all) of which have GGML conversions available: |
|
- [**RedPajama-INCITE**](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) (3B, 7B), using the GPT-NeoX architecture |
|
- [**OpenLLaMA**](https://huggingface.co/openlm-research/open_llama_3b_600bt_preview) (3B, 7B), using the LLaMA architecture |
|
- [**MPT-1b-RedPajama-200b**](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) (1B), using the MPT architecture |
|
- [**RWKV-4 PilePlus**](https://huggingface.co/BlinkDL/rwkv-4-pileplus) (169M, 430M, 1.5B, 3B), using the RWKV architecture |
|
- [**GPT-2**](https://huggingface.co/gpt2-xl) (124M, 355M, 774M, 1.5B), using the GPT-2 architecture |