---
license: apache-2.0
---

# Converted with ggerganov/ggml's stablelm conversion script and tested with KoboldCpp.

## *(I can't promise that this will work with other frontends, if at all; I haven't had much success myself. Use at your own risk!)*

**2023-04-20:** *q4_3. Used [commit 05f3079](https://github.com/ggerganov/ggml/tree/05f307971862b83df12fada0c42ee027ba5a82b5/examples/stablelm).*

**2023-04-30:** *q5_0, q5_1, and q8_0, up to 2.8B. I can't upload all conversions of 6.9B and 12B due to my internet connection. Used [commit 5dd92f4](https://github.com/ggerganov/ggml/tree/5dd92f421ee44f18b8fde0afbf5ca8fc7bf93841/examples/stablelm).*

**2023-05-06:** *q4_0 and q4_2, up to 2.8B. Used [commit ff6e03c](https://github.com/ggerganov/ggml/tree/ff6e03cbcd9bf6e9fa41d49f2495c042efae4dc6/examples/stablelm).*

**2023-05-15:** *New quantization format. q4_0 and q5_1, up to 2.8B. Used [commit 010203f](https://github.com/ggerganov/ggml/tree/010203f94a85df5c86b773dc5acb698c8e7b1e7b/examples/gpt-neox).*

The uploads are separated by date and commit so it's easier to track any breaking changes. A rough sketch of the conversion steps is at the bottom of this card.

# RAM USAGE (on KoboldCpp w/ OpenBLAS)

Model | Initial RAM | After generation
:--:|:--:|:--:
Unloaded | 41.3 MiB | —
ggml-pythia-70m-deduped-q4_0.bin | 113.3 MiB | 267.8 MiB
ggml-pythia-70m-deduped-q5_1.bin | 121.5 MiB | 129.4 MiB
ggml-pythia-160m-deduped-q4_0.bin | 199.4 MiB | 201.6 MiB
ggml-pythia-160m-deduped-q5_1.bin | 227.5 MiB | 241.0 MiB
ggml-pythia-410m-deduped-q4_0.bin | 399.2 MiB | 406.2 MiB
ggml-pythia-410m-deduped-q5_1.bin | 455.7 MiB | 460.3 MiB
ggml-pythia-1b-deduped-q4_0.bin | 803.0 MiB | 809.0 MiB
ggml-pythia-1b-deduped-q5_1.bin | 921.5 MiB | 927.3 MiB
ggml-pythia-1.4b-deduped-q4_0.bin | 1.1 GiB | 1.1 GiB
ggml-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB | 1.3 GiB
ggml-pythia-2.8b-deduped-q4_0.bin | 2.0 GiB | 2.0 GiB
ggml-pythia-2.8b-deduped-q5_1.bin | 2.4 GiB | 2.4 GiB

Rough math on why the q5_1 files sit higher than q4_0 is also at the bottom of this card.

# ALTERNATIVES

If you're here because you want a smaller model to run on a device with constrained memory, consider the following:

- OpenLLaMA [(3B)](https://huggingface.co/openlm-research/open_llama_3b_350bt_preview) [(7B)](https://huggingface.co/openlm-research/open_llama_7b_400bt_preview)
- RedPajama-INCITE [(3B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) [(7B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1)
- MPT [(1B)](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) [(7B)](https://huggingface.co/mosaicml/mpt-7b)
- RWKV PilePlus [(169M) (430M) (1.5B) (3B)](https://huggingface.co/BlinkDL/rwkv-4-pileplus)

All of them are trained at least partially on [RedPajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), an open reproduction of LLaMA's dataset, but they're based on different architectures: OpenLLaMA uses the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT and RWKV use architectures of their own.
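# CONVERSION WORKFLOW (rough sketch)

For reference, the steps behind these files look roughly like the sketch below. Treat it as an assumption-laden outline rather than a verified recipe: the conversion script and quantize tool live under `examples/stablelm` (later renamed `examples/gpt-neox`) in the pinned ggml commits, and their exact names and arguments changed between those commits, so check the README at whichever commit you build. The directory paths and the quantize type argument here are placeholders.

```python
# Rough outline of the GGML conversion + quantization steps (hypothetical paths;
# the real script/binary names and arguments depend on the pinned ggml commit).
import subprocess

GGML_DIR = "ggml"                  # ggerganov/ggml checkout, built at the pinned commit (assumption)
MODEL_DIR = "pythia-160m-deduped"  # local EleutherAI/pythia-* checkout (assumption)

# 1) Convert the Hugging Face checkpoint to an f16 GGML file
#    (the gpt-neox example's convert-h5-to-ggml.py; "1" selects f16 output).
subprocess.run(
    ["python3", f"{GGML_DIR}/examples/gpt-neox/convert-h5-to-ggml.py", MODEL_DIR, "1"],
    check=True,
)

# 2) Quantize the f16 file to the target format with the example's quantize tool.
#    Older commits take a numeric ftype code as the last argument instead of a name.
subprocess.run(
    [
        f"{GGML_DIR}/build/bin/gpt-neox-quantize",
        f"{MODEL_DIR}/ggml-model-f16.bin",
        f"{MODEL_DIR}/ggml-model-q5_1.bin",
        "q5_1",
    ],
    check=True,
)
```

The resulting `.bin` is what gets loaded by KoboldCpp; the four dated batches above only differ in which ggml commit performed these two steps.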
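# QUANTIZATION FORMATS (rough size math)

The gap between the q4_0 and q5_1 rows in the RAM table is mostly bits per weight. In the classic GGML formats (as of the 2023-05-15 format change; the earlier uploads use slightly different block layouts), weights are quantized in blocks of 32 values, and each block also stores an fp16 scale, plus an fp16 minimum for the `_1` variants. The Python sketch below is a back-of-the-envelope estimate of the quantized weights alone, not how ggml itself computes anything: it ignores tensors kept at higher precision, file headers, the KV cache, and KoboldCpp's own buffers.

```python
# Back-of-the-envelope size estimate for classic GGML quantization formats.
# Each block covers 32 weights; the byte counts are the standard block layouts
# (packed quantized values + fp16 scale, plus an fp16 minimum for q5_1).

BLOCK_WEIGHTS = 32
BYTES_PER_BLOCK = {
    "q4_0": 2 + 16,          # fp16 scale + 32 x 4-bit values              -> 4.5 bits/weight
    "q5_0": 2 + 4 + 16,      # fp16 scale + 32 high bits + 32 low nibbles  -> 5.5 bits/weight
    "q5_1": 2 + 2 + 4 + 16,  # fp16 scale + fp16 min + high bits + nibbles -> 6.0 bits/weight
    "q8_0": 2 + 32,          # fp16 scale + 32 x 8-bit values              -> 8.5 bits/weight
}

def estimated_weight_size_gib(n_params: float, fmt: str) -> float:
    """Rough size of the quantized weights alone, in GiB (no headers, no KV cache,
    and pretending every tensor is quantized, which is not actually the case)."""
    blocks = n_params / BLOCK_WEIGHTS
    return blocks * BYTES_PER_BLOCK[fmt] / 1024**3

if __name__ == "__main__":
    for fmt in ("q4_0", "q5_1"):
        print(f"pythia-2.8b {fmt}: ~{estimated_weight_size_gib(2.8e9, fmt):.1f} GiB")
```

At 2.8B parameters this prints roughly 1.5 GiB for q4_0 and 2.0 GiB for q5_1, which is in the same ballpark as the 2.0 GiB / 2.4 GiB rows in the table once the unquantized tensors and runtime buffers are added on top.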