---
license: apache-2.0
---

# Converted with ggerganov/ggml's stablelm conversion script, and tested with KoboldCpp.

## *(I can't promise that this will work with other frontends, if at all; I haven't had much success with them myself. Use at your own risk!)*

**2023-04-20:** *q4_3. Used [commit 05f3079](https://github.com/ggerganov/ggml/tree/05f307971862b83df12fada0c42ee027ba5a82b5/examples/stablelm).*

**2023-04-30:** *q5_0, q5_1, and q8_0, up to 2.8B. I can't upload all conversions of 6.9B and 12B due to my internet connection. Used [commit 5dd92f4](https://github.com/ggerganov/ggml/tree/5dd92f421ee44f18b8fde0afbf5ca8fc7bf93841/examples/stablelm).*

**2023-05-06:** *q4_0 and q4_2, up to 2.8B. Used [commit ff6e03c](https://github.com/ggerganov/ggml/tree/ff6e03cbcd9bf6e9fa41d49f2495c042efae4dc6/examples/stablelm).*

They're separated by date and commit so it's easier to track any breaking changes.

# ALTERNATIVES

If you're here because you want a smaller model to run on a device with constrained memory, try the instruct-tuned RWKV-Raven ([q8_0](https://huggingface.co/BlinkDL/rwkv-4-raven/tree/main) and [q5_1](https://huggingface.co/latestissue/RWKV-4-Raven-CPP-Converted-Quantized/tree/main)), which goes as low as 1.5B, or [RWKV-PilePlus](https://huggingface.co/BlinkDL/rwkv-4-pileplus/tree/main), which goes as low as 169M.

If you're here because you want an openly-licensed LLaMA, there's:

- OpenLLaMA [(7B)](https://huggingface.co/openlm-research/open_llama_7b_preview_300bt)
- RedPajama-INCITE [(3B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) [(7B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1)
- MPT [(1B)](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) [(7B)](https://huggingface.co/mosaicml/mpt-7b)

All of them are trained on an open reproduction of LLaMA's dataset, [RedPajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), but they're based on different architectures: OpenLLaMA follows the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT uses its own.

# RAM USAGE

Model | Initial RAM usage
:--:|:--:
ggml-pythia-70m-deduped-q4_3.bin | 121.2 MiB
ggml-pythia-160m-deduped-q4_3.bin | 225.2 MiB
ggml-pythia-410m-deduped-q4_3.bin | 498.1 MiB
ggml-pythia-1b-deduped-q4_3.bin | 951.5 MiB
ggml-pythia-1.4b-deduped-q4_3.bin | 1.3 GiB
ggml-pythia-2.8b-deduped-q4_3.bin | 2.4 GiB
ggml-pythia-6.9b-deduped-q4_3.bin | 5.4 GiB
ggml-pythia-12b-deduped-q4_3.bin | 9.2 GiB
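
If you're unsure which size will fit on your machine, a quick check like the one below can help. This is only a minimal sketch based on the q4_3 numbers in the table above, not part of the conversion script or KoboldCpp; it assumes a Unix-like system where physical memory can be read via `os.sysconf`, and the 1 GiB headroom for context and scratch buffers is an arbitrary guess.

```python
import os

# Approximate *initial* RAM usage from the table above, in MiB (q4_3 conversions).
PYTHIA_Q4_3_RAM_MIB = {
    "ggml-pythia-70m-deduped-q4_3.bin": 121.2,
    "ggml-pythia-160m-deduped-q4_3.bin": 225.2,
    "ggml-pythia-410m-deduped-q4_3.bin": 498.1,
    "ggml-pythia-1b-deduped-q4_3.bin": 951.5,
    "ggml-pythia-1.4b-deduped-q4_3.bin": 1.3 * 1024,
    "ggml-pythia-2.8b-deduped-q4_3.bin": 2.4 * 1024,
    "ggml-pythia-6.9b-deduped-q4_3.bin": 5.4 * 1024,
    "ggml-pythia-12b-deduped-q4_3.bin": 9.2 * 1024,
}


def total_ram_mib() -> float:
    """Total physical RAM in MiB (works on Linux/macOS via sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / (1024 ** 2)


def largest_model_that_fits(headroom_mib: float = 1024.0):
    """Pick the largest q4_3 conversion whose initial RAM usage still leaves some headroom."""
    budget = total_ram_mib() - headroom_mib
    candidates = [(ram, name) for name, ram in PYTHIA_Q4_3_RAM_MIB.items() if ram <= budget]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(f"Total RAM: {total_ram_mib():.0f} MiB")
    print("Largest q4_3 model that should fit:", largest_model_that_fits())
```

Keep in mind these are only the initial figures; memory grows with context length, so leave more headroom if you plan to use long prompts.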