---
license: apache-2.0
---
Converted with ggerganov/ggml's stablelm conversion script (which covers GPT-NeoX-style models like Pythia), and tested with KoboldCpp.
(I can't promise these will work with other frontends, if at all; I haven't had much success with them myself. Use at your own risk!)
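For reference, the conversion flow looks roughly like the sketch below. This is an assumption-heavy outline, not a verbatim recipe: the script path (`examples/stablelm/convert-h5-to-ggml.py`), the quantize binary name, and the numeric type ids are what the ggml repo used around mid-2023, and they changed over time, so check the repo at the pinned commits before running anything.

```python
# A rough sketch of the ggml conversion flow. Script paths, binary names,
# and type ids below are assumptions based on the ggml repo around
# mid-2023; verify them against the pinned commits.
import subprocess

MODEL_DIR = "pythia-410m-deduped"  # hypothetical local HF checkout

# Step 1: convert the Hugging Face checkpoint to a ggml file.
# The stablelm example handled GPT-NeoX-style models like Pythia;
# the trailing "1" selects f16 output ("0" would be f32).
subprocess.run(
    ["python3", "examples/stablelm/convert-h5-to-ggml.py", MODEL_DIR, "1"],
    check=True,
)

# Step 2: quantize the f16 file down to one of the listed formats.
# The quantize tool takes an input file, an output file, and a numeric
# type id; run it without arguments to see the ids for q4_3, q5_1, etc.
subprocess.run(
    ["./build/bin/stablelm-quantize",
     f"{MODEL_DIR}/ggml-model-f16.bin",
     f"{MODEL_DIR}/ggml-model-q4_3.bin",
     "6"],  # "6" assumed to mean q4_3 here; check the tool's help output
    check=True,
)
```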
- **2023-04-20:** q4_3. Used commit 05f3079.
- **2023-04-30:** q5_0, q5_1, and q8_0, up to 2.8B. I can't upload every conversion of 6.9B and 12B because of my internet connection. Used commit 5dd92f4.
- **2023-05-06:** q4_0 and q4_2, up to 2.8B. Used commit ff6e03c.

The uploads are separated by date and commit so it's easier to track any breaking format changes.
## ALTERNATIVES
If you're here because you want a smaller model to run on a device with constrained memory, try the instruction-tuned RWKV-Raven (q8_0 and q5_1), which goes as low as 1.5B, or RWKV-PilePlus, which goes as low as 169M.
If you're here because you want an openly-licensed LLaMA alternative, there's:

- OpenLLaMA
- RedPajama-INCITE
- MPT

All of them are trained on RedPajama, an open reproduction of LLaMA's dataset, but they're based on different architectures: OpenLLaMA follows the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT uses an architecture of its own.
## RAM USAGE
| Model | Initial RAM usage |
|---|---|
| ggml-pythia-70m-deduped-q4_3.bin | 121.2 MiB |
| ggml-pythia-160m-deduped-q4_3.bin | 225.2 MiB |
| ggml-pythia-410m-deduped-q4_3.bin | 498.1 MiB |
| ggml-pythia-1b-deduped-q4_3.bin | 951.5 MiB |
| ggml-pythia-1.4b-deduped-q4_3.bin | 1.3 GiB |
| ggml-pythia-2.8b-deduped-q4_3.bin | 2.4 GiB |
| ggml-pythia-6.9b-deduped-q4_3.bin | 5.4 GiB |
| ggml-pythia-12b-deduped-q4_3.bin | 9.2 GiB |
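If you want a rough sense of where these numbers come from: a quantized tensor costs about its parameter count times the format's bits per weight, plus runtime overhead. The sketch below is a back-of-envelope estimator, not a measurement tool. The bits-per-weight values are my approximations for these old ggml formats (they fold in the per-block scale/min overhead), and real usage, as in the table, is higher because some tensors stay in f16/f32 and the runtime allocates scratch and context buffers on top.

```python
# Back-of-envelope RAM estimate for a quantized ggml model.
# The bits-per-weight figures are approximations for the old ggml
# block formats and are an assumption, not values taken from this repo.
APPROX_BITS_PER_WEIGHT = {
    "q4_0": 5.0,  # 4-bit weights + per-block scale
    "q4_2": 5.0,
    "q4_3": 5.0,  # 4-bit weights + per-block f16 scale and min
    "q5_0": 5.5,
    "q5_1": 6.0,
    "q8_0": 9.0,  # 8-bit weights + per-block scale
}

def estimate_gib(n_params: float, fmt: str) -> float:
    """Size of the quantized weights alone, in GiB."""
    bytes_total = n_params * APPROX_BITS_PER_WEIGHT[fmt] / 8
    return bytes_total / (1024 ** 3)

# Example: pythia-2.8b-deduped at q4_3 comes out around 1.6 GiB of
# quantized weights; the table shows 2.4 GiB because embedding/output
# layers and runtime buffers come on top of that.
print(f"{estimate_gib(2.8e9, 'q4_3'):.1f} GiB")
```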