Update README.md

README.md CHANGED

@@ -11,27 +11,32 @@ license: apache-2.0
 
 **2023-05-06:** *q4_0 and q4_2, up to 2.8B. Used [commit ff6e03c](https://github.com/ggerganov/ggml/tree/ff6e03cbcd9bf6e9fa41d49f2495c042efae4dc6/examples/stablelm).*
 
-
+**2023-05-15:** *New quantization format. q4_0 and q5_1, up to 2.8B. Used [commit 010203f](https://github.com/ggerganov/ggml/tree/010203f94a85df5c86b773dc5acb698c8e7b1e7b/examples/gpt-neox).*
 
-
+They're separated by date and commit so it's easier to track any breaking changes.
 
-
+# RAM USAGE (on KoboldCpp w/ OpenBLAS)
+Model | Initial RAM | After generation
+:--:|:--:|:--:
+Unloaded | 41.3 MiB |
+ggml-pythia-70m-deduped-q4_0.bin | 113.3 MiB | 267.8 MiB
+ggml-pythia-70m-deduped-q5_1.bin | 121.5 MiB | 129.4 MiB
+ggml-pythia-160m-deduped-q4_0.bin | 199.4 MiB | 201.6 MiB
+ggml-pythia-160m-deduped-q5_1.bin | 227.5 MiB | 241.0 MiB
+ggml-pythia-410m-deduped-q4_0.bin | 399.2 MiB | 406.2 MiB
+ggml-pythia-410m-deduped-q5_1.bin | 455.7 MiB | 460.3 MiB
+ggml-pythia-1b-deduped-q4_0.bin | 803.0 MiB | 809.0 MiB
+ggml-pythia-1b-deduped-q5_1.bin | 921.5 MiB | 927.3 MiB
+ggml-pythia-1.4b-deduped-q4_0.bin | 1.1 GiB | 1.1 GiB
+ggml-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB | 1.3 GiB
+ggml-pythia-2.8b-deduped-q4_0.bin | 2.0 GiB | 2.0 GiB
+ggml-pythia-2.8b-deduped-q5_1.bin | 2.4 GiB | 2.4 GiB
 
-
-
+# ALTERNATIVES
+If you're here because you want a smaller model to run on a device with constrained memory, consider the following:
+- OpenLLaMA [(3B)](https://huggingface.co/openlm-research/open_llama_3b_350bt_preview) [(7B)](https://huggingface.co/openlm-research/open_llama_7b_400bt_preview)
 - RedPajama-INCITE [(3B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) [(7B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1)
 - MPT [(1B)](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) [(7B)](https://huggingface.co/mosaicml/mpt-7b).
+- RWKV PilePlus [(169M) (430M) (1.5B) (3B)](https://huggingface.co/BlinkDL/rwkv-4-pileplus)
 
-All of them are trained on an open reproduction of LLaMA's dataset, [RedPajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), but they're based on different architectures. OpenLLaMA is based on the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT uses its own.
-
-# RAM USAGE
-Model | Initial RAM usage
-:--:|:--:
-ggml-pythia-70m-deduped-q4_3.bin | 121.2 MiB
-ggml-pythia-160m-deduped-q4_3.bin | 225.2 MiB
-ggml-pythia-410m-deduped-q4_3.bin | 498.1 MiB
-ggml-pythia-1b-deduped-q4_3.bin | 951.5 MiB
-ggml-pythia-1.4b-deduped-q4_3.bin | 1.3 GiB
-ggml-pythia-2.8b-deduped-q4_3.bin | 2.4 GiB
-ggml-pythia-6.9b-deduped-q4_3.bin | 5.4 GiB
-ggml-pythia-12b-deduped-q4_3.bin | 9.2 GiB
+All of them are trained at least partially on an open reproduction of LLaMA's dataset, [RedPajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), but they're based on different architectures. OpenLLaMA is based on the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT and RWKV use their own.
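
As a rough sanity check on the new table: under the "new quantization format" mentioned in the 2023-05-15 changelog entry, ggml's q4_0 packs 32 weights into 18 bytes (an fp16 scale plus 32 4-bit quants, about 4.5 bits per weight), while q5_1 packs 32 weights into 24 bytes (fp16 scale, fp16 minimum, and 32 5-bit quants, 6 bits per weight). The sketch below turns a parameter count into the size of the quantized weights alone; it is only an illustration, and it deliberately ignores everything else the measurements include (unquantized tensors, scratch buffers, KV cache), so it gives a lower bound rather than the figures above.

```python
# Lower-bound estimate: size of the quantized weight tensors alone.
QK = 32  # ggml quantization block size (weights per block)
BYTES_PER_BLOCK = {"q4_0": 18, "q5_1": 24}  # post-2023-05 ggml block layouts

def quantized_weight_bytes(n_params: int, fmt: str) -> int:
    blocks = -(-n_params // QK)  # ceil(n_params / QK)
    return blocks * BYTES_PER_BLOCK[fmt]

# Pythia-1b has roughly 1.0e9 parameters.
for fmt in ("q4_0", "q5_1"):
    print(fmt, round(quantized_weight_bytes(1_000_000_000, fmt) / 2**20), "MiB")
```

The gap between these estimates (roughly 536 and 715 MiB) and the measured 803.0 and 921.5 MiB is expected: KoboldCpp also allocates context and scratch memory on top of the weights.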
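
For anyone wanting to reproduce measurements like these, one possible approach (an illustrative sketch, not necessarily how the numbers above were collected) is to sample the resident set size of the KoboldCpp process with Python's psutil package; the PID here is a hypothetical placeholder.

```python
import psutil

def rss_mib(pid: int) -> float:
    """Resident set size (RSS) of a process, in MiB."""
    return psutil.Process(pid).memory_info().rss / 2**20

# Read once after the model finishes loading ("Initial RAM") and again
# after a generation completes ("After generation").
print(f"{rss_mib(12345):.1f} MiB")  # 12345 = hypothetical KoboldCpp PID
```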