Update README.md

README.md CHANGED

@@ -11,27 +11,32 @@ license: apache-2.0
 
 **2023-05-06:** *q4_0 and q4_2, up to 2.8B. Used [commit ff6e03c](https://github.com/ggerganov/ggml/tree/ff6e03cbcd9bf6e9fa41d49f2495c042efae4dc6/examples/stablelm).*
 
-
+**2023-05-15:** *New quantization format. q4_0 and q5_1, up to 2.8B. Used [commit 010203f](https://github.com/ggerganov/ggml/tree/010203f94a85df5c86b773dc5acb698c8e7b1e7b/examples/gpt-neox).*
 
-
+They're separated by date and commit so it's easier to track any breaking changes.
 
-
+# RAM USAGE (on KoboldCpp w/ OpenBLAS)
+Model | Initial RAM | After generation
+:--:|:--:|:--:
+Unloaded | 41.3 MiB |
+ggml-pythia-70m-deduped-q4_0.bin | 113.3 MiB | 267.8 MiB
+ggml-pythia-70m-deduped-q5_1.bin | 121.5 MiB | 129.4 MiB
+ggml-pythia-160m-deduped-q4_0.bin | 199.4 MiB | 201.6 MiB
+ggml-pythia-160m-deduped-q5_1.bin | 227.5 MiB | 241.0 MiB
+ggml-pythia-410m-deduped-q4_0.bin | 399.2 MiB | 406.2 MiB
+ggml-pythia-410m-deduped-q5_1.bin | 455.7 MiB | 460.3 MiB
+ggml-pythia-1b-deduped-q4_0.bin | 803.0 MiB | 809.0 MiB
+ggml-pythia-1b-deduped-q5_1.bin | 921.5 MiB | 927.3 MiB
+ggml-pythia-1.4b-deduped-q4_0.bin | 1.1 GiB | 1.1 GiB
+ggml-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB | 1.3 GiB
+ggml-pythia-2.8b-deduped-q4_0.bin | 2.0 GiB | 2.0 GiB
+ggml-pythia-2.8b-deduped-q5_1.bin | 2.4 GiB | 2.4 GiB
 
-
-
+# ALTERNATIVES
+If you're here because you want a smaller model to run on a device with constrained memory, consider the following:
+- OpenLLaMA [(3B)](https://huggingface.co/openlm-research/open_llama_3b_350bt_preview) [(7B)](https://huggingface.co/openlm-research/open_llama_7b_400bt_preview)
 - RedPajama-INCITE [(3B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) [(7B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1)
 - MPT [(1B)](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) [(7B)](https://huggingface.co/mosaicml/mpt-7b).
+- RWKV PilePlus [(169M) (430M) (1.5B) (3B)](https://huggingface.co/BlinkDL/rwkv-4-pileplus)
 
-All of them are trained on an open reproduction of LLaMA's dataset, [RedPajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), but they're based on different architectures. OpenLLaMA is based on the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT uses its own.
-
-# RAM USAGE
-Model | Initial RAM usage
-:--:|:--:
-ggml-pythia-70m-deduped-q4_3.bin | 121.2 MiB
-ggml-pythia-160m-deduped-q4_3.bin | 225.2 MiB
-ggml-pythia-410m-deduped-q4_3.bin | 498.1 MiB
-ggml-pythia-1b-deduped-q4_3.bin | 951.5 MiB
-ggml-pythia-1.4b-deduped-q4_3.bin | 1.3 GiB
-ggml-pythia-2.8b-deduped-q4_3.bin | 2.4 GiB
-ggml-pythia-6.9b-deduped-q4_3.bin | 5.4 GiB
-ggml-pythia-12b-deduped-q4_3.bin | 9.2 GiB
+All of them are trained at least partially on an open reproduction of LLaMA's dataset, [RedPajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), but they're based on different architectures. OpenLLaMA is based on the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT and RWKV use their own.
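
As a rough sanity check on the new table: under the "new quantization format" mentioned in the 2023-05-15 changelog entry, ggml's q4_0 packs 32 weights into 18 bytes (an fp16 scale plus 32 4-bit quants, about 4.5 bits per weight), while q5_1 packs 32 weights into 24 bytes (fp16 scale, fp16 minimum, and 32 5-bit quants, 6 bits per weight). The sketch below turns a parameter count into the size of the quantized weights alone; it is only an illustration, and it deliberately ignores everything else the measurements include (unquantized tensors, scratch buffers, KV cache), so it gives a lower bound rather than the figures above.

```python
# Lower-bound estimate: size of the quantized weight tensors alone.
QK = 32  # ggml quantization block size (weights per block)
BYTES_PER_BLOCK = {"q4_0": 18, "q5_1": 24}  # post-2023-05 ggml block layouts

def quantized_weight_bytes(n_params: int, fmt: str) -> int:
    blocks = -(-n_params // QK)  # ceil(n_params / QK)
    return blocks * BYTES_PER_BLOCK[fmt]

# Pythia-1b has roughly 1.0e9 parameters.
for fmt in ("q4_0", "q5_1"):
    print(fmt, round(quantized_weight_bytes(1_000_000_000, fmt) / 2**20), "MiB")
```

The gap between these estimates (roughly 536 and 715 MiB) and the measured 803.0 and 921.5 MiB is expected: KoboldCpp also allocates context and scratch memory on top of the weights.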
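
For anyone wanting to reproduce measurements like these, one possible approach (an illustrative sketch, not necessarily how the numbers above were collected) is to sample the resident set size of the KoboldCpp process with Python's psutil package; the PID here is a hypothetical placeholder.

```python
import psutil

def rss_mib(pid: int) -> float:
    """Resident set size (RSS) of a process, in MiB."""
    return psutil.Process(pid).memory_info().rss / 2**20

# Read once after the model finishes loading ("Initial RAM") and again
# after a generation completes ("After generation").
print(f"{rss_mib(12345):.1f} MiB")  # 12345 = hypothetical KoboldCpp PID
```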