Merry committed · Commit d68ebe9 · 1 Parent(s): babe17a

Update README.md

Files changed (1): README.md (+23 -18)
README.md CHANGED
@@ -11,27 +11,32 @@ license: apache-2.0
 
 **2023-05-06:** *q4_0 and q4_2, up to 2.8B. Used [commit ff6e03c](https://github.com/ggerganov/ggml/tree/ff6e03cbcd9bf6e9fa41d49f2495c042efae4dc6/examples/stablelm).*
 
-They're separated by date and commit so it's easier to track any breaking changes.
+**2023-05-15:** *New quantization format. q4_0 and q5_1, up to 2.8B. Used [commit 010203f](https://github.com/ggerganov/ggml/tree/010203f94a85df5c86b773dc5acb698c8e7b1e7b/examples/gpt-neox).*
 
-# ALTERNATIVES
+They're separated by date and commit so it's easier to track any breaking changes.
 
-If you're here because you want a smaller model to run on a device with constrained memory, try the instruct-based RWKV-Raven ([q8_0](https://huggingface.co/BlinkDL/rwkv-4-raven/tree/main) and [q5_1](https://huggingface.co/latestissue/RWKV-4-Raven-CPP-Converted-Quantized/tree/main)) which goes as low as 1.5B, or [RWKV-PilePlus](https://huggingface.co/BlinkDL/rwkv-4-pileplus/tree/main), which goes as low as 169M.
-
-If you're here because you want an openly-licensed LLaMA, there's:
-- OpenLLaMA [(7B)](https://huggingface.co/openlm-research/open_llama_7b_preview_300bt)
+# RAM USAGE (on KoboldCpp w/ OpenBLAS)
+Model | Initial RAM | After generation
+:--:|:--:|:--:
+Unloaded | 41.3 MiB |
+ggml-pythia-70m-deduped-q4_0.bin | 113.3 MiB | 267.8 MiB
+ggml-pythia-70m-deduped-q5_1.bin | 121.5 MiB | 129.4 MiB
+ggml-pythia-160m-deduped-q4_0.bin | 199.4 MiB | 201.6 MiB
+ggml-pythia-160m-deduped-q5_1.bin | 227.5 MiB | 241.0 MiB
+ggml-pythia-410m-deduped-q4_0.bin | 399.2 MiB | 406.2 MiB
+ggml-pythia-410m-deduped-q5_1.bin | 455.7 MiB | 460.3 MiB
+ggml-pythia-1b-deduped-q4_0.bin | 803.0 MiB | 809.0 MiB
+ggml-pythia-1b-deduped-q5_1.bin | 921.5 MiB | 927.3 MiB
+ggml-pythia-1.4b-deduped-q4_0.bin | 1.1 GiB | 1.1 GiB
+ggml-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB | 1.3 GiB
+ggml-pythia-2.8b-deduped-q4_0.bin | 2.0 GiB | 2.0 GiB
+ggml-pythia-2.8b-deduped-q5_1.bin | 2.4 GiB | 2.4 GiB
+
+# ALTERNATIVES
+If you're here because you want a smaller model to run on a device with constrained memory, consider the following:
+- OpenLLaMA [(3B)](https://huggingface.co/openlm-research/open_llama_3b_350bt_preview) [(7B)](https://huggingface.co/openlm-research/open_llama_7b_400bt_preview)
 - RedPajama-INCITE [(3B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) [(7B)](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1)
 - MPT [(1B)](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) [(7B)](https://huggingface.co/mosaicml/mpt-7b).
+- RWKV PilePlus [(169M) (430M) (1.5B) (3B)](https://huggingface.co/BlinkDL/rwkv-4-pileplus)
 
-All of them are trained on an open reproduction of LLaMA's dataset, [RedPajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), but they're based on different architectures. OpenLLaMA is based on the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT uses its own.
-
-# RAM USAGE
-Model | Initial RAM usage
-:--:|:--:
-ggml-pythia-70m-deduped-q4_3.bin | 121.2 MiB
-ggml-pythia-160m-deduped-q4_3.bin | 225.2 MiB
-ggml-pythia-410m-deduped-q4_3.bin | 498.1 MiB
-ggml-pythia-1b-deduped-q4_3.bin | 951.5 MiB
-ggml-pythia-1.4b-deduped-q4_3.bin | 1.3 GiB
-ggml-pythia-2.8b-deduped-q4_3.bin | 2.4 GiB
-ggml-pythia-6.9b-deduped-q4_3.bin | 5.4 GiB
-ggml-pythia-12b-deduped-q4_3.bin | 9.2 GiB
+All of them are trained at least partially on an open reproduction of LLaMA's dataset, [RedPajama](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T), but they're based on different architectures. OpenLLaMA is based on the LLaMA architecture (making it compatible with llama.cpp), RedPajama-INCITE is based on GPT-NeoX, and MPT and RWKV use their own.
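
Note that the pinned ggml commits are load-bearing: the quantization format changed between 2023-05-06 and 2023-05-15, so files from one date won't load in tools built against the other's commit. For readers who want to reproduce files like these, the sketch below drives the linked gpt-neox example's conversion and quantization steps from Python. It is a minimal sketch only: the script name, binary name, argument order, and the integer type id for q5_1 are assumptions about that ggml commit and should be checked against its README and the tools' usage output.

```python
import subprocess

# Assumptions: ggerganov/ggml checked out at the pinned commit (010203f) and
# built into ./ggml/build, plus a local snapshot of EleutherAI/pythia-410m-deduped.
GGML = "ggml"
MODEL_DIR = "pythia-410m-deduped"

# Step 1 (assumed script/args): convert the Hugging Face checkpoint to ggml f16.
subprocess.run(
    ["python3", f"{GGML}/examples/gpt-neox/convert-h5-to-ggml.py", MODEL_DIR, "1"],
    check=True,
)

# Step 2 (assumed binary/args): quantize the f16 file down to q5_1. ggml's
# example quantizers take an integer type id; 9 is believed to be q5_1 here,
# but verify against the tool's usage output.
subprocess.run(
    [f"{GGML}/build/bin/gpt-neox-quantize",
     f"{MODEL_DIR}/ggml-model-f16.bin",
     "ggml-pythia-410m-deduped-q5_1.bin",
     "9"],
    check=True,
)
```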
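
Similarly, because these are GPT-NeoX-architecture ggml files, llama.cpp won't load them; you need a runner with a gpt_neox backend, such as KoboldCpp (used for the RAM table above) or ggml's own gpt-neox example. As a rough sketch of programmatic use, assuming the mid-2023 ctransformers bindings and that their `gpt_neox` model type accepts these files:

```python
from ctransformers import AutoModelForCausalLM  # pip install ctransformers

# model_type="gpt_neox" selects the GPT-NeoX/Pythia ggml backend (an assumption
# that this loader reads files produced at the pinned ggml commit).
llm = AutoModelForCausalLM.from_pretrained(
    "ggml-pythia-410m-deduped-q5_1.bin",
    model_type="gpt_neox",
)

# The LLM object is callable and returns generated text.
print(llm("The Pile is a dataset that", max_new_tokens=32))
```

Per the table above, the q5_1 files cost roughly 15-20% more RAM than q4_0 at the same parameter count, so pick the largest quantization that still leaves headroom for your runtime's buffers.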