Update README.md

shadowsword committed • Commit dffb18f • 1 Parent(s): 4324822
README.md CHANGED
@@ -27,8 +27,39 @@ example$ python3 ./make-ggml.py --model /home/inpw/Pygmalion-1.1-7b --outname Py
 
 Includes `USE_POLICY.md` making sure to comply with license agreements / legalities.
 
+## Provided GGML Quants
+
+| Quant Method | Use Case |
+| ---- | ---- |
+| Q2_K | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, and GGML_TYPE_Q2_K for the other tensors. |
+| Q3_K_S | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors. |
+| Q3_K_M | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K. |
+| Q3_K_L | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K. |
+| Q4_0 | Original quant method, 4-bit. |
+| Q4_1 | Original quant method, 4-bit. Higher accuracy than Q4_0 but not as high as Q5_0; however, it has quicker inference than the Q5 models. |
+| Q4_K_S | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors. |
+| Q4_K_M | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K. |
+| Q5_0 | Original quant method, 5-bit. Higher accuracy, higher resource usage, and slower inference. |
+| Q5_1 | Original quant method, 5-bit. Even higher accuracy and resource usage, and slower inference. |
+| Q5_K_S | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors. |
+| Q5_K_M | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K. |
+| Q6_K | New k-quant method. Uses GGML_TYPE_Q8_K for all tensors (6-bit quantization). |
+| fp16 | Compiled Safetensors; can be used to quantize. |
+
+Thanks to TheBloke for the information on quant use cases.
+
+| RAM/VRAM | Parameters |
+| ---- | ---- |
+| 4GB | 3B |
+| 8GB | 7B |
+| 16GB | 13B |
+| 32GB | 30B |
+| 64GB | 65B |
+
 Original Card:
 
+# Pygmalion Vicuna 1.1 7B
+
 The LLaMA based Pygmalion-7b model:
 
 https://huggingface.co/PygmalionAI/pygmalion-7b
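
For context on how GGML files like the quants listed above are produced from the fp16 weights (roughly the flow the `make-ggml.py` invocation in the hunk header automates): the HF-format model is converted to an fp16 GGML file, which is then quantized once per method with llama.cpp's `quantize` tool. The snippet below is only a minimal sketch assuming a local GGML-era llama.cpp checkout; the directory names, output file names, and exact `convert.py`/`quantize` arguments are illustrative and may differ between llama.cpp versions.

```python
#!/usr/bin/env python3
# Sketch only: convert an HF-format model to fp16 GGML, then produce some of the
# quants listed above with llama.cpp's quantize tool. Paths and flags are assumptions.
import subprocess
from pathlib import Path

LLAMA_CPP = Path("llama.cpp")                 # assumed local llama.cpp checkout
MODEL_DIR = Path("Pygmalion-Vicuna-1.1-7b")   # HF model directory (safetensors)
OUT_DIR   = Path("ggml-out")
QUANTS    = ["q2_K", "q3_K_M", "q4_K_M", "q5_K_M", "q6_K"]  # any methods from the table

OUT_DIR.mkdir(exist_ok=True)
fp16 = OUT_DIR / "pygmalion-vicuna-1.1-7b.fp16.bin"

# 1) safetensors -> fp16 GGML
subprocess.run(
    ["python3", str(LLAMA_CPP / "convert.py"), str(MODEL_DIR),
     "--outtype", "f16", "--outfile", str(fp16)],
    check=True,
)

# 2) fp16 GGML -> one file per quant method
for quant in QUANTS:
    out = OUT_DIR / f"pygmalion-vicuna-1.1-7b.{quant}.bin"
    subprocess.run([str(LLAMA_CPP / "quantize"), str(fp16), str(out), quant], check=True)
```

Each method trades file size and memory use against accuracy, which is what the use-case column above summarizes.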
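
The RAM/VRAM table is a rough sizing guide: pick the largest parameter count whose memory figure fits your machine. A tiny sketch of that lookup, using the numbers copied from the table (`largest_model_for` is a hypothetical helper, not part of any tool in this repo):

```python
# Rough sizing lookup based on the RAM/VRAM table above (guideline figures in GB).
RAM_GB_TO_PARAMS = {4: "3B", 8: "7B", 16: "13B", 32: "30B", 64: "65B"}

def largest_model_for(ram_gb):
    """Return the largest model size from the table that fits in ram_gb GB, or None."""
    fitting = [size for gb, size in sorted(RAM_GB_TO_PARAMS.items()) if gb <= ram_gb]
    return fitting[-1] if fitting else None

print(largest_model_for(8))   # -> "7B", e.g. the 7B quants in this repo
print(largest_model_for(12))  # -> still "7B"; 13B wants ~16GB per the table
```

For the 7B files in this repository, that means roughly 8GB of RAM/VRAM, with the smaller quants leaving more headroom.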