Update README.md
README.md
CHANGED
@@ -23,6 +23,8 @@ For other versions of the models, see here:
 - [GGMLv2 q4_0 / q5_1](https://huggingface.co/Crataco/Pythia-Deduped-Series-GGML/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd/2023-05-15) (70M to 2.8B)
 - [GGMLv3 q4_0 / q5_1](https://huggingface.co/Crataco/Pythia-Deduped-Series-GGML/tree/main) (70M to 2.8B)
 
+**Description:**
+- The motivation behind these quantizations was that the LLaMA series lacks sizes below 7B, whereas it was the norm for older models to be available in as little as ~125M parameters. This makes the LLaMA models uncomfortable to run on hardware with less than 4GB of RAM, even with 2-bit quantization.
 
 # RAM USAGE
 Model | RAM usage
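
For anyone unsure how to consume the GGML files linked above, here is a minimal sketch using the third-party `ctransformers` library, which can load GGML-format GPT-NeoX/Pythia models on CPU. The repo ID comes from the links in the diff; the `model_file` name is illustrative only, so check the repo's file tree for the exact quantization you want.

```python
# Minimal sketch: loading one of the GGML Pythia files on CPU with ctransformers.
# The model_file name below is hypothetical; browse the repo tree for real names.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "Crataco/Pythia-Deduped-Series-GGML",       # repo linked above
    model_file="pythia-deduped-410m-q4_0.bin",  # illustrative filename
    model_type="gpt_neox",                      # Pythia uses the GPT-NeoX architecture
)

# Generate a short completion to confirm the model loaded correctly.
print(llm("The quantized model says:", max_new_tokens=32))
```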