FantasiaFoundry committed: Update README.md

Simple Python script (`gguf-imat.py`) to generate various GGUF-IQ-Imatrix quantizations from a Hugging Face `author/model` input, for Windows and NVIDIA hardware.

This is set up for a Windows machine with 8GB of VRAM, assuming use with an NVIDIA GPU. If you want to change the `-ngl` (number of GPU layers) amount, you can do so at [**line 120**](https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/blob/main/gguf-imat.py#L120). This is only relevant during the `--imatrix` data generation. If you don't have enough VRAM, you can decrease the `-ngl` amount or set it to 0 to use only your system RAM for all layers. This will make the imatrix data generation take longer, so it's a good idea to find the number that gives your own machine the best results.
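
To illustrate what that setting controls, here is a minimal sketch of how an imatrix run with a configurable `-ngl` value could be assembled. The binary path, model path, and layer count below are placeholders for illustration, not the script's actual values:

```python
# Hypothetical sketch -- paths and the layer count are placeholders,
# not the actual values used by gguf-imat.py.
ngl = 7  # layers offloaded to the GPU; lower it (down to 0) if VRAM runs out

cmd = [
    "llama.cpp/imatrix.exe",        # placeholder path to the imatrix binary
    "-m", "models/model-f16.gguf",  # placeholder path to the converted model
    "-f", "imatrix/imatrix.txt",    # the calibration text described below
    "-ngl", str(ngl),               # number of GPU layers
]
# subprocess.run(cmd, check=True) would then launch the generation
print(" ".join(cmd))
```

Lower `ngl` trades GPU offload for system RAM: generation gets slower, but it keeps working on machines with less VRAM.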

Your `imatrix.txt` is expected to be located inside the `imatrix` folder. A file that is considered a good starting option is already included; it came from [this discussion](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384). If you have suggestions or other imatrix data to recommend, please do so.

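
A quick way to confirm that layout before running, as a minimal sketch (the path is relative to the working directory):

```python
from pathlib import Path

# The calibration text is expected at imatrix/imatrix.txt
imatrix_file = Path("imatrix") / "imatrix.txt"
print(imatrix_file, "exists:", imatrix_file.exists())
```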
Adjust `quantization_options` in [**line 133**](https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script/blob/main/gguf-imat.py#L133).
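
For orientation, that list holds llama.cpp quantization type names, one output file per entry. A hypothetical example of its shape (the exact entries in the script may differ):

```python
# Hypothetical example of the quantization_options list; the actual
# entries at line 133 of gguf-imat.py may differ. Each name is a
# llama.cpp quantization type, producing one GGUF file per entry.
quantization_options = [
    "Q4_K_M", "Q4_K_S", "IQ4_XS",
    "Q5_K_M", "Q5_K_S",
    "Q6_K", "Q8_0",
    "IQ3_M", "IQ3_S", "IQ3_XXS",
]
```

Trimming the list to only the types you need shortens the run, since each entry is quantized separately.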

Quantizations will be output into the created `models\{model-name}-GGUF` folder.

### **Credits:**

Feel free to open a Pull Request with your own features and improvements to this script.

**If this proves useful for you, feel free to credit and share the repository.**

**Made in conjunction with [@Lewdiculous](https://huggingface.co/Lewdiculous).**