Lewdiculous committed 16e271b
Parent(s): 183ea5f
Update README.md

README.md CHANGED
```diff
@@ -16,6 +16,10 @@ GGUF-IQ-Imatrix quants for [jeiku/Chaos_RP_l3_8B](https://huggingface.co/jeiku/Chaos_RP_l3_8B)
 > **Updated!**
 > These quants have been redone with the fixes from [llama.cpp/pull/6920](https://github.com/ggerganov/llama.cpp/pull/6920) in mind.
 
+> [!NOTE]
+> **Quant:** <br>
+> For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** quant for up to 12288 context sizes.
+
 > [!WARNING]
 > Recommended presets [here](https://huggingface.co/Lewdiculous/Model-Requests/tree/main/data/presets/cope-llama-3-0.1) or [here](https://huggingface.co/Virt-io/SillyTavern-Presets). <br>
 > Use the latest version of KoboldCpp. **Use the provided presets.** <br>
```