Lewdiculous committed 4ec3888
Parent(s): 8f98c87
Update README.md
README.md CHANGED
@@ -31,7 +31,7 @@ GGUF-IQ-Imatrix quants for [NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS](https://hug
 > If there are any issues or questions let me know.
 
 > [!NOTE]
-> For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** quant for up to 12288 context sizes.
+> For **8GB VRAM** GPUs, I recommend the **Q4_K_M-imat** (4.89 BPW) quant for up to 12288 context sizes.
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/JUxfdTot7v7LTdIGYyzYM.png)
 
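For reference, the updated recommendation (Q4_K_M-imat quant, up to 12288 context on an 8GB GPU) maps directly onto loader settings. Below is a minimal sketch using llama-cpp-python; the local GGUF filename and the decision to offload all layers are assumptions for illustration, not details stated in this commit.

```python
# Minimal sketch: load the Q4_K_M-imat quant at the recommended 12288 context.
# The model_path below is a hypothetical local filename, not one listed here.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3-Lumimaid-8B-v0.1-OAS-Q4_K_M-imat.gguf",  # assumed local path
    n_ctx=12288,      # context size from the README recommendation
    n_gpu_layers=-1,  # offload all layers; assumption that this fits in ~8GB VRAM
)

out = llm("Hello!", max_tokens=32)
print(out["choices"][0]["text"])
```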