Update README.md
README.md
CHANGED
@@ -19,7 +19,7 @@ Full offload possible on 48GB VRAM with a huge context size :
 
 Full offload possible on 36GB VRAM with a variable context size (up to 7168 with Q3_K_M, for example)
 - Q3_K_M, Q3_K_S, Q3_K_XS, IQ3_XXS SOTA (which is equivalent to a Q3_K_S with more context!)
-- Lower quality : Q2_K_S
+- Lower quality : Q2_K (I remade one with iMatrix, which beats hands-down Miqudev's on perplexity), Q2_K_S
 
 Full offload possible on 24GB VRAM with a decent context size.
 - IQ2_XS SOTA
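The VRAM tiers in the changed lines follow from simple bits-per-weight arithmetic: weights-only file size ≈ parameter count × bpw / 8, plus headroom for the KV cache (which is why a tighter quant buys a larger context on the same card). A minimal sketch, assuming a 70B model and approximate llama.cpp bpw figures (the numbers below are rough community-cited values, not taken from this README):

```python
# Rough GGUF size estimate: params (billions) * bits-per-weight / 8 -> GB.
# bpw values are approximate llama.cpp figures (an assumption, not from this repo).
BPW = {
    "Q3_K_M": 3.91,
    "IQ3_XXS": 3.06,
    "IQ2_XS": 2.31,
}

def est_size_gb(params_b: float, quant: str) -> float:
    """Estimated weights-only file size in GB (excludes KV cache and overhead)."""
    return params_b * BPW[quant] / 8

# A 70B model at Q3_K_M is ~34 GB of weights, so full offload on 36GB VRAM
# leaves only a few GB for the KV cache -- hence the capped 7168 context.
# IQ2_XS lands around 20 GB, fitting a 24GB card with room for context.
for quant in BPW:
    print(f"{quant}: ~{est_size_gb(70, quant):.1f} GB")
```

The same arithmetic explains the "IQ3_XXS is a Q3_K_S with more context" remark: a lower bpw at comparable perplexity frees VRAM that the KV cache can then use.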