InferenceIllusionist committed "Update README.md"
Quantized from fp32 with love. If you're using the latest version of llama.cpp, you should no longer need to combine files before loading.
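Earlier llama.cpp builds could not open multi-part GGUF files directly, so shards had to be merged first. A minimal before/after sketch, with illustrative shard names (these are assumptions, not this repo's actual file names):

```shell
# Old workflow: merge the shards before loading, using llama.cpp's
# gguf-split tool (pass the first shard and an output path):
./gguf-split --merge model-00001-of-00002.gguf model-merged.gguf

# Current workflow: point llama.cpp at the first shard and it picks up
# the remaining parts automatically:
./main -m model-00001-of-00002.gguf -p "Hello"
```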
* Importance matrix calculated using fp16 precision model
* Calculated in 105 chunks with n_ctx=512 using groups_merged.txt
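The two bullets above describe a standard llama.cpp imatrix workflow. A minimal sketch of how such a matrix is typically produced and applied, assuming an fp16 GGUF export and illustrative file names (the exact commands used for this repo are not shown here):

```shell
# Compute the importance matrix from the fp16 model over the
# calibration data, with a 512-token context as noted above:
./imatrix -m model-f16.gguf -f groups_merged.txt -c 512 -o imatrix.dat

# Apply the matrix when producing an i-quant, e.g. IQ4_XS:
./quantize --imatrix imatrix.dat model-f16.gguf model-IQ4_XS.gguf IQ4_XS
```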
For a brief rundown of iMatrix quant performance, please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747).