Spaces: ggml-org/gguf-my-repo


Add IQ Quantization support with the help of imatrix and GPUs

#35

by qnixsynapse - opened Apr 7, 2024

Discussion

qnixsynapse

Apr 7, 2024

It will allow us to create imatrix data and quants in one go!

hus960

Apr 22, 2024

Super useful when we deal with 100B+ models; 1m bit is really nice to support.

rinaldow

Apr 22, 2024

Would be awesome to see options for IQ 6 / 5 / 4 / 3 / 2 NL / XS

Dampfinchen

May 10, 2024

Would be really awesome to have an option to upload a .txt file for imatrix creation and then create imatrix quants with it.
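
For anyone wanting to try this workflow locally in the meantime, here is a rough sketch of the two steps (compute an imatrix from a text file, then quantize with it). It only builds the command lines and does not run anything. The binary names (`llama-imatrix`, `llama-quantize`) and flags reflect my understanding of the llama.cpp tools and may differ between builds, so check `--help` on yours. The function name and the output file names `imatrix.dat` and `model-<type>.gguf` are made up for illustration.

```python
from typing import List, Tuple


def imatrix_quant_commands(
    fp16_gguf: str,
    calibration_txt: str,
    quant_type: str = "IQ4_NL",
) -> Tuple[List[str], List[str]]:
    """Build (imatrix_cmd, quantize_cmd) argument lists for a two-step run.

    Assumption: llama.cpp tool names and flags as described above;
    verify them against your own build before running.
    """
    imatrix_path = "imatrix.dat"  # hypothetical output name
    output_gguf = f"model-{quant_type.lower()}.gguf"  # hypothetical output name

    # Step 1: compute the importance matrix from the calibration text.
    imatrix_cmd = [
        "llama-imatrix",
        "-m", fp16_gguf,
        "-f", calibration_txt,
        "-o", imatrix_path,
    ]

    # Step 2: quantize the model, guided by the importance matrix.
    quantize_cmd = [
        "llama-quantize",
        "--imatrix", imatrix_path,
        fp16_gguf,
        output_gguf,
        quant_type,
    ]
    return imatrix_cmd, quantize_cmd


if __name__ == "__main__":
    for cmd in imatrix_quant_commands("model-f16.gguf", "calibration.txt"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths are real.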

israellaguan

Jun 3, 2024

#78 should help here if merged

reach-vb

ggml.ai org Jun 11, 2024

We just merged support for iMatrix! Do let us know if you have any feedback! 🤗

qnixsynapse

Jun 12, 2024

@reach-vb Just gave it a try. I have one suggestion. Currently, it is impossible for anyone to see the progress because Gradio only shows the loading indicator. I think it would be better if the console logs were shown instead. This would allow us to track the progress and inspect any errors encountered during calculation/conversion. :)
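
To illustrate the idea, here is a minimal sketch of streaming a subprocess's console output line by line. It is not the Space's actual code; `stream_logs` is a hypothetical helper name, and the demo command is just a stand-in for the real conversion step.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_logs(cmd: List[str]) -> Iterator[str]:
    """Run cmd and yield the accumulated console output after each new line."""
    log = ""
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors land in the same log
        text=True,
        bufsize=1,  # line-buffered in text mode
    )
    assert process.stdout is not None
    for line in process.stdout:
        log += line
        yield log  # each yield is a full snapshot of the log so far
    returncode = process.wait()
    if returncode != 0:
        # Surface failures instead of ending silently.
        log += f"[process exited with code {returncode}]\n"
        yield log


if __name__ == "__main__":
    # Stand-in command; a real handler would run the conversion tool here.
    demo_cmd = [sys.executable, "-c", "print('step 1'); print('step 2')"]
    for snapshot in stream_logs(demo_cmd):
        print(snapshot, end="\n---\n")
```

Since Gradio accepts generator functions as event handlers and updates the output on every `yield`, a handler that yields these snapshots into a textbox should show live progress instead of only the spinner.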

Thanks to you and everybody else involved. I should close this discussion now. :)

qnixsynapse changed discussion status to closed Jun 12, 2024

reach-vb

ggml.ai org Jun 12, 2024

That's brilliant feedback!

qnixsynapse

Jul 15, 2024

@reach-vb Hi! When will llama.cpp get updated here?
Sorry for bothering you, but currently Gemma (9B) conversion fails because of an assert, which has been fixed upstream.

It needs at least b3389 for the fix.

I was not sure how to contact you, so I commented here. 😃
