Does this use imatrix?

#3
by nonetrix - opened

It seems imatrix can improve quants; does this one use it? At 2 bits it sometimes acts a bit strange, as expected, but I have seen other quants of similarly sized models do better. I haven't tried 3 bits yet, though I might be able to squeeze it into 64 GB of RAM. Also, I think imatrix needs a dataset, which ideally should cover all the languages this model supports well. But I imagine it would greatly help in cases like this, even with a lazy dataset.
image.png

nonetrix changed discussion status to closed

These are regular quants (without imatrix)

Could you tell me which llama.cpp fork you use and what the SHA-256 hash of the weights is? There was an issue with F16 token embeddings, and I would like to make sure that this is not related to it

Well, it seems to be periods that it oddly struggles with a lot of the time, and it always seems to pick similar words to replace them with. Ignore the broken fonts; that's an issue with my terminal and CJK languages, and I'm not sure what causes it.
image.png
Anyway, the SHA-256 sum is 47e139a57872a72096c05b043b1ec6c2f08451da7df0d84d45168708667b98f5 ./models/command-r-plus-Q2_K.gguf, and I am running https://github.com/ggerganov/llama.cpp/pull/6491 at commit d2924073ee9bdd600d22ded4e2d5fe30e69783a7
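
For anyone who wants to compare their own download against the checksum posted above, here is a minimal Python sketch. The function name `sha256_of_file` and the chunk size are illustrative choices, not anything from llama.cpp, and the path is simply the one quoted in this thread, so adjust it to your setup. Note that a match only shows your file is identical to the one hashed above; it says nothing about whether that file is affected by the F16 token embedding issue.

```python
import hashlib
import os

# Checksum and path exactly as posted in this thread; change MODEL_PATH as needed.
POSTED_SHA256 = "47e139a57872a72096c05b043b1ec6c2f08451da7df0d84d45168708667b98f5"
MODEL_PATH = "./models/command-r-plus-Q2_K.gguf"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so that a multi-gigabyte GGUF file never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if os.path.exists(MODEL_PATH):
        actual = sha256_of_file(MODEL_PATH)
        if actual == POSTED_SHA256:
            print("match")
        else:
            print(f"mismatch: {actual}")
    else:
        print(f"{MODEL_PATH} not found; adjust MODEL_PATH")
```

This should give the same digest as running `sha256sum` on the file, just without depending on that tool being installed.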

Tried Q3: big improvement, and I still have more spare memory than I thought I would

I have some early perplexity results on wikitext-2-raw and they seem to confirm this improvement

| Test | PPL Value | Standard Deviation |
|---|---|---|
| Q2_K | 5.7178 | +/- 0.03418 |
| Q3_K_L | 4.6214 | +/- 0.02629 |
| Q4_K_M | 4.4625 | +/- 0.02522 |
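
To put the figures above in relative terms, here is a small Python sketch that expresses each perplexity as a percentage increase over Q4_K_M. Using Q4_K_M as the baseline is an assumption made only because no F16 reference value is given in this thread, and `relative_increase` is an illustrative helper name. With these numbers, Q2_K comes out at roughly 28% above Q4_K_M, while Q3_K_L is only about 3.6% above it.

```python
# Perplexity results copied from the table above (wikitext-2-raw):
# quant name -> (PPL value, reported +/- figure)
results = {
    "Q2_K": (5.7178, 0.03418),
    "Q3_K_L": (4.6214, 0.02629),
    "Q4_K_M": (4.4625, 0.02522),
}


def relative_increase(ppl, baseline_ppl):
    """Percentage by which ppl exceeds baseline_ppl (lower perplexity is better)."""
    return (ppl / baseline_ppl - 1.0) * 100.0


# Assumption: Q4_K_M is used as the baseline, since no F16 number was posted.
baseline_ppl, _ = results["Q4_K_M"]

for name, (ppl, err) in results.items():
    delta = relative_increase(ppl, baseline_ppl)
    print(f"{name:8s} PPL {ppl:.4f} +/- {err:.5f}  ({delta:+.1f}% vs Q4_K_M)")
```

The gap between Q2_K and Q3_K_L is far larger than the reported +/- figures, which is consistent with the subjective improvement described above.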
