Is this really using the wrong imatrix file?

#1
by mradermacher - opened

The description of this quant claims to use an imatrix file from ikawrakow's repo, but there isn't one for this model there. Are you using an imatrix file for a different model (imatrix files are specific to the model), or is the description wrong?

Howdy Michael - happy you reached out! Always nice hearing from a fellow quantizer in the field. Actually, I'm confused about how you arrived at "wrong" here. While you're correct that this quant was not made with the traditional .dat-file imatrix calculation method, the KL divergence with the new process showed significant improvement versus when the quant is made with groups_merged.txt. If you look at the model card for maid-yuzu-v8-alter-iMat, you can see the detailed KL divergence comparisons between groups_merged and the new method.

To be clear, regardless of this deep-dive analysis on the impact of using the .imatrix file from the repo, every single quant I upload is verified with automated benchmarks - as well as a thorough manual review process to ensure the quantization was a success - before it ever hits the repo.

Not sure if you've had a chance to sit down and measure our respective Phainesthesia Q4_K_M quants for KL divergence, but if you did, you'd see that we are nearly identical, with almost exactly the same metrics:

mradermacher/Phainesthesia-8x7B.i1-Q4_K_M.gguf

InferenceIllusionist/Phainesthesia-8x7B-iMat-Q4_K_M.gguf

Holy smokes, we're almost twins!!! Were you copying off my homework or something, Mike?!? But in all seriousness, when working with machine learning it's crucial to be guided by data; before throwing around words like "wrong", it might be better to take a second and let the data tell the story, right?

As always, absolutely feel free to run the KL divergence calculations yourself, and please keep me in check if you see otherwise. I would hate to be putting out poor-quality quants for folks on HF, which is why having the review process is so important to me. The good news is these benchmarks are relatively quick to do (especially compared to imatrix calculations, which I'm sure you know take time!) and should be a cornerstone of any quantizer's process for ensuring quality. If you have any follow-ups I'm happy to clarify, and I appreciate you reaching out to make sure - we could all use a little more collaboration and a second set of eyes these days.
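For readers wanting to follow along: the metric being discussed is the mean per-token KL divergence between the token distributions of a reference model (e.g. the static Q8 or fp16) and the quantized model. llama.cpp's perplexity tool has a KL-divergence mode for exactly this; the sketch below is only an illustration of the math being reported, not the llama.cpp implementation.

```python
import numpy as np

def mean_kl_divergence(ref_logits, quant_logits):
    """Mean per-token KL(ref || quant), computed from raw logits.

    ref_logits, quant_logits: arrays of shape (num_tokens, vocab_size).
    A value near 0 means the quant's next-token distributions are
    nearly indistinguishable from the reference model's.
    """
    ref = np.asarray(ref_logits, dtype=np.float64)
    qnt = np.asarray(quant_logits, dtype=np.float64)
    # log-softmax via logsumexp for numerical stability
    ref_logp = ref - np.logaddexp.reduce(ref, axis=-1, keepdims=True)
    qnt_logp = qnt - np.logaddexp.reduce(qnt, axis=-1, keepdims=True)
    # KL(p || q) = sum_i p_i * (log p_i - log q_i), averaged over tokens
    kl = np.sum(np.exp(ref_logp) * (ref_logp - qnt_logp), axis=-1)
    return float(kl.mean())

# Identical logits give a divergence of (numerically) zero.
print(mean_kl_divergence([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]]))
```

In practice you would collect the reference logits once and compare each quant against them, so the (slow) reference pass is not repeated per quant.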

PS: if you're ever open to collaborating on a project, please let me know! You have some serious compute on your side, and I've been curious whether you've thought about diving into the exciting world of finetuning or QLoRAs together.

There is no need to be so arrogant - I came to the conclusion, as I wrote, that it's the incorrect file because the model description lists the source, which does not contain the correct imatrix file for this model. That's the result you get when you are guided by the data, in this case the model description. So if the description is correct, it's the incorrect file; if the imatrix file is correct for the model, the description refers to the wrong imatrix file. Either way, something about the imatrix file is incorrect. Pretty simple. I did not attack the quality (or even talk about the quality) of your quants. I certainly don't verify my quants before uploading (I simply don't have the disk space for that luxury), so all the power to you for doing that.

I suspected that you simply had an error in the description (maybe you confuse imatrix files with their training data? But your stated source also does not have the training data) - that's why I asked. I noticed this before on your quants and wanted to help you fix the mistake.

As for the serious compute, I have to disappoint you. The hardware I quantize on is, by and large, a decade old, or even older (e.g. the server that is quantizing Phainesthesia is a Xeon E3-1275, and is one of the faster ones).

Anyway, I won't bother you again.

Not meaning to be arrogant, just honestly perplexed at your continued assertion that it is an "incorrect" file. Both of these divergences were measured against the static Q8. The data I posted shows that there is no difference between using the mixtral-8x7b-instruct-v0.1.imatrix from the base model in the ikawrakow repo and whatever importance-matrix calculation method you decided to go with. Functionally, our quants are the same. If you have any data at all to suggest otherwise, I'm all ears. I'll list the exact file I used in case you're curious to test for yourself.

(Sad to hear about the compute, though - best of luck to you.)

InferenceIllusionist changed discussion status to closed
