What text do you use to make the imatrix for your uploads?

#1
by ddh0 - opened

I'm trying to wrap my head around how imatrix quantizations work and whether or not it's worth quantizing my models with it. Would you mind sharing the text file that you use for your quants? And any other tips you might have.

Thanks so much! I really appreciate all these uploads. You're doing great work.

It's worth quantizing your models with an imatrix if you want to use Q5 or smaller quants. Below Q4, it becomes really important.

I cannot share the text I use, but it contains "groups_merged.txt" from https://github.com/ggerganov/llama.cpp/discussions/5263, which is generally a reasonably good text and not too long. Other people use Wikipedia extracts (wiki.train.raw), groups_10_merged.txt, random tokens, or other texts.

It generally doesn't seem super critical which text you use. In my experience, 40k tokens can be enough, although I generally use around 160k tokens. Make sure you compute it on a GPU, though, as that will cut down times tremendously (you don't need much VRAM if you keep the model in system RAM). If you can't fit the whole model, you can compute the imatrix on a quant instead, or you can stream the model from an NVMe disk, which isn't all that much slower than RAM.
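To make the above a bit more concrete, here is a rough Python sketch, not my actual setup. It trims a calibration text to an approximate token budget and builds an imatrix command line. Assumptions: the llama.cpp binary is called `llama-imatrix` (older builds named it `imatrix`), about 4 characters per token is a good enough estimate for English text, and the helper names and file names are made up for illustration.

```python
from pathlib import Path


def trim_to_token_budget(text: str, budget: int, chars_per_token: float = 4.0) -> str:
    """Cut text at a line boundary close to `budget` tokens.

    The token count is only estimated from the character count
    (chars_per_token is an assumption, not a real tokenizer).
    """
    max_chars = int(budget * chars_per_token)
    if len(text) <= max_chars:
        return text
    # Prefer cutting at the last newline before the limit.
    cut = text.rfind("\n", 0, max_chars)
    return text[:cut] if cut > 0 else text[:max_chars]


def imatrix_command(model, calibration, output, gpu_layers=0, binary="llama-imatrix"):
    """Build the argument list for an imatrix run.

    gpu_layers=0 keeps the weights in system RAM; raise it if the
    model (or a quant of it) fits in VRAM.
    """
    return [
        binary,
        "-m", str(model),        # model file, full precision or a quant
        "-f", str(calibration),  # calibration text
        "-o", str(output),       # where the imatrix is written
        "-ngl", str(gpu_layers), # number of layers offloaded to the GPU
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = imatrix_command(Path("model-f16.gguf"), Path("calibration.txt"), Path("imatrix.dat"))
    print(" ".join(cmd))
```

You would write the trimmed text to a file, run the printed command (for example with `subprocess.run(cmd, check=True)`), and then pass the resulting file to the quantize tool. Whether the GPU still speeds things up with no layers offloaded depends on how your llama.cpp was built.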

I don't know what I am doing, though, so listening to me might not give you the best results, and my imatrix quants have not been compared much to others.

mradermacher changed discussion status to closed
