Weighted vs static

#1, opened by SporkySporkness

Hello, I've read in some posts that imatrix quants are slower when run on CPU. Does this only apply to the IQ quants, or also to, for example, the i1-Q6_K?
Thanks a lot :)

imatrix quants run at practically the same speed as non-imatrix (static) quants. Any measured difference is most likely random noise.
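Why the speed is the same: the importance matrix only influences which scale and integer codes get picked at quantization time. The stored format and the decode step (scale times integer) are identical. Here is a minimal sketch assuming a toy symmetric 4-bit scheme with a grid search over scales. It is not llama.cpp's actual quantization code, and `quantize_block`/`dequantize_block` are hypothetical names.

```python
# Toy sketch (NOT llama.cpp's real code): importance-weighted scale selection
# for one block of weights, using a simple symmetric 4-bit scheme.

def quantize_block(x, weights=None, bits=4, candidates=64):
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits; codes range over [-8, 7]
    if weights is None:
        weights = [1.0] * len(x)  # "static" quant: every weight equally important
    amax = max(abs(v) for v in x)
    if amax == 0:
        return 0.0, [0] * len(x)
    best = None
    for k in range(candidates):
        # Try scales from amax/qmax down to roughly half of that.
        s = amax / qmax * (1.0 - 0.5 * k / candidates)
        q = [max(-qmax - 1, min(qmax, round(v / s))) for v in x]
        # Importance-weighted squared reconstruction error.
        err = sum(w * (v - s * qi) ** 2 for w, v, qi in zip(weights, x, q))
        if best is None or err < best[0]:
            best = (err, s, q)
    return best[1], best[2]

def dequantize_block(scale, q):
    # Decoding is the same with or without an imatrix: one multiply per weight.
    return [scale * qi for qi in q]
```

Both calls return the same kind of data (one float scale plus a list of small ints). Only the choice of values differs, so the inference-time cost of `dequantize_block` does not change.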

However, IQ-quants are more CPU-intensive than non-IQ quants. The imatrix repos also offer many more IQ-quants than the static repos do, and IQ-quants are often confused with "imatrix quants". That is probably where this claim comes from.
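"imatrix" and "IQ" are two independent properties. The first describes how the file was quantized, and the second describes which quant type it uses, which is what affects CPU cost. A minimal sketch that separates the two, assuming file names of the form `<model>.i1-<TYPE>.gguf` for imatrix quants and `<model>.<TYPE>.gguf` for static ones. This naming pattern is an assumption about these repos, not a general GGUF rule, and `describe_quant` is a hypothetical helper.

```python
import re

def describe_quant(filename):
    """Split a GGUF file name into two independent properties.

    Assumes '<model>.i1-<TYPE>.gguf' for imatrix quants and
    '<model>.<TYPE>.gguf' for static quants (assumed naming convention).
    """
    m = re.search(r"\.(i1-)?((?:IQ|Q)\w+)\.gguf$", filename)
    if m is None:
        raise ValueError(f"unrecognised quant name: {filename}")
    qtype = m.group(2)
    return {
        "type": qtype,
        "imatrix": m.group(1) is not None,  # how it was quantized (no speed impact)
        "iq_type": qtype.startswith("IQ"),  # which decoder runs (affects CPU cost)
    }
```

With this convention, an i1-Q6_K file comes out as `imatrix=True, iq_type=False`, so it should decode like any other Q6_K.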

Also, as with practically everything, it depends. Q-quants are often memory-bandwidth-bound, so throwing 4 or 20 cores at them often gives the same speed. IQ-quants are much more CPU-intensive, so they might run a lot slower than Q-quants on a 4-core CPU, but at about the same speed on a 20-core CPU.

For example, on my 14700K CPU, IQ3 quants tend to run at the same speed as or faster than Q4_K_S quants, because that CPU has plenty of cores available. On a 4- or 8-core machine, however, the IQ3 quant might run much slower than the Q4_K_S.
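The memory-bound vs. CPU-bound argument can be written as a toy roofline model. Generating one token reads roughly every weight once, so speed is capped by whichever is slower: streaming the file from RAM, or the cores decoding it. This is a minimal sketch. Every number below (file sizes, RAM bandwidth, per-core decode rates) is hypothetical and chosen only to illustrate the shape of the effect, not measured on any real CPU.

```python
def tokens_per_second(model_gb, mem_bw_gbs, cores, per_core_gbs):
    """Toy single-stream CPU decode estimate (hypothetical model).

    model_gb:      size of the quantized weights read per token
    mem_bw_gbs:    usable RAM bandwidth in GB/s
    per_core_gbs:  hypothetical GB/s of quantized weights one core can decode;
                   assumed lower for IQ types because they need more CPU work
    """
    memory_limit = mem_bw_gbs / model_gb
    compute_limit = cores * per_core_gbs / model_gb
    return min(memory_limit, compute_limit)

# Hypothetical inputs: a 4.5 GB Q4_K_S file vs a 3.5 GB IQ3 file, 80 GB/s RAM.
for cores in (4, 20):
    q4 = tokens_per_second(4.5, 80, cores, per_core_gbs=30)
    iq3 = tokens_per_second(3.5, 80, cores, per_core_gbs=6)
    print(f"{cores:2d} cores: Q4_K_S {q4:5.1f} tok/s, IQ3 {iq3:5.1f} tok/s")
```

With these made-up inputs, Q4_K_S hits the memory limit at both core counts, so extra cores don't help. IQ3 is compute-limited and much slower at 4 cores, but at 20 cores it reaches its own (higher, because the file is smaller) memory limit and overtakes Q4_K_S.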

mradermacher changed discussion status to closed
