Weighted vs static
Hello, I've read in some posts that imatrix quants are slower when run on CPU. Is this only true for the IQ# quants, or does it also apply to, for example, the i1-Q6_K?
Thanks a lot :)
imatrix quants are pretty much the same speed as non-imatrix quants. Any difference is probably down to random noise or chance.
However, IQ quants are more CPU-intensive than non-IQ quants. There are also many more imatrix IQ quants than static ones, and IQ quants are often confused with "imatrix quants", so that is probably where the claim comes from.
Also, as with practically everything, it depends. Q-quants are often bound by memory bandwidth, so whether you throw 4 or 20 cores at them, they often perform the same. IQ quants are much more CPU-intensive, so they might run a lot slower than Q-quants on a 4-core CPU, or about the same on a 20-core CPU.
For example, on my 14700K CPU, IQ3 quants tend to be the same speed as or faster than Q4_K_S quants, as that CPU has ample cores available. But on a 4- or 8-core machine, the IQ3 quant might run much slower than the Q4_K_S.
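If it helps to see the memory-bound vs. CPU-bound reasoning as numbers, here is a rough back-of-the-envelope sketch. It is not a benchmark and not how llama.cpp works internally: it just assumes that each generated token has to stream the whole model once, and that speed is capped by whichever is lower, memory bandwidth or the combined rate at which the cores can decode the weights. The function name `tokens_per_second` and every number in it (model sizes, 60 GB/s bandwidth, per-core decode rates) are made up for illustration.

```python
def tokens_per_second(model_gb, mem_bandwidth_gbs, cores, per_core_gbs):
    """Toy estimate: the lower of the memory limit and the CPU limit wins.

    All inputs are illustrative assumptions, not measured values.
    per_core_gbs is a hypothetical rate at which one core can decode
    and multiply quantized weights, in GB of weights per second.
    """
    memory_limit = mem_bandwidth_gbs / model_gb      # tokens/s if only RAM mattered
    compute_limit = cores * per_core_gbs / model_gb  # tokens/s if only the CPU mattered
    return min(memory_limit, compute_limit)


# Hypothetical quants: (model size in GB, per-core decode rate in GB/s).
# The IQ quant is smaller but assumed to be much more expensive to decode.
QUANTS = {
    "Q4_K_S": (4.0, 20.0),
    "IQ3": (3.0, 3.0),
}
MEM_BANDWIDTH_GBS = 60.0  # assumed RAM bandwidth, same for both machines

if __name__ == "__main__":
    for cores in (4, 20):
        for name, (size_gb, per_core) in QUANTS.items():
            tps = tokens_per_second(size_gb, MEM_BANDWIDTH_GBS, cores, per_core)
            print(f"{cores:2d} cores, {name}: {tps:.1f} tokens/s")
```

With these made-up numbers the Q4_K_S estimate is identical at 4 and 20 cores because it hits the memory limit either way, while the IQ3 estimate is far lower at 4 cores and only catches up (and slightly overtakes, thanks to the smaller file) once there are enough cores to reach the memory limit. Real results depend on the CPU, RAM and build, so measuring on your own machine is the only way to know.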