Is the Q8_0 quant also imatrix'd? Why?
#1, opened by igzbar
What was the basis of the decision to use imatrix vs. regular quantization for Q8_0? Doesn't imatrix reduce performance?
It shouldn't reduce performance (unless you have a source on that), but it also shouldn't affect it much, if at all: at Q8 there's no need to compress some portions of the weights more aggressively than others.
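To put a rough number on that, here's a minimal sketch of plain round-to-nearest 8-bit block quantization. It assumes one scale per block of 32 weights and random Gaussian weights; the function names are made up for illustration and this is a simplification, not llama.cpp's actual code. The point is just that every weight ends up within half a quantization step of its original value, i.e. at most about 0.4% of the largest weight in its block, so there is very little error left for an importance matrix to redistribute.

```python
import random

BLOCK = 32  # assumed block size for this sketch


def quantize_block_q8(block):
    """Round-to-nearest 8-bit quantization with a single scale per block."""
    amax = max(abs(x) for x in block)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [round(x / scale) for x in block]  # integers in [-127, 127]
    return scale, q


def dequantize_block(scale, q):
    """Map the stored integers back to floats."""
    return [scale * v for v in q]


def worst_relative_error(weights):
    """Largest per-weight error, relative to the max magnitude in its block."""
    worst = 0.0
    for i in range(0, len(weights), BLOCK):
        block = weights[i:i + BLOCK]
        amax = max(abs(x) for x in block)
        if amax == 0:
            continue
        scale, q = quantize_block_q8(block)
        restored = dequantize_block(scale, q)
        err = max(abs(a - b) for a, b in zip(block, restored))
        worst = max(worst, err / amax)
    return worst


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]  # stand-in tensor
worst = worst_relative_error(weights)
bound = 0.5 / 127  # half a quantization step, relative to the block max
print(f"worst relative error: {worst:.5f} (bound: {bound:.5f})")
```

At lower bit widths the step size is much larger, which is where weighting the error by importance has room to matter.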