Q5_1 & Q5_K_M quant
Can you also upload the Q5_1 & Q5_K_M for mixtral-instruct please?
With this quantization approach Q5_1 is about the same as Q5_0, and Q5_K_M is about the same as Q5_K_S.
Interesting... Is this a case exclusive to MoE? Because on the OpenHermes model the quantization error percentage of Q5_1 is more than double that of Q5_0.
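For context, here is a minimal sketch of how such an error percentage might be computed, assuming it is defined as the relative perplexity increase of the quantized model over the fp16 baseline; the tables referenced in this thread may use a different definition, and the numbers below are made up for illustration only.

```python
def quantization_error_pct(ppl_quantized: float, ppl_fp16: float) -> float:
    """Relative perplexity increase over the fp16 baseline, as a percentage.
    This is one common convention; it is an assumption, not the thread's
    confirmed definition."""
    return 100.0 * (ppl_quantized / ppl_fp16 - 1.0)

# Illustrative values only: if Q5_0 raises perplexity from 4.10 to 4.14 and
# Q5_1 raises it to 4.19, the Q5_1 error (~2.2%) is more than double the
# Q5_0 error (~1.0%).
print(quantization_error_pct(4.14, 4.10))  # ~0.98
print(quantization_error_pct(4.19, 4.10))  # ~2.20
```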
You can see in the OpenHermes table that the Q5_K_M quantization error is about the same as Q5_K_S. Q5_1 has always behaved erratically; for some models its quantization error is significantly higher than Q5_0's. With this new quantization, which utilizes an "importance matrix", Q5_1 behaves much better, in the sense that it is as good as, or better than, Q5_0. In the case of the base and instruct-tuned Mixtral-8x7b models it is about the same as Q5_0. I'm not sure whether this is related to the MoE architecture, as the number of models quantized this way is still too small to draw that conclusion.
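To illustrate the idea behind an "importance matrix", here is a rough sketch of importance-weighted block quantization: instead of picking a block scale that minimizes plain squared error, the scale is chosen to minimize error weighted by a per-weight importance value (for example, derived from average activation statistics). The function name, the 5-bit range, the scale search, and the random data below are all illustrative assumptions, not the actual llama.cpp implementation.

```python
import numpy as np

def quantize_block_q5(weights: np.ndarray, importance: np.ndarray):
    """Illustrative sketch: choose a scale for 5-bit symmetric quantization
    of one block by minimizing importance-weighted squared error over a
    small search around the max-abs-based scale."""
    qmax = 15  # quantized values clipped to -16..15 (5 bits), illustrative
    base = np.max(np.abs(weights)) / qmax if np.any(weights) else 1.0
    best_scale, best_err = base, np.inf
    for factor in np.linspace(0.8, 1.2, 21):
        scale = base * factor
        q = np.clip(np.round(weights / scale), -16, 15)
        # Weighted squared error: important weights count more.
        err = np.sum(importance * (weights - q * scale) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    q = np.clip(np.round(weights / best_scale), -16, 15).astype(np.int8)
    return q, best_scale

# Usage example with random data standing in for one 32-weight block and
# made-up importance values.
rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)
imp = rng.uniform(0.1, 1.0, size=32).astype(np.float32)
q, s = quantize_block_q5(w, imp)
print("weighted RMS error:", float(np.sqrt(np.mean(imp * (w - q * s) ** 2))))
```

The design point this sketch tries to convey is that weighting the error shifts which rounding/scale choices "win", which is one plausible reason a format like Q5_1 can go from erratic to consistently as good as Q5_0 once importance information is used.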