Differences between Q5 model variants

#3
by JineLD - opened

Hi,

Thank you for providing the quantized versions of the model; they work great with llama.cpp! I do have a question about the variants, though. What exactly are the differences between:

  • Q5_0
  • Q5_1_00001-00002 + Q5_2_00001-00002
  • Q5_K_M
  • Q5_K_S

I started with a Q8 version, which is way too slow for my system, then moved on to Q6, which is almost usable with some patience. Now I am wondering whether I should try Q5, but I am a bit at a loss about where to start and what the differences between these Q5 variants are.

Kind regards,
Jin

Well, quite frankly, the differences between the Q5 quants are minimal, so it's better to go with Q5_K_S. Q6 is very similar in quality to Q8. Here's a graph; the lower the perplexity, the better. To get the benefits of quantization while keeping great quality, use Q5_K_S. Q6 through Q8 are pretty much the same as loading the full unquantized model.

[Image: perplexity graph (w336n_wS89x80gc5mwUVj.png)]
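For anyone unsure what the perplexity axis means: it is the exponential of the average negative log-likelihood the model assigns to each token of a test text, so a model that is less "surprised" by the text scores lower. Below is a minimal sketch of that definition only. The function name and the probabilities are made up for illustration; they are not measurements of any of the quants in this repo, and this is not meant to mirror how llama.cpp computes it internally.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) over a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities for the same four tokens under two
# imaginary quantization levels (illustrative numbers, not real results).
higher_precision = [math.log(p) for p in (0.50, 0.40, 0.60, 0.50)]
lower_precision = [math.log(p) for p in (0.45, 0.35, 0.55, 0.45)]

print("higher precision:", perplexity(higher_precision))
print("lower precision: ", perplexity(lower_precision))
```

In this toy case the lower-precision probabilities are slightly smaller, so its perplexity comes out slightly higher. When two curves on a graph like the one above are nearly on top of each other, the quants are giving almost the same predictions.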

Ah, I see. It's not always easy for a newbie to get a clear picture. Thank you for the detailed answer!

JineLD changed discussion status to closed
Quant Factory org

Thanks for clarifying @Clevyby
