Quantization error
Has anyone successfully used llama.cpp to quantize this model to GGUF? I tried to quantize it to Q5_K_M, but I hit an error about missing weights when the model was loaded for quantization. Why is this happening?
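For context, here is a minimal sketch of the two-step flow I followed (paths and filenames are placeholders; this assumes a recent llama.cpp checkout, where the converter script is `convert_hf_to_gguf.py` and the quantize binary is `llama-quantize`):

```python
# Sketch of the convert-then-quantize flow (all paths are placeholders).
import subprocess

HF_MODEL_DIR = "path/to/hf-model"   # local Hugging Face checkpoint
F16_GGUF = "model-f16.gguf"         # full-precision intermediate GGUF
Q5_GGUF = "model-Q5_K_M.gguf"       # quantized output

# Step 1: convert the HF weights to a full-precision GGUF file.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", HF_MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# Step 2: quantize the GGUF to Q5_K_M; this is where the error appears for me.
subprocess.run(
    ["./llama-quantize", F16_GGUF, Q5_GGUF, "Q5_K_M"],
    check=True,
)
```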
I was able to quantize to Q3_K_S successfully for testing. I no longer have the quantized model, but I didn't have to do anything special to get it working. I have never seen an error like that during quantization.
I have no interest in redownloading this model for quantization. If you are unable to quantize it, then I recommend finding a different model.
I searched for related issues on GitHub and found that this might be due to missing MoE support in llama.cpp. I'll update to the latest llama.cpp first and then see whether the error still occurs.
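Roughly what I plan to run, as a sketch (assuming a CMake-based build in a local llama.cpp checkout; the directory path is a placeholder):

```python
# Sketch: update a local llama.cpp checkout, rebuild, then retry quantization.
import subprocess

LLAMA_CPP_DIR = "path/to/llama.cpp"  # placeholder for the local checkout

# Pull the latest sources, which should include newer MoE support.
subprocess.run(["git", "pull"], cwd=LLAMA_CPP_DIR, check=True)

# Rebuild with CMake (Release config).
subprocess.run(["cmake", "-B", "build"], cwd=LLAMA_CPP_DIR, check=True)
subprocess.run(["cmake", "--build", "build", "--config", "Release"],
               cwd=LLAMA_CPP_DIR, check=True)

# Then re-run the convert/quantize steps from my first post.
```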