GPTQ or GGUF
Any chance of a GPTQ or GGUF conversion for this?
llama.cpp has issues with Llama 3.1, so everybody is currently waiting for those to be fixed before quantizing. With luck, it will be fixed in a day.
That is true for the 8B and 70B versions of this model, but the 12B version is based on Mistral Nemo, not Llama 3.1, and Nemo is already fully supported by llama.cpp. The same goes for the 123B version, which is based on Mistral Large.
You're right, the Mistral quants can be done; I'll probably do some today.
L3.1 will need to wait.
Thank you, I really appreciate all the work you guys have put into this. I'm looking forward to trying this out. 👍
It looks quite promising, especially given how good Nemo is to start with.
I look forward to the release. I really hope you can also publish recommended sampler settings.
L3.1 support got merged, so I'm going to do some static quants for the 4 models.
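For anyone who wants to roll their own quants in the meantime, here is a minimal sketch of the usual llama.cpp convert-then-quantize flow, assuming it is run from a llama.cpp checkout. The model directory and output filenames are placeholders, and the exact script/binary names (convert_hf_to_gguf.py, llama-quantize) have changed between llama.cpp versions, so check your checkout.

```python
# Minimal sketch of a static GGUF quant with llama.cpp, run from a
# llama.cpp checkout. MODEL_DIR is a placeholder for a local copy of
# the Hugging Face repo, not a specific model from this thread.
import subprocess

MODEL_DIR = "path/to/hf-model"     # hypothetical local HF model directory
F16_GGUF = "model-f16.gguf"
QUANT_GGUF = "model-Q4_K_M.gguf"

# Step 1: convert the Hugging Face checkpoint to a full-precision GGUF.
subprocess.run(
    ["python", "convert_hf_to_gguf.py", MODEL_DIR,
     "--outfile", F16_GGUF, "--outtype", "f16"],
    check=True,
)

# Step 2: produce a static quant (no importance matrix) from the f16 GGUF.
subprocess.run(
    ["./llama-quantize", F16_GGUF, QUANT_GGUF, "Q4_K_M"],
    check=True,
)
```

Q4_K_M is just one common quant type; running llama-quantize without arguments prints the full list of supported types.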