Reconvert GGUF for the MoE, due to llama.cpp update
#1
by
CombinHorizon
- opened
would you please re-convert the GGUF using a version of llama.cpp newer than 2024-04-03, for better performance?
see
https://github.com/ggerganov/llama.cpp/#hot-topics
MoE memory layout has been updated - reconvert models for mmap
support and regenerate imatrix
https://github.com/ggerganov/llama.cpp/pull/6387
thx
I found a solution for everyone who owns this GGUF file:
./quantize --allow-requantize can convert the old format to the new format.
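A minimal sketch of that requantize step, assuming a llama.cpp build that includes PR #6387; the file names here are hypothetical placeholders, and the final argument should match the quant type of your existing file:

```shell
# Rewrite an old-layout MoE GGUF into the new mmap-friendly layout.
# --allow-requantize permits quantizing from an already-quantized input.
./quantize --allow-requantize \
    old-moe-model.Q4_K_M.gguf \
    new-moe-model.Q4_K_M.gguf \
    Q4_K_M
```

Note that requantizing from an already-quantized file keeps any quality loss baked into the original quantization; converting fresh from the source weights would be preferable when possible.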
due to an internet traffic limit, I cannot upload the new GGUF, sorry about that.