llama.cpp quantize
#3
by LiuCi - opened
Hi, author, I'm trying to quantize the model with llama.cpp, but when I use python convert-hf-to-gguf.py to generate the GGUF file, I get the error "NotImplementedError: Architecture 'LlamaForCausalLM' not supported!".
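For reference, here is roughly what I ran; the model directory and output filename are just placeholders for my local paths:

```bash
# Roughly what I ran (paths are placeholders for my local setup):
python convert-hf-to-gguf.py ./my-model-dir --outfile model-f16.gguf
# -> NotImplementedError: Architecture 'LlamaForCausalLM' not supported!
```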
I then tried convert.py, but that produces an error at the next step, as shown on GitHub. I have read your comments on Reddit; you are a great coder.
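In case it helps, this is my understanding of the full quantization flow; the flags and the quantize binary name are guesses based on the llama.cpp README and may differ between versions, so please correct me if I'm missing a step:

```bash
# My understanding of the flow (script/binary names may vary by llama.cpp version):
python convert.py ./my-model-dir --outtype f16 --outfile model-f16.gguf  # HF weights -> f16 GGUF
./quantize model-f16.gguf model-q4_0.gguf q4_0                           # f16 GGUF -> 4-bit GGUF
```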
Could you please tell me how to quantize the model with llama.cpp, or point me to a tutorial? I am a student trying to run the model on a small platform. Thanks a lot for reading!