Requantize to support latest code on llama.cpp

#45
by TusharRay - opened

As per https://github.com/ggerganov/llama.cpp/pull/1508, the ggml q4_1 file currently in this repo uses a format that is no longer supported. It needs to be re-quantized in the new format so people can actually use it.
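
For anyone who wants to work around this locally in the meantime, a minimal sketch of re-quantizing with the tools bundled in a current llama.cpp checkout might look like the following. The paths and output file names are placeholders, and it assumes you start from the original fp16 HF weights rather than the GPTQ 4-bit checkpoint:

# Convert the fp16 HF weights to a ggml f16 file (convert.py ships with llama.cpp).
python3 convert.py ../path-to-fp16-model --outtype f16 --outfile ggml-model-f16.bin

# Quantize that f16 file to the current q4_1 format with the bundled quantize tool.
./quantize ggml-model-f16.bin ggml-model-q4_1.bin q4_1

The resulting ggml-model-q4_1.bin should then load with the current ./main binary, but ideally the repo itself gets updated with a re-quantized file.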

Same issue here. Conversion using the current llama.cpp script does not work. This is the output I'm getting when using the bundled ggml model:

(llama.cpp) ➜  llama.cpp git:(master) ✗ ./main -n 256 -ngl 1 -c 2048 -p "the truth is" -m ../gpt4-x-alpaca-13b-native-4bit-128g/gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g/ggml-model-q4_1.bin
main: warning: model does not support context sizes greater than 2048 tokens (4096 specified);expect poor results
main: build = 635 (5c64a09)
main: seed  = 1686257750
llama.cpp: loading model from ../gpt4-x-alpaca-13b-native-4bit-128g/gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g/ggml-model-q4_1.bin
error loading model: unexpectedly reached end of file
llama_init_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '../gpt4-x-alpaca-13b-native-4bit-128g/gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g/ggml-model-q4_1.bin'
main: error: unable to load model
