---
license: mit
---

## How to convert

First, you need to `git clone` llama.cpp and build it with `make`.
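If you have not done this before, a minimal sketch of those two steps (assuming the upstream ggerganov/llama.cpp repository and its default `make` build) looks like this:

```bash
# fetch llama.cpp and build the quantize and main binaries
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```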

Then follow the instructions below to generate the GGUF files.

```bash
# convert Qwen HF models to gguf fp16 format
python convert-hf-to-gguf.py --outfile qwen7b-chat-f16.gguf --outtype f16 Qwen-7B-Chat

# quantize the model to 4-bits (using q4_0 method)
./quantize qwen7b-chat-f16.gguf qwen7b-chat-q4_0.gguf q4_0

# chat with Qwen models
./main -m qwen7b-chat-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```

## Files are split and require joining

Note: HF does not support uploading files larger than 50GB, and uploading a single 41GB file is too hard for me. Therefore, I have uploaded the Q4_0 model split into 5GB pieces.
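For reference, a split like this can be produced with the standard `split` utility; this is only a sketch, and the 5GB chunk size and the `-split-` suffix prefix are assumptions based on the file names used here:

```bash
# split the merged GGUF into 5GB pieces: qwen72b-chat-q4_0.gguf-split-aa, -ab, ...
split -b 5G qwen72b-chat-q4_0.gguf qwen72b-chat-q4_0.gguf-split-
```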

To join the files, do the following:

Linux and macOS:

```bash
cat qwen72b-chat-q4_0.gguf-split-* > qwen72b-chat-q4_0.gguf && rm qwen72b-chat-q4_0.gguf-split-*
```
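Once joined, a quick sanity check is to load the merged file with the same `main` binary used above; the short prompt and token count here are only examples:

```bash
# run a short generation to confirm the merged GGUF loads correctly
./main -m qwen72b-chat-q4_0.gguf -p "Hello" -n 16
```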