---
license: mit
---
|
|
|
## How to convert
|
|
|
First, clone [llama.cpp](https://github.com/ggerganov/llama.cpp) and build it.
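
For reference, a typical setup looks like the following (the exact build steps may vary with your llama.cpp version and platform):

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
```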
|
|
|
Then follow the instructions below to generate GGUF files.
|
|
|
```
# convert Qwen HF models to gguf fp16 format
python convert-hf-to-gguf.py --outfile qwen7b-chat-f16.gguf --outtype f16 Qwen-7B-Chat

# quantize the model to 4 bits (using the q4_0 method)
./quantize qwen7b-chat-f16.gguf qwen7b-chat-q4_0.gguf q4_0

# chat with Qwen models
./main -m qwen7b-chat-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```
|
|
|
|
|
## Files are split and require joining
|
|
|
**Note:** HF does not support uploading files larger than 50GB, and even a single 41GB upload is too unwieldy for me. I have therefore uploaded the Q4_0 model split into chunks of 5GB each.
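
For reference, a split like this can be produced with the standard `split` utility. The exact command below is an assumption on my part (only the 5GB chunk size and the `-split-aa`, `-split-ab`, ... naming are given), and you do not need to run it yourself, since the uploaded files are already split:

```
# hypothetical reconstruction of the split step (GNU coreutils)
split -b 5G qwen72b-chat-q4_0.gguf qwen72b-chat-q4_0.gguf-split-
```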
|
|
|
To join the files, do the following:
|
|
|
Linux and macOS:
|
|
|
```
cat qwen72b-chat-q4_0.gguf-split-* > qwen72b-chat-q4_0.gguf && rm qwen72b-chat-q4_0.gguf-split-*
```
|
|
|
Windows:
|
|
|
```
copy /B qwen72b-chat-q4_0.gguf-split-aa + qwen72b-chat-q4_0.gguf-split-ab + qwen72b-chat-q4_0.gguf-split-ac + qwen72b-chat-q4_0.gguf-split-ad + qwen72b-chat-q4_0.gguf-split-ae + qwen72b-chat-q4_0.gguf-split-af + qwen72b-chat-q4_0.gguf-split-ag + qwen72b-chat-q4_0.gguf-split-ah qwen72b-chat-q4_0.gguf
```
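
Once joined, the merged file can be loaded by llama.cpp just like the 7B example above, e.g.:

```
./main -m qwen72b-chat-q4_0.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```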
|
|