Possible to upload Q4_K_M?

by EricTri - opened


Thanks for uploading this! I'm currently limited on storage space and can't spare the 3TB needed to generate the Q4 quantized model myself. An upload would be much appreciated!


Daydream org
edited Feb 5

Unfortunately, I no longer have access to the machine. But if you can spare more than 2TB of disk space, here is how I would do it:

export WORK_DIR=$(pwd)
python3 -m venv venv
source venv/bin/activate
pip3 install -U "huggingface_hub[cli]"

# the fp8 checkpoints are around 700GB
mkdir checkpoints
huggingface-cli download --resume-download --local-dir checkpoints/DeepSeek-R1 deepseek-ai/DeepSeek-R1

# my fork of llama.cpp, which includes PR #11446 plus some changes that convert fp8 HF checkpoints to bf16 GGUF directly using triton(-cpu), without intermediate checkpoints
git clone https://github.com/evshiron/llama.cpp --recursive
pushd llama.cpp
pip3 install -r requirements/requirements-convert_hf_to_gguf.txt
cmake -B build
cmake --build build --config Release

# install triton-cpu for cpu-only dequant
git clone https://github.com/triton-lang/triton-cpu --recursive
pushd triton-cpu
pip3 install ninja cmake wheel pybind11
MAX_JOBS=32 pip3 install -e python

# return to the working directory so the relative paths below resolve
cd "$WORK_DIR"

# hopefully it should work, takes an hour or more depending on your hardware, the bf16 checkpoints are around 1.3TB
# the dequant process may take more than 64GB RAM, but should be doable within 360GB RAM
python3 llama.cpp/convert_hf_to_gguf.py --outtype bf16 --split-max-size 50G checkpoints/DeepSeek-R1

# removing the fp8 checkpoints gives us 700GB back
mkdir checkpoints/DeepSeek-R1-BF16
mv checkpoints/DeepSeek-R1/*.gguf checkpoints/DeepSeek-R1-BF16
rm -r checkpoints/DeepSeek-R1

# then use llama-quantize to make the quants you want, Q4_K_M should be around 400GB?
./llama.cpp/build/bin/llama-quantize --keep-split checkpoints/DeepSeek-R1-BF16/<THE_FIRST_OF_DeepSeek-R1-BF16_GGUF>.gguf Q4_K_M
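
The placeholder in the last command stands for the first shard of the split BF16 output. Here is a minimal sketch for locating it, assuming the converter names its shards with a `-00001-of-000NN.gguf` style suffix (check what actually landed in the directory before relying on this). `first_shard` is a hypothetical helper, not part of llama.cpp:

```shell
# Hypothetical helper: print the path of the first shard of a split GGUF.
# Assumes shard file names end in -00001-of-<total>.gguf.
first_shard() {
  for f in "$1"/*-00001-of-*.gguf; do
    # If nothing matches, the glob stays literal; -e filters that case out.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no first shard found in $1" >&2
  return 1
}
```

It could then be used as `./llama.cpp/build/bin/llama-quantize --keep-split "$(first_shard checkpoints/DeepSeek-R1-BF16)" Q4_K_M`.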

Amazing, thanks for the detailed instructions. Going to give this a shot overnight tonight!

@EricTri If you still need it, I made a Q4_K here: Q4_K

I'd also made a Q2_K for myself: Q2_K

around 1.3TB

For anyone else doing this, it's 1.34TB, so provision a little extra storage. Don't do what I did and end up attaching another volume partway through, then quickly mv and symlink things around :)
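
The sizes quoted in this thread can be sanity-checked with back-of-the-envelope arithmetic before provisioning a volume. This is a rough sketch, assuming about 671 billion parameters for DeepSeek-R1 and about 4.85 bits per weight on average for Q4_K_M. Neither number is stated in the thread, and real files add metadata and keep some tensors at higher precision. `estimate_gb` and `free_gb` are hypothetical helpers:

```shell
# Estimated model size in GB (10^9 bytes).
# $1 = parameter count in billions, $2 = average bits per weight (assumed values).
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.0f\n", p * b / 8 }'
}

# Free space in GB on the filesystem holding path $1.
# df -Pk reports available space in 1024-byte blocks in column 4.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 / 1e9 }'
}

estimate_gb 671 16     # BF16: prints 1342
estimate_gb 671 4.85   # Q4_K_M at the assumed bits per weight: prints 407

# Warn early instead of running out of space mid-conversion.
need=$(estimate_gb 671 16)
have=$(free_gb .)
if [ "$have" -le "$need" ]; then
  echo "only ${have}GB free here, BF16 output alone needs roughly ${need}GB" >&2
fi
```

Keep in mind that the fp8 download (around 700GB) and the BF16 output sit on disk at the same time until the fp8 checkpoints are removed, so the peak requirement is the sum of the two.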
