fedric95/gemma-7b-GGUF

Llamacpp Quantizations of gemma-7b

Using llama.cpp release b3583 for quantization.

Filename	Quant type	File Size	Perplexity (wikitext-2-raw-v1.test)
gemma-7b.BF16.gguf	BF16	17.1 GB	6.9857 +/- 0.04411
gemma-7b-Q8_0.gguf	Q8_0	9.08 GB	7.0373 +/- 0.04456
gemma-7b-Q6_K.gguf	Q6_K	7.01 GB	7.3858 +/- 0.04762
gemma-7b-Q5_K_M.gguf	Q5_K_M	6.14 GB	7.4227 +/- 0.04781
gemma-7b-Q5_K_S.gguf	Q5_K_S	5.98 GB	7.5232 +/- 0.04857
gemma-7b-Q4_K_M.gguf	Q4_K_M	5.33 GB	7.5800 +/- 0.04918
gemma-7b-Q4_K_S.gguf	Q4_K_S	5.05 GB	7.9673 +/- 0.05225
gemma-7b-Q3_K_L.gguf	Q3_K_L	4.71 GB	7.9586 +/- 0.05186
gemma-7b-Q3_K_M.gguf	Q3_K_M	4.37 GB	8.4077 +/- 0.05545
gemma-7b-Q3_K_S.gguf	Q3_K_S	3.98 GB	102.6126 +/- 1.62310
gemma-7b-Q2_K.gguf	Q2_K	3.48 GB	3970.5385 +/- 102.46527
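To make the size/quality trade-off in the table easier to read, here is a minimal sketch that computes each quant's perplexity increase relative to the BF16 row and picks the lowest-perplexity file under a size limit. The numbers are copied from the table; the helper names (`relative_increase`, `best_under_budget`) and the idea of a size budget are illustrative assumptions, not part of this repository. Note that file size is only a rough proxy for memory use at run time.

```python
# Rows copied from the table above: (quant type, file size in GB, perplexity).
ROWS = [
    ("BF16", 17.1, 6.9857),
    ("Q8_0", 9.08, 7.0373),
    ("Q6_K", 7.01, 7.3858),
    ("Q5_K_M", 6.14, 7.4227),
    ("Q5_K_S", 5.98, 7.5232),
    ("Q4_K_M", 5.33, 7.5800),
    ("Q4_K_S", 5.05, 7.9673),
    ("Q3_K_L", 4.71, 7.9586),
    ("Q3_K_M", 4.37, 8.4077),
    ("Q3_K_S", 3.98, 102.6126),
    ("Q2_K", 3.48, 3970.5385),
]


def relative_increase(rows=ROWS, baseline="BF16"):
    """Return {quant: percent perplexity increase over the baseline row}."""
    base = next(ppl for name, _, ppl in rows if name == baseline)
    return {name: 100.0 * (ppl - base) / base for name, _, ppl in rows}


def best_under_budget(max_gb, rows=ROWS):
    """Return the lowest-perplexity quant whose file size is <= max_gb, or None."""
    fitting = [row for row in rows if row[1] <= max_gb]
    if not fitting:
        return None
    return min(fitting, key=lambda row: row[2])[0]


if __name__ == "__main__":
    for name, pct in relative_increase().items():
        print(f"{name:8s} {pct:+12.2f}%")
    # Hypothetical budget of 6.0 GB, chosen only for illustration.
    print("Best under 6.0 GB:", best_under_budget(6.0))
```

With the table's values, a 6.0 GB budget selects Q5_K_S. The Q3_K_S and Q2_K rows stand out: their perplexity is far above every other row.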

First, make sure you have huggingface-cli installed:

pip install -U "huggingface_hub[cli]"

Then, you can target the specific file you want:

huggingface-cli download fedric95/gemma-7b-GGUF --include "gemma-7b-Q4_K_M.gguf" --local-dir ./
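The same single-file download can be done from Python with `hf_hub_download` from the `huggingface_hub` package installed above. This is a minimal sketch: the wrapper name `download_quant` and its `downloader` parameter are hypothetical conveniences (the latter lets the function be exercised without network access), not something this repository provides.

```python
from pathlib import Path

REPO_ID = "fedric95/gemma-7b-GGUF"  # repository used in the command above


def download_quant(filename, local_dir=".", repo_id=REPO_ID, downloader=None):
    """Download one GGUF file from the repo and return its local path.

    `downloader` is a hypothetical hook for testing; by default the real
    huggingface_hub.hf_hub_download is used.
    """
    if downloader is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download
    path = downloader(repo_id=repo_id, filename=filename, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    # Requires network access and several GB of free disk space.
    print(download_quant("gemma-7b-Q4_K_M.gguf"))
```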

If a model is bigger than 50 GB, it will have been split into multiple files (every file in the table above is smaller than that). To download all parts of a split model to a local folder, run:

huggingface-cli download fedric95/gemma-7b-GGUF --include "gemma-7b-Q8_0.gguf/*" --local-dir gemma-7b-Q8_0

You can either specify a new local-dir (gemma-7b-Q8_0) or download them all in place (./).
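For pattern-based downloads like the `--include` command above, `huggingface_hub` offers `snapshot_download` with an `allow_patterns` argument. The sketch below assumes that approach; the wrapper name `download_matching` and its `downloader` parameter are hypothetical, and whether a given pattern matches anything depends on how the repository is laid out.

```python
from pathlib import Path

REPO_ID = "fedric95/gemma-7b-GGUF"  # repository used in the commands above


def download_matching(pattern, local_dir, repo_id=REPO_ID, downloader=None):
    """Download every repo file matching `pattern` into `local_dir`.

    `downloader` is a hypothetical hook for testing; by default the real
    huggingface_hub.snapshot_download is used.
    """
    if downloader is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    folder = downloader(
        repo_id=repo_id,
        allow_patterns=[pattern],  # only files matching this glob are fetched
        local_dir=local_dir,
    )
    return Path(folder)


if __name__ == "__main__":
    # Requires network access; mirrors the --include / --local-dir command.
    print(download_matching("gemma-7b-Q8_0.gguf/*", "gemma-7b-Q8_0"))
```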