Grok-1 GGUF Quantizations

This repository contains unofficial GGUF Quantizations of Grok-1, compatible with llama.cpp as of PR #6204 (Add grok-1 support).

Updates

Native Split Support in llama.cpp

  • The splits have been updated to use the improvements from the PR "llama_model_loader: support multiple split/shard GGUFs". As a result, manual merging with gguf-split is no longer required.

    Just download all splits and run llama.cpp with the first split, as you would with a single-file model. It detects the other splits and loads them as well.
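Since llama.cpp is only given the first split, it can help to confirm that the remaining splits are in the same directory before starting. The sketch below is one way to do that in plain POSIX shell. It assumes the `<prefix>-split-NNNNN-of-MMMMM.gguf` naming used by the files in this repository; the helper names `list_splits` and `missing_splits` are made up for this example and are not part of llama.cpp.

```shell
# Print every expected split file name, given the name of the first split.
# Assumption: names follow <prefix>-split-NNNNN-of-MMMMM.gguf.
list_splits() {
    first=$1
    prefix=${first%-split-*}                  # e.g. grok-1-IQ3_XS
    total=${first##*-of-}                     # e.g. 00009.gguf
    total=${total%.gguf}                      # e.g. 00009
    count=${total#"${total%%[!0]*}"}          # strip leading zeros, e.g. 9
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s-split-%05d-of-%05d.gguf\n' "$prefix" "$i" "$count"
        i=$((i + 1))
    done
}

# Report which expected splits are absent from a directory.
# Usage: missing_splits <dir> <first-split-name>
missing_splits() {
    dir=$1
    list_splits "$2" | while IFS= read -r name; do
        [ -f "$dir/$name" ] || echo "missing: $name"
    done
}
```

For example, `missing_splits models grok-1-IQ3_XS-split-00001-of-00009.gguf` prints one line per split that is not yet in `models/`, and prints nothing when all nine are there.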

Direct Split Download from Hugging Face using llama.cpp

server \
    --hf-repo Arki05/Grok-1-GGUF \
    --hf-file grok-1-IQ3_XS-split-00001-of-00009.gguf \
    --model models/grok-1-IQ3_XS-split-00001-of-00009.gguf \
    -ngl 999

And that is very cool (@phymbert)
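If you would rather fetch the splits yourself, a manual download could look roughly like the sketch below. It relies on two assumptions: that the files sit on the repository's `main` branch, and that every quant follows the same `grok-1-<quant>-split-NNNNN-of-MMMMM.gguf` naming as the IQ3_XS file above. The helpers `hf_url` and `fetch_splits` are hypothetical names for this example. By default the function only prints the `curl` commands, so nothing is downloaded until you pass `run`.

```shell
# Build the direct download URL for one file in a Hugging Face model repo.
# Assumption: the file lives on the repo's "main" branch.
hf_url() {
    printf 'https://huggingface.co/%s/resolve/main/%s\n' "$1" "$2"
}

# Print (default) or execute the curl commands for splits 1..N of one quant.
# Usage: fetch_splits <repo> <quant> <count> [run]
fetch_splits() {
    repo=$1 quant=$2 count=$3 mode=${4:-dry}
    i=1
    while [ "$i" -le "$count" ]; do
        # Assumed file naming, taken from the IQ3_XS example in this README.
        file=$(printf 'grok-1-%s-split-%05d-of-%05d.gguf' "$quant" "$i" "$count")
        if [ "$mode" = run ]; then
            # -L follows redirects, --fail stops on HTTP errors,
            # -C - resumes a partially downloaded file.
            curl -L --fail -C - -o "models/$file" "$(hf_url "$repo" "$file")"
        else
            echo "curl -L --fail -C - -o models/$file $(hf_url "$repo" "$file")"
        fi
        i=$((i + 1))
    done
}
```

Running `fetch_splits Arki05/Grok-1-GGUF IQ3_XS 9` lists the nine commands for review. Appending `run` executes them and expects an existing `models/` directory.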

Available Quantizations

The following Quantizations are currently available for download:

| Quant  | Split Files                                                             | Size     |
|--------|-------------------------------------------------------------------------|----------|
| Q2_K   | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 112.4 GB |
| IQ3_XS | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 125.4 GB |
| Q4_K   | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 186.0 GB |
| Q6_K   | 1-of-9, 2-of-9, 3-of-9, 4-of-9, 5-of-9, 6-of-9, 7-of-9, 8-of-9, 9-of-9 | 259.8 GB |

I would recommend the IQ3_XS version for now.

More Quantizations will be uploaded soon. All current Quants were created without an importance matrix.
