Why is the Q4_0 version the same size as the Q4_K_M one?

#2, opened by deleted

The same is true of TheBloke's GGUFs, including Dolphin 2.5 and 2.6: the Q4_0 and Q4_K_M files are the same size. Yet for every other Mixtral I've seen, including Smaug and Nous-Hermes-2, the K_M versions are larger (27.7 vs 25.8 GB).

I thought _K_M meant that some blocks are quantized at higher precision (more bits per weight), so the file should be larger than the _0 version.
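To put rough numbers on that expectation, here is a minimal sketch of the size arithmetic. The block sizes are the ggml layouts as I understand them; the parameter count and the 85/15 tensor mix are assumptions made up for illustration, not what any particular quantizer does, and the estimate ignores metadata and any tensors kept unquantized. Q4_0 and Q4_K both come out at 4.5 bits per weight, so a "K" file should only be larger if some of its weights are actually stored in a higher-bit type such as Q6_K.

```python
# (bytes per block, weights per block) for a few ggml quantization types.
BLOCK_LAYOUTS = {
    "Q4_0": (18, 32),    # fp16 scale (2 B) + 32 x 4-bit quants (16 B)
    "Q4_K": (144, 256),  # 2 x fp16 (4 B) + 12 B packed scales/mins + 128 B quants
    "Q6_K": (210, 256),  # 128 B low bits + 64 B high bits + 16 B scales + fp16 (2 B)
}


def bits_per_weight(qtype):
    """Average storage cost of one weight, including per-block overhead."""
    block_bytes, block_weights = BLOCK_LAYOUTS[qtype]
    return block_bytes * 8 / block_weights


def estimated_size_gb(n_params, mix):
    """Rough file size; mix maps quant type -> fraction of weights in that type."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    avg_bpw = sum(frac * bits_per_weight(q) for q, frac in mix.items())
    return n_params * avg_bpw / 8 / 1e9


# Assumption: approximate total parameter count for a Mixtral 8x7B model.
N_PARAMS = 46.7e9

for qtype in BLOCK_LAYOUTS:
    print(qtype, bits_per_weight(qtype), "bits per weight")

print("all Q4_0:", estimated_size_gb(N_PARAMS, {"Q4_0": 1.0}), "GB")
print("all Q4_K:", estimated_size_gb(N_PARAMS, {"Q4_K": 1.0}), "GB")
# Hypothetical mix: 15% of the weights promoted to Q6_K.
print("mixed   :", estimated_size_gb(N_PARAMS, {"Q4_K": 0.85, "Q6_K": 0.15}), "GB")
```

If that arithmetic holds, two files of identical size would suggest that few or no weights were promoted to a higher-bit type in the K_M file, though I can't confirm that this is what happened here.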

Anyway, thanks for the release. I'm asking mostly out of idle curiosity.
