My quantizations.

#1
by ZeroWw - opened

These are my own quantizations (updated almost daily).

The difference from normal quantizations is that I quantize the output and embedding tensors to f16,
and the other tensors to q5_k, q6_k or q8_0.
This produces models that are only slightly degraded, or not degraded at all, while also being smaller.
They run at about 3-6 t/s on CPU only using llama.cpp, and obviously faster on machines with potent GPUs.
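As a point of reference, here is a minimal sketch of a CPU-only run with llama.cpp's llama-cli binary (the model filename follows the naming used in the quantize commands further down; the thread count, context size and prompt are just placeholders):

llama-cli -m model.f16.q6.gguf -t 8 -c 8192 -n 128 -p "Hello"

On older llama.cpp builds the binary is called main instead of llama-cli; the flags (-m model file, -t CPU threads, -c context size, -n tokens to generate, -p prompt) are the same.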

Note: I tested the f16/q6_k quantization and, for some reason, it worked even better than Gemma 2B: it could solve a logic problem where Gemma failed.

BeaverLegacy org

Did you quant the right Smegmma? This one is Smegmma Deluxe.

Which one do you recommend for normal RP with ERP in between at 8k context? Smegmma-9B-v1 or Smegmma-Deluxe-9B-v1?

Oops... my bad... I quantized the other one... wait, I'll do this one too.

ALL the models were quantized in this way:
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q8.gguf q8_0
There is also a pure f16 GGUF in every directory.
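The commands above assume a full-precision model.f16.gguf already exists. A minimal sketch of producing one with llama.cpp's convert_hf_to_gguf.py (the local model directory here is a placeholder; older llama.cpp versions ship the script as convert-hf-to-gguf.py):

python convert_hf_to_gguf.py ./Smegmma-Deluxe-9B-v1 --outtype f16 --outfile model.f16.gguf

If the gguf Python package is installed, something like gguf-dump model.f16.q6.gguf should list token_embd.weight and output.weight as F16 and the remaining tensors as Q6_K, which is an easy way to confirm the mixed-precision layout described above.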

I highly recommend you try both, especially in combination.
Deluxe has better ideas and is well suited to the opening plot.
Smegmma is strongly obedient and is well suited to taking orders.

So start the RP with Deluxe first. If, as the story goes on, you want the AI to follow your instructions to develop it, switch to Smegmma.

Thanks for the recommendation.
