My quantizations.

#1
by ZeroWw - opened

These are my own quantizations (updated almost daily).

The difference from normal quantizations is that I quantize the output and embedding tensors to f16,
and the other tensors to q5_k, q6_k or q8_0.
This produces models that are only slightly degraded, or not degraded at all, while also being smaller.
They run at about 3-6 t/s on CPU only using llama.cpp, and obviously faster on machines with potent GPUs.
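As a point of reference, here is a minimal sketch of a CPU-only run with llama.cpp's llama-cli binary (the model filename follows the naming used in the quantize commands further down; the thread count, context size and prompt are just placeholders):

llama-cli -m model.f16.q6.gguf -t 8 -c 8192 -n 128 -p "Hello"

On older llama.cpp builds the binary is called main instead of llama-cli; the flags (-m model file, -t CPU threads, -c context size, -n tokens to generate, -p prompt) are the same.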

Note: I tested the f16/q6_k quantization and, for some reason, it worked even better than Gemma 2B: it could solve a logic problem where Gemma failed.

BeaverLegacy org

Did you quant the right Smegmma? This one is Smegmma Deluxe.

Which one do you recommend for normal RP with ERP in between at 8k context? Smegmma-9B-v1 or Smegmma-Deluxe-9B-v1?

Oops... my bad... I quantized the other one... wait, I'll do this one too.

ALL the models were quantized in this way:
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q8.gguf q8_0
There is also a pure f16 GGUF in every directory.
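The commands above assume a full-precision model.f16.gguf already exists. A minimal sketch of producing one with llama.cpp's convert_hf_to_gguf.py (the local model directory here is a placeholder; older llama.cpp versions ship the script as convert-hf-to-gguf.py):

python convert_hf_to_gguf.py ./Smegmma-Deluxe-9B-v1 --outtype f16 --outfile model.f16.gguf

If the gguf Python package is installed, something like gguf-dump model.f16.q6.gguf should list token_embd.weight and output.weight as F16 and the remaining tensors as Q6_K, which is an easy way to confirm the mixed-precision layout described above.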

I highly recommend you try both, especially in combination.
Deluxe has better ideas and is well suited to the opening plot.
Smegmma is strongly obedient and is well suited to taking orders.

So start the RP with Deluxe first. If, as the story goes on, you want the AI to follow your instructions to develop it, switch to Smegmma.

Thanks for the recommendation.
