Text Generation
Transformers
Safetensors
English
llama
Not-For-All-Audiences
conversational
text-generation-inference
Inference Endpoints

The "silly" test.

#2
by ZeroWw - opened

ZeroWw 'SILLY' version.

The original model has been quantized (fq8 version) and a percentage of its tensors have been modified by adding some noise.

Full colab: https://colab.research.google.com/drive/1a7seagBzu5l3k3FL4SFk0YJocl7nsDJw?usp=sharing

Fast colab: https://colab.research.google.com/drive/1SDD7ox21di_82Y9v68AUoy0PhkxwBVvN?usp=sharing

Original reddit post: https://www.reddit.com/r/LocalLLaMA/comments/1ec0s8p/i_made_a_silly_test/

I created a program to randomize the weights of a model. The program has two parameters: the percentage of weights to modify and the percentage of the original value to randomly apply to each weight.

At the end I check the resulting GGUF file for binary differences. In this example I set it to modify 100% of the weights of Mistral 7B Instruct v0.3 by a maximum deviation of 15%.
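A minimal sketch of the idea (not the actual Colab code), assuming the F32 tensors are available as NumPy arrays; the function name and defaults are only illustrative:

```python
# Sketch only: perturb a random fraction of the weights by up to
# ±max_deviation of each weight's own value. Not the author's Colab code.
import numpy as np

def add_noise(tensor: np.ndarray,
              modify_fraction: float = 1.0,   # 1.0 = modify 100% of the weights
              max_deviation: float = 0.15) -> np.ndarray:
    flat = tensor.astype(np.float32).ravel().copy()
    # Choose which weights to touch.
    mask = np.random.rand(flat.size) < modify_fraction
    # Uniform deviation in [-max_deviation, +max_deviation], scaled by the weight itself.
    noise = np.random.uniform(-max_deviation, max_deviation, size=flat.size)
    flat[mask] += flat[mask] * noise[mask]
    return flat.reshape(tensor.shape)
```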

Since the deviation is calculated on the F32 weights, it changes once the model is quantized to Q8_0. So, in the end I got a file that, compared to the original, has:

Bytes Difference percentage: 73.04%

Average value divergence: 2.98%
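As a rough illustration (not the original script), the byte-difference figure could be obtained by comparing the two GGUF files byte by byte; the file names below are hypothetical, and the value divergence would additionally require dequantizing the tensors, which is not shown:

```python
# Sketch: percentage of bytes that differ between two GGUF files.
import numpy as np

def byte_diff_percentage(path_a: str, path_b: str) -> float:
    a = np.fromfile(path_a, dtype=np.uint8)
    b = np.fromfile(path_b, dtype=np.uint8)
    n = min(a.size, b.size)   # the two files should normally be the same size
    return 100.0 * np.count_nonzero(a[:n] != b[:n]) / n

# Hypothetical file names:
print(byte_diff_percentage("mistral-7b-instruct-q8_0.gguf",
                           "mistral-7b-instruct-silly-q8_0.gguf"))
```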

The cool thing is that, chatting with the model, I see no apparent difference and it still works as nicely as the original.

Since I am running everything on CPU, I could not run perplexity scores or anything compute-intensive.

As a small test, I asked the model a few questions (like the history of the Roman Empire) and then fact-checked its answers using a big model. No errors were detected.

Update: the whole procedure was created and tested on Colab.

Example: https://huggingface.co/ZeroWw/L3.1-8B-Celeste-V1.5-SILLY

You're posting this everywhere now? :/

I tried to tell you in the other thread that this won't really do anything interesting... When you quantise a model (or perform any type of "lossy" compression) you introduce "noise" like this anyway:

https://en.wikipedia.org/wiki/Rate%E2%80%93distortion_theory

https://en.wikipedia.org/wiki/Quantization_(signal_processing)

So all you are doing is adding some kind of hybrid (uniformly distributed) distortion that ultimately has the same effect as quantisation...

In simple terms - when you use Q8_0 you have to compress the weights down to 256 separate values:

  • Imagine each weight is a (uniformly distributed) value between 0 and 1.
  • This means you now have to round each weight to the nearest 1/256th (~0.004).
  • This is then approximately equivalent to adding a (uniformly distributed) random value of between -0.002 and +0.002 (±0.004/2) to every weight.

In reality the weights tend to be Normally distributed but the same idea applies.
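As a toy check of this argument (a sketch added here, not from the post), rounding uniform "weights" to the nearest 1/256 gives a rounding error that is itself roughly uniform within about ±0.002:

```python
# Toy simulation: quantisation error for uniform weights rounded to 1/256 steps.
import numpy as np

w = np.random.rand(1_000_000).astype(np.float32)   # toy "weights" in [0, 1)
step = 1.0 / 256.0                                  # 256 representable levels
quantised = np.round(w / step) * step               # round each weight to the nearest level
err = quantised - w

print(err.min(), err.max())   # roughly -0.002 .. +0.002, i.e. ±step/2
print(err.std())              # close to step / sqrt(12), as for uniform noise
```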

This guy is the only one on the whole site who does this aggressive "promotion" of his quants. Everywhere I look, he's already there with his "My quants" and "Silly test".

Nothing is Real org

tbh I don't really mind quants being sent right to our models' discussion pages, it's pretty convenient


Have you looked at his quants though? He claims that his Q5 and Q6 are somehow capable of performing as well as fp16, with only his own words as proof and zero benchmarks.

Nothing is Real org

Have you looked at his quants though? He claims that his Q5 and Q6 are somehow capable of performing as well as fp16, with only his own words as proof and zero benchmarks.

Well, I have looked into them now, specifically the Q6. KoboldCPP reports the wrong number of tensors, 291 instead of 292, and it doesn't load. Removing them from the model card; won't be adding them anymore.
Probably I should reorder the quant list too; GGUFs should be the lowest priority (bf16 > fp8 > exl2 > gguf), especially considering that fp8 and exl2 are first-party.

AuriAetherwiing changed discussion status to closed

This guy is the only one on the whole site who does this aggressive "promotion" of his quants. Everywhere I look, he's already there with his "My quants" and "Silly test".

sorry if I bothered anyone. I don't do this for money or anything. I thought it was useful.

lesson learned.

Have you looked at his quants though? He claims that his Q5 and Q6 are somehow capable of performing as well as fp16, with only his own words as proof and zero benchmarks.

Well, I have looked into them now, specifically the Q6. KoboldCPP reports the wrong number of tensors, 291 instead of 292, and it doesn't load. Removing them from the model card; won't be adding them anymore.
Probably I should reorder the quant list too; GGUFs should be the lowest priority (bf16 > fp8 > exl2 > gguf), especially considering that fp8 and exl2 are first-party.

I checked them before posting them; they were even working inside Colab.
If you have problems, try the 8B; if that works, then I will check the 12B again (but I am sure it worked).
NOTE: You must update llama.cpp to the very latest version for these to work.
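For anyone who wants to reproduce such a check, a minimal load-and-generate test with llama-cpp-python (which bundles a recent llama.cpp) could look like the sketch below; the file name is hypothetical:

```python
# Sanity check that a GGUF loads and generates; requires a recent
# llama-cpp-python build. The model file name is hypothetical.
from llama_cpp import Llama

llm = Llama(model_path="L3.1-8B-Celeste-V1.5-SILLY.q6_k.gguf", n_ctx=2048)
out = llm("Briefly summarise the history of the Roman Empire.", max_tokens=128)
print(out["choices"][0]["text"])
```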
