30B 3bit seems pretty sweet by the official evaluation

#1 opened by Yhyu13

Thanks for making 3bit 128G quantized model for the community!

The 30B 3bit 128G model seems to hit a sweet spot, outperforming the 13B fp16 model:
[screenshot of the official evaluation results]
I was just about to convert the model myself anyway.

I have been very pleased with locally deploying 13B fp16 models on a dual-3090 server. Now I am going to try two instances of the 30B 3bit model.
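
For context, here is a rough back-of-the-envelope on why that swap fits on 24 GB cards. This only counts the weights; the ~32.5B parameter count for the 30B LLaMA and the ~5% overhead for group-size scales/zeros are approximations, and KV cache plus activations come on top:

```python
# Rough weight-only VRAM estimates; real usage is higher (KV cache, activations).
def weight_gib(n_params, bits_per_weight, overhead=1.05):
    """Approximate weight footprint in GiB, with ~5% for quantization scales/zeros."""
    return n_params * bits_per_weight / 8 * overhead / 2**30

print(f"13B fp16      : {weight_gib(13e9, 16):.1f} GiB")   # ~25 GiB -> spans both 3090s
print(f"30B 3bit 128G : {weight_gib(32.5e9, 3):.1f} GiB")  # ~12 GiB -> one instance per 3090
```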

Glad you are finding use for it! This one scores 5.22 on WikiText2. I tried publishing the model card with my eval results, but HF is having problems.

Here are the results I got (better to just compare to the results on my other two supercot quants, since then it's apples to apples)

WikiText2: 5.22 (12% worse than 4bit non-groupsize)
PTB: 19.63 (11% worse than 4bit non-groupsize)
C4: 6.93 (7% worse than 4bit non-groupsize)
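
For reference, these WikiText2/PTB/C4 numbers are the kind produced by the standard GPTQ-style perplexity eval: concatenate the test split, score it in 2048-token chunks, and exponentiate the mean negative log-likelihood. A minimal sketch of that loop, not the exact script used here; the model path is a placeholder, and a real 3bit GPTQ checkpoint usually needs a GPTQ loader such as AutoGPTQ rather than plain transformers:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/llama-30b-supercot-3bit-128g"  # placeholder path
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model.eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tok("\n\n".join(test["text"]), return_tensors="pt")

seqlen = 2048                         # LLaMA context window
nlls = []
n_chunks = enc.input_ids.shape[1] // seqlen
for i in range(n_chunks):
    ids = enc.input_ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean NLL over the chunk
    nlls.append(loss * seqlen)
ppl = torch.exp(torch.stack(nlls).sum() / (n_chunks * seqlen))
print(f"WikiText2 perplexity: {ppl.item():.2f}")
```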

It is hard to tell the difference, but it looks like those results are reasonably accurate. The 3bit version writes slightly shorter responses, but at least now more text fits into memory before crashing.

It is still hard to decide whether to use the 4bit model, which is slightly more accurate, or the smaller 3bit model, which leaves more room for context.
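
Some rough arithmetic on that tradeoff, assuming the usual LLaMA-30B config (60 layers, hidden size 6656), an fp16 KV cache, and ~5% quantization overhead:

```python
GiB = 2**30

def weights_gib(bits, n_params=32.5e9, overhead=1.05):
    """Approximate 30B weight footprint in GiB for a given bit width."""
    return n_params * bits / 8 * overhead / GiB

kv_per_token = 2 * 60 * 6656 * 2 / GiB   # K + V, 60 layers, hidden 6656, fp16

print(f"4bit weights: {weights_gib(4):.1f} GiB, 3bit weights: {weights_gib(3):.1f} GiB")
print(f"KV cache at 2048 tokens: {2048 * kv_per_token:.1f} GiB")
print(f"Extra tokens the 3bit savings could hold: {(weights_gib(4) - weights_gib(3)) / kv_per_token:.0f}")
```

By that estimate, the roughly 4 GiB saved by going from 4bit to 3bit is more than a full 2048-token KV cache, which would line up with the "fits more text before crashing" observation.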
