exl2-2 please?

#1
by Thireus - opened

Would you be able to issue an exl2-2 version of this 6.0bpw model please? :)

I'll put it on my list. I'm not sure how much improvement we'll see for the 6bpw model. For the lower bpw models, things are definitely better.

Indeed, I don't expect much improvement either, but I'm very curious to see the results. Thank you.

It's up, but with a caveat: the quant enhancements have not been finalized, so there's a chance we may have to redo the quants. Worth a test though to compare:
https://huggingface.co/LoneStriker/dolphin-2.2-70b-6.0bpw-h6-exl2-2

Some improvement on wikitext ppl:

  • dolphin-2.2-70b-6.0bpw-h6-exl2: 3.9869189262390137
  • dolphin-2.2-70b-6.0bpw-h6-exl2-2: 3.9655632972717285
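
For anyone curious how a wikitext ppl figure like the ones above is produced: it's the exponentiated mean negative log-likelihood of each next token over the test text. Here's a minimal sketch of that computation (illustrative only; `model` is a stand-in for any HF-style causal LM, and for exl2 quants it's the exllamav2 repo's own eval scripts that actually produce these numbers):

```python
# Sketch of a perplexity computation over a pre-tokenized text stream.
# Hypothetical helper; chunking is simplified (no overlapping context windows).
import math
import torch

def perplexity(model, input_ids: torch.Tensor, stride: int = 2048) -> float:
    """exp(average negative log-likelihood per token) over the stream."""
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, input_ids.size(1) - 1, stride):
        chunk = input_ids[:, start : start + stride + 1]
        with torch.no_grad():
            logits = model(chunk[:, :-1]).logits        # next-token logits
        targets = chunk[:, 1:]                          # tokens to predict
        logprobs = torch.log_softmax(logits.float(), dim=-1)
        token_ll = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        nll_sum -= token_ll.sum().item()
        n_tokens += targets.numel()
    return math.exp(nll_sum / n_tokens)
```

Lower is better, so the exl2-2 quant is a small but real win even at 6.0bpw.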

When you say the quant enhancements have not been finalized, which step of the conversion process do you mean? Measuring quantization impact...? Would it be worth redoing the quants?

Turboderp was still finalizing the quantization enhancements. Initially, the new quants showed improvements but had instabilities at certain model sizes (such as 13B). He's gone back to using measurements, but faster than before. Both methods improve perplexity substantially, particularly at lower bpw. At this point, ~5bpw is nearly indistinguishable from fp16. I believe the quants I've re-done should be good. Going forward, I'll be using his latest measurement method. It should be merged into main shortly (if it hasn't been already).
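
For context on what "using measurements" means here: convert.py in the exllamav2 repo does a measurement pass over the model (scoring quantization error per layer) before committing to a bit allocation, and that measurement can be saved and reused across bitrates. A rough sketch of the two-pass flow, with made-up paths and flags as I recall them from around that version (check `python convert.py -h` for the current interface):

```python
# Illustrative only: paths are hypothetical, and flag behavior should be
# verified against the exllamav2 checkout you actually have.
import subprocess

MODEL_IN = "/models/dolphin-2.2-70b"   # fp16 source weights (made-up path)
WORK_DIR = "/tmp/exl2-work"            # scratch directory for the converter

# Pass 1: measure per-layer quantization impact and save the results.
subprocess.run([
    "python", "exllamav2/convert.py",
    "-i", MODEL_IN, "-o", WORK_DIR,
    "-om", f"{WORK_DIR}/measurement.json",   # write the measurement file
], check=True)

# Pass 2 (repeatable for any target bitrate): quantize using that measurement.
subprocess.run([
    "python", "exllamav2/convert.py",
    "-i", MODEL_IN, "-o", WORK_DIR,
    "-m", f"{WORK_DIR}/measurement.json",    # reuse instead of re-measuring
    "-b", "6.0",                             # target bits per weight
    "-hb", "6",                              # head bits (the "h6" in the name)
    "-cf", "/models/dolphin-2.2-70b-6.0bpw-h6-exl2-2",  # compiled output dir
], check=True)
```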

@LoneStriker do I understand correctly that the new quantization method should be in the newly released 0.0.11 (since the dev branch was completely merged into it)?
And that it's enabled when you don't specify a calibration dataset?

Yes, that's correct. By default you'll get the built-in calibration dataset, which is constructed from a diverse set of texts. You can still specify a calibration dataset if you wish, for example if your model uses a language not covered by the built-in one. But Turbo tried to include lots of different text types, so this is almost never needed.
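
Concretely, that just means omitting the calibration flag when converting. A sketch (hypothetical paths, and the `-c` flag name is from convert.py as I recall it, so double-check against the repo):

```python
# Illustrative only: shows the default vs. custom calibration choice.
import subprocess

base = [
    "python", "exllamav2/convert.py",
    "-i", "/models/my-model", "-o", "/tmp/exl2-work",
    "-cf", "/models/my-model-6.0bpw-h6-exl2",
    "-b", "6.0", "-hb", "6",
]

# Default: no -c flag, so the built-in multi-domain calibration data is used.
subprocess.run(base, check=True)

# Alternative: pass your own parquet file, e.g. for a language the built-in
# set doesn't cover (made-up path).
# subprocess.run(base + ["-c", "/data/my_calibration.parquet"], check=True)
```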
