Change from 2.6 to 2.55bpw

by AbiTabby

Firstly, thanks for all the exl2 conversions.
But can I ask why you dropped your regular 2.6bpw output to 2.55bpw?
The difference may well be insignificant, but I've always found 2.6bpw to be a sweet spot on my setup (28GB VRAM).

I've had mixed messages from folks requesting different bpw values. I believe 2.55 was enough to get some folks extra context length. 2.6 was originally chosen because that's what Turboderp said would fit on a single 24GB VRAM card. I can switch back to 2.6.
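
For anyone weighing 2.4 vs 2.55 vs 2.6, a rough back-of-envelope helps. This is a minimal sketch of the arithmetic only; the 70B parameter count is just an illustrative assumption, and real usage adds overhead for the KV cache, activations, and whatever else is resident in VRAM:

```python
def weights_vram_gb(n_params: float, bpw: float) -> float:
    """Approximate VRAM taken by the quantized weights alone:
    parameters * bits-per-weight, converted to gigabytes."""
    return n_params * bpw / 8 / 1e9

# Hypothetical example: a 70B-parameter model at the bpw values discussed here.
for bpw in (2.4, 2.55, 2.6):
    print(f"{bpw}bpw -> ~{weights_vram_gb(70e9, bpw):.1f} GB for weights")
# 2.4bpw  -> ~21.0 GB
# 2.55bpw -> ~22.3 GB
# 2.6bpw  -> ~22.8 GB
```

Under those assumptions, the ~0.4 GB gap between 2.55 and 2.6 is roughly what gets reclaimed for extra context on a 24GB card.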

If more people are requesting 2.55bpw, then stick with it. Though I'd think 2.4 is a better fit for those with 24GB VRAM, depending on what other programs they have open that are also using VRAM.
Whatever you decide, I'm very grateful for your quants.
I'm currently driving the Nous-Capybara-34B-5.0bpw that you posted and finding it very responsive and coherent. But for some reason the Alpaca Instruct format is working better than Vicuna via the ST frontend (the two templates are sketched below).
Anyway, take care 😃
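
On the Alpaca vs Vicuna point: the two prompt formats differ enough that a model tuned on one can feel noticeably worse with the other. Here's a sketch of the two templates as they're commonly written; the exact strings a given frontend sends may differ, so treat these as illustrative rather than what ST actually emits:

```python
def alpaca_prompt(instruction: str) -> str:
    # Alpaca-style instruct format: headed sections, single-turn.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

def vicuna_prompt(instruction: str) -> str:
    # Vicuna-style chat format: USER/ASSISTANT turns after a system preamble.
    return (
        "A chat between a curious user and an artificial intelligence "
        "assistant.\n\n"
        f"USER: {instruction}\nASSISTANT:"
    )
```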

Things are getting a bit more complicated. If you enable the cache_8bit option for the ExLlamav2 loader, you can fit even more bits per weight: supposedly it only trades some inference speed for extra VRAM headroom, with no degradation in quality.

We may have to recalibrate what the best bpw settings are, with and without the cache_8bit option.
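
To put a rough number on the cache_8bit trade: the KV cache grows linearly with context length, and halving the element size halves that term. A minimal sketch of the arithmetic, using assumed Llama-2-70B-like shapes (layers, KV heads, and head dim are assumptions here, not read from any config):

```python
def kv_cache_gb(seq_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: float) -> float:
    """K and V tensors: 2 * layers * kv_heads * head_dim * seq_len elements."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Assumed shapes: 80 layers, 8 KV heads (GQA), head_dim 128.
for ctx in (4096, 8192):
    fp16 = kv_cache_gb(ctx, 80, 8, 128, 2)   # default FP16 cache
    int8 = kv_cache_gb(ctx, 80, 8, 128, 1)   # with cache_8bit
    print(f"ctx={ctx}: FP16 ~{fp16:.2f} GB vs 8-bit ~{int8:.2f} GB")
# ctx=4096: FP16 ~1.34 GB vs 8-bit ~0.67 GB
# ctx=8192: FP16 ~2.68 GB vs 8-bit ~1.34 GB
```

Under those assumptions, at longer contexts the savings are big enough to afford a bpw step up, which is exactly why the "best bpw" question needs re-answering with the option on and off.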
