Best quant for VRAM size

#1
by ramzeez88 - opened

Hi, is it possible to determine, when quantizing, which bpw will be suitable for a given VRAM amount? I was thinking you could state the figures in the model card. For example: 3.0 bpw fits in 8 GB of VRAM, 4.0 bpw fits in 12 GB, 5.0 bpw fits in 16 GB, and so on?

Yeah, this is something I've wanted to do for a while now and will get around to figuring out. It also depends on context length, but it would be nice to at least get a ballpark.

My guess for this model is that you'd be able to do about 4 bpw, since the 8x7B I have to run at 3.5 bpw on a 24 GB card.

I have 24 GB of VRAM and can easily fit the 6.5 bpw quants of 4x7B models with 49k context (so 1.5x the original context). Hope this helps!
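For a rough sanity check on figures like these, here's a minimal back-of-envelope sketch: weight memory is roughly parameter count times bpw divided by 8, plus some allowance for KV cache and activations. The function name and the flat 2 GB overhead are assumptions for illustration; real usage grows with context length, so treat the result as a lower-bound ballpark, not a guarantee.

```python
def estimate_vram_gb(n_params_billions: float, bpw: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for a quantized model.

    Weights take (params * bpw / 8) bytes; billions of params thus give GB
    directly. `overhead_gb` is a hypothetical flat allowance for KV cache
    and activations -- it actually scales with context length.
    """
    weights_gb = n_params_billions * bpw / 8
    return weights_gb + overhead_gb

# A Mixtral-style 8x7B model has ~46.7B params; at 3.5 bpw the weights
# alone come to ~20.4 GB, which is consistent with it just fitting on a
# 24 GB card as mentioned above.
print(round(estimate_vram_gb(46.7, 3.5, overhead_gb=0.0), 1))
```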
