Thanks for quantizing this model! Could you also quantize it to 3.0bpw?
Hi, thanks for your quick response on this model. To fit it into 32GB of VRAM, it would be very kind of you to quantize a 3.0bpw version in exllamav2 format. Thanks in advance!
Hey @blackcat1402, yes I can. I'll start the job; it'll take about an hour or two.
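For anyone curious, here's a minimal sketch of how an ExLlamaV2 quantization job like this is typically launched with the repo's `convert.py` script. The paths are placeholders, not the maintainer's actual setup:

```python
# Hypothetical sketch of kicking off an ExLlamaV2 quantization job.
# Directory paths below are illustrative assumptions.
import subprocess

cmd = [
    "python", "convert.py",          # exllamav2's conversion script
    "-i", "/models/source-fp16",     # input: unquantized HF model dir (placeholder)
    "-o", "/tmp/exl2-work",          # working directory for intermediate files
    "-cf", "/models/output-3.0bpw",  # output directory for the compiled quant
    "-b", "3.0",                     # target average bits per weight
]
subprocess.run(cmd, check=True)
```

The measurement pass it runs first is what makes the job take an hour or two on larger models.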
Would a 2.4 or 2.2bpw quant fit on a 24GB card? I'd love to try this.
No rush!
@DTechNation - I wouldn't recommend a quant this low; the quality will be severely degraded.
Understood. I had mixed results with 2.3bpw LoneStriker models earlier this year. I need more VRAM for sure.
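As a rough sanity check on why low-bpw quants are still tight on 24GB (my own back-of-the-envelope math, not from this thread): weight memory is roughly parameters × bits-per-weight / 8, before any KV cache or overhead. The 70B parameter count below is an illustrative assumption, not necessarily this model's size:

```python
# Back-of-the-envelope VRAM estimate for quantized weights only.
def weight_vram_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight memory in GB: params * bits-per-weight / 8 bits-per-byte."""
    return params_billion * 1e9 * bpw / 8 / 1e9

for bpw in (2.2, 2.4, 3.0):
    print(f"{bpw} bpw on a 70B model: ~{weight_vram_gb(70, bpw):.1f} GB (plus KV cache)")
```

On those assumptions, 2.4bpw already needs ~21 GB for weights alone, which is why a 24GB card leaves almost no room for context.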
@DTechNation if you’d like, I have an OpenWebUI endpoint I run for some friends; it runs the model at 7.0bpw with 90k context.
I could give you access for a week to experiment.
Head to Chat.bigstorm.ai to sign up.
Just let me know!
Closing for inactivity
@blackcat1402 The 3.0bpw quant is uploaded! Sorry, forgot to leave a comment earlier.