Can I request miquella at 6bpw and 8bpw?

Opened by Perpetuity7

Miquella is the best merged model!

It'll require a lot of VRAM to run at 6bpw and 8bpw, but I'll add it to the list to quantize. I didn't think anyone would have 90+ GB of VRAM to run it.
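
Rough math behind the 90+ GB figure, in case anyone wants to check their hardware first: weight memory is roughly parameter count × bits-per-weight / 8, before KV cache and other overhead. A quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope VRAM estimate for quantized weights only.
# Real usage is higher: KV cache, activations, and framework overhead add more.

def weight_vram_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal GB for a quantized model."""
    n_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return n_bytes / 1e9

for bpw in (4.0, 6.0, 8.0):
    print(f"120B @ {bpw}bpw ~ {weight_vram_gb(120, bpw):.0f} GB for weights alone")
# 120B @ 4.0bpw ~ 60 GB, @ 6.0bpw ~ 90 GB, @ 8.0bpw ~ 120 GB
```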

6.0bpw uploading (it'll take a while): https://huggingface.co/models?search=LoneStriker/miquella-120b
8.0bpw to follow afterward.

Thank you so much!
I expect higher-bpw models will show better results.
For a 120B model, exl2 seems to be the only option with practical inference speed.
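
For reference, here's roughly how I load these exl2 quants (a minimal sketch following the exllamav2 example scripts; the model path is a placeholder, and autosplit across GPUs is one choice among several):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Placeholder path to a downloaded EXL2 quant directory.
model_dir = "/models/miquella-120b-6.0bpw-exl2"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate cache after weights are placed
model.load_autosplit(cache)               # spread layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

generator.warmup()
print(generator.generate_simple("Hello,", settings, 64))
```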

Agreed on the speed. If you add speculative decoding with exui or another supported frontend, inference is actually fast (for a 120B). The 8.0bpw model is uploading now; I hit a glitch with the previous quant and had to rerun it.
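
Why a second model speeds things up rather than adding work: a small draft model proposes a few tokens cheaply, and the big model verifies them all in one batched forward pass, keeping the longest agreeing prefix. A toy greedy sketch of that loop (the two callables are hypothetical stand-ins for real draft/target models, not the exllamav2 API):

```python
from typing import Callable, List

def speculative_step(
    context: List[int],
    draft_next: Callable[[List[int]], int],   # cheap draft model (stand-in)
    target_next: Callable[[List[int]], int],  # expensive target model (stand-in)
    k: int = 4,
) -> List[int]:
    """One greedy speculative-decoding step: the draft proposes k tokens,
    the target verifies them; the longest agreeing prefix is accepted,
    plus one token from the target either way."""
    # 1. Draft proposes k tokens autoregressively (cheap).
    proposal, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2. Target checks each position. In a real engine all k+1 target
    #    predictions come from ONE forward pass; that's the speedup.
    accepted, ctx = [], list(context)
    for t in proposal:
        expected = target_next(ctx)
        if expected != t:
            accepted.append(expected)  # target's correction ends the step
            return accepted
        accepted.append(t)
        ctx.append(t)

    # All k draft tokens matched; target contributes one bonus token.
    accepted.append(target_next(ctx))
    return accepted

# Tiny demo with dummy "models" that always agree:
dummy = lambda ctx: (len(ctx) * 2) % 10
print(speculative_step([1, 2, 3], dummy, dummy))  # k accepted tokens + 1 bonus
```

As I understand it, in exui this corresponds to loading a small draft model alongside the main one; the draft only pays off when it agrees with the target often, so a small model from the same family is the usual choice.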
