Can I request miquella at 6bpw and 8bpw?

Opened by Perpetuity7

Miquella is the best merged model!

It'll require a lot of VRAM to run at 6bpw and 8bpw, but I'll add it to the list to quantize. I didn't think anyone would have 90+ GB of VRAM to run it.
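
Rough math behind the 90+ GB figure, in case anyone wants to check their hardware first: weight memory is roughly parameter count × bits-per-weight / 8, before KV cache and other overhead. A quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope VRAM estimate for quantized weights only.
# Real usage is higher: KV cache, activations, and framework overhead add more.

def weight_vram_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal GB for a quantized model."""
    n_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return n_bytes / 1e9

for bpw in (4.0, 6.0, 8.0):
    print(f"120B @ {bpw}bpw ~ {weight_vram_gb(120, bpw):.0f} GB for weights alone")
# 120B @ 4.0bpw ~ 60 GB, @ 6.0bpw ~ 90 GB, @ 8.0bpw ~ 120 GB
```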

6.0bpw uploading (it'll take a while): https://huggingface.co/models?search=LoneStriker/miquella-120b
8.0bpw to follow afterward.

Thank you so much!
I expect higher-bpw models will show better results.
For a 120B model, exl2 seems to be the only option with practical inference speed.
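
For reference, here's roughly how I load these exl2 quants (a minimal sketch following the exllamav2 example scripts; the model path is a placeholder, and autosplit across GPUs is one choice among several):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Placeholder path to a downloaded EXL2 quant directory.
model_dir = "/models/miquella-120b-6.0bpw-exl2"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate cache after weights are placed
model.load_autosplit(cache)               # spread layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

generator.warmup()
print(generator.generate_simple("Hello,", settings, 64))
```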

Agreed on the speed. If you add speculative decoding with exui or another supported frontend, inference is actually fast (for a 120B). The 8.0bpw model is uploading now; I hit a glitch with the previous quant and had to rerun it.
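
Why a second model speeds things up rather than adding work: a small draft model proposes a few tokens cheaply, and the big model verifies them all in one batched forward pass, keeping the longest agreeing prefix. A toy greedy sketch of that loop (the two callables are hypothetical stand-ins for real draft/target models, not the exllamav2 API):

```python
from typing import Callable, List

def speculative_step(
    context: List[int],
    draft_next: Callable[[List[int]], int],   # cheap draft model (stand-in)
    target_next: Callable[[List[int]], int],  # expensive target model (stand-in)
    k: int = 4,
) -> List[int]:
    """One greedy speculative-decoding step: the draft proposes k tokens,
    the target verifies them; the longest agreeing prefix is accepted,
    plus one token from the target either way."""
    # 1. Draft proposes k tokens autoregressively (cheap).
    proposal, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2. Target checks each position. In a real engine all k+1 target
    #    predictions come from ONE forward pass; that's the speedup.
    accepted, ctx = [], list(context)
    for t in proposal:
        expected = target_next(ctx)
        if expected != t:
            accepted.append(expected)  # target's correction ends the step
            return accepted
        accepted.append(t)
        ctx.append(t)

    # All k draft tokens matched; target contributes one bonus token.
    accepted.append(target_next(ctx))
    return accepted

# Tiny demo with dummy "models" that always agree:
dummy = lambda ctx: (len(ctx) * 2) % 10
print(speculative_step([1, 2, 3], dummy, dummy))  # k accepted tokens + 1 bonus
```

As I understand it, in exui this corresponds to loading a small draft model alongside the main one; the draft only pays off when it agrees with the target often, so a small model from the same family is the usual choice.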
