Quantisation

#4
by davidsyoung - opened

Would it be possible to get some quantisations of Molmo uploaded - AWQ, GPTQ, etc?

I believe this is currently the strongest VLM (vision-language model), but unfortunately it's being overshadowed by Meta's Llama 3.2 multimodal release.

For uptake from the community, I feel that having the model ready in the commonly used formats would be a huge help in getting it adopted.

Thank you

I've added a BitsAndBytes NF4 quantization here: SeanScripts/Molmo-72B-0924-nf4

Does it support flash_attn 2?