Quantisation

#4
by davidsyoung - opened

Would it be possible to get some quantisations of Molmo uploaded - AWQ, GPTQ, etc?

I believe this is currently the strongest VLM (vision-language model), but unfortunately it's being overshadowed by Meta's Llama 3.2 multimodal release.

For uptake from the community, I feel that having the model ready in the commonly used formats would be a huge help in getting it adopted.

Thank you

I've added a BitsAndBytes NF4 quantization here: SeanScripts/Molmo-72B-0924-nf4

Does it support flash_attn 2?