Quantisation
#4
by davidsyoung
Would it be possible to get some quantised versions of Molmo uploaded (AWQ, GPTQ, etc.)?
I believe this is currently the strongest VLM (vision-language model), but unfortunately it's being overshadowed by Meta's Llama 3.2 multimodal release.
For community uptake, I feel that having the model ready in commonly used formats would be a huge help in getting it adopted.
Thank you
Does it support flash_attn 2?