Fastest way for inference?

#28
by psycy - opened

I have fine-tuned this model, but there's no vLLM support for running it.
What's the best way to use it in prod?

Unofficial Mistral Community org

Not really. vLLM support is for the non-HF version, i.e. mistralai/Pixtral-12B-2409.
However, the HF implementation of Pixtral in Transformers (https://huggingface.co/mistral-community/pixtral-12b) uses a different Llava model structure, which is why that checkpoint isn't supported on vLLM.
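In the meantime, serving the supported checkpoint with vLLM looks roughly like this. This is a minimal sketch using vLLM's offline chat API; the image URL, prompt, and sampling settings are placeholders, and a fine-tuned checkpoint saved in the same format should load via `model=` the same way:

```python
# Minimal sketch: offline inference against the vLLM-supported checkpoint.
# tokenizer_mode="mistral" is required for Pixtral; prompt/image are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")
sampling_params = SamplingParams(max_tokens=512)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/237/400/300"}},
        ],
    },
]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```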

I also found a script in another thread that seems to work:

https://github.com/vllm-project/vllm/issues/8685
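And for completeness, the HF checkpoint can at least be run directly in Transformers while vLLM support is pending. A rough sketch of that Llava-style loading path, which is the structural difference mentioned above (image URL, prompt, and generation settings are placeholders):

```python
# Rough sketch: running the HF-format checkpoint directly in Transformers.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mistral-community/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(
    requests.get("https://picsum.photos/id/237/400/300", stream=True).raw
)

# [IMG] is Pixtral's image placeholder token; the processor expands it.
prompt = "<s>[INST]Describe this image in one sentence.\n[IMG][/INST]"
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```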

Thanks

Unofficial Mistral Community org

Yeah, I think the vLLM team is working to make sure both are compatible!
