Fastest way for inference?
#28 · opened by psycy
I have fine-tuned this model; however, there's no support in vLLM for running it.
What's the best way to use this in prod?
psycy changed discussion title from "Fastest way to inference?" to "Fastest way for inference?"
Not really; the vLLM support is for the non-HF version, mistralai/Pixtral-12B-2409.
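If your fine-tune is in that original (non-HF) format, something like this should work with vLLM's offline API. This is a minimal sketch based on vLLM's published Pixtral example; it assumes vLLM >= 0.6.1, and the image URL is a placeholder:

```python
from vllm import LLM
from vllm.sampling_params import SamplingParams

# Pixtral ships a Mistral-format tokenizer, not an HF one,
# so tokenizer_mode="mistral" is required.
llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }
]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```

For an OpenAI-compatible server, the equivalent should be `vllm serve mistralai/Pixtral-12B-2409 --tokenizer-mode mistral`.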
However, the HF team implemented Pixtral in Transformers using a different, LLaVA-style model structure (https://huggingface.co/mistral-community/pixtral-12b), which isn't supported by vLLM.
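In the meantime, that HF-format checkpoint can still be run directly in Transformers. A minimal sketch, assuming a recent transformers release with the LLaVA-style Pixtral classes (the image URL is a placeholder):

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mistral-community/pixtral-12b"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Pixtral's chat format: [IMG] marks where the image is inserted.
prompt = "<s>[INST]Describe this image.\n[IMG][/INST]"
image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)

inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generate_ids, skip_special_tokens=True)[0])
```

Plain Transformers won't match vLLM's throughput, but it works for the HF checkpoint today.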
That said, I found a script in another thread that seems to work: https://github.com/vllm-project/vllm/issues/8685
Thanks
Yeah, I think the vLLM team is working to make sure both are compatible!