Fastest way for inference?

#28
by psycy - opened

I have fine-tuned this model, but there's no vLLM support for running it.
What's the best way to use it in prod?

Unofficial Mistral Community org

Not really. vLLM support is for the non-HF version, i.e. mistralai/Pixtral-12B-2409.
However, the HF implementation of Pixtral in Transformers (https://huggingface.co/mistral-community/pixtral-12b) uses a different Llava model structure, which is why that checkpoint isn't supported on vLLM.
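In the meantime, serving the supported checkpoint with vLLM looks roughly like this. This is a minimal sketch using vLLM's offline chat API; the image URL, prompt, and sampling settings are placeholders, and a fine-tuned checkpoint saved in the same format should load via `model=` the same way:

```python
# Minimal sketch: offline inference against the vLLM-supported checkpoint.
# tokenizer_mode="mistral" is required for Pixtral; prompt/image are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")
sampling_params = SamplingParams(max_tokens=512)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/237/400/300"}},
        ],
    },
]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```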

I also found a script in another thread that seems to work:

https://github.com/vllm-project/vllm/issues/8685
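And for completeness, the HF checkpoint can at least be run directly in Transformers while vLLM support is pending. A rough sketch of that Llava-style loading path, which is the structural difference mentioned above (image URL, prompt, and generation settings are placeholders):

```python
# Rough sketch: running the HF-format checkpoint directly in Transformers.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "mistral-community/pixtral-12b"
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(
    requests.get("https://picsum.photos/id/237/400/300", stream=True).raw
)

# [IMG] is Pixtral's image placeholder token; the processor expands it.
prompt = "<s>[INST]Describe this image in one sentence.\n[IMG][/INST]"
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```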

Thanks

Unofficial Mistral Community org

Yeah, I think the vLLM team is working to make sure both are compatible!
