Is vLLM support planned?

#10
by RonanMcGovern - opened

Would be great.

Nice work on this model.

vLLM has supported it since 0.6.4.

Excellent, many thanks

RonanMcGovern changed discussion status to closed

I'm running vLLM 0.6.4.post1 via Docker and getting this error:

ValueError: Model architectures ['Qwen2AudioForConditionalGeneration'] failed to be inspected. Please check the logs for more details.

I'm passing these arguments:

--served-model-name Qwen2-Audio-7B-Instruct --model Qwen/Qwen2-Audio-7B-Instruct --port 8000 --trust_remote_code --gpu_memory_utilization 0.98 --enforce_eager --limit_mm_per_prompt audio=5
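
For reference, those arguments would expose an OpenAI-compatible endpoint on port 8000 once the server actually comes up. A minimal sketch for checking that (assuming the `openai` Python package is installed, the server is on localhost, and the served model name is the one passed above):

```python
# Minimal sketch: confirm the vLLM OpenAI-compatible server responds.
# Assumes it started successfully on localhost:8000 with the
# --served-model-name shown above; any API key is accepted by default.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Plain text request, just to verify the model is being served.
response = client.chat.completions.create(
    model="Qwen2-Audio-7B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```

In my case the server never gets that far because of the inspection error above.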

I also SSH'd into the pod and can confirm that transformers 4.46.2 and vllm 0.6.4.post1+cu124 are installed (I'm running on RunPod on an A40). One-click template is here.
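
This is how I checked the versions from inside the pod (standard library only, no assumptions beyond the package names):

```python
# Print the installed versions of the two packages relevant to this issue.
from importlib.metadata import version

for pkg in ("transformers", "vllm"):
    print(pkg, version(pkg))
# Reports transformers 4.46.2 and vllm 0.6.4.post1+cu124 on this pod.
```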

I also posted this under GitHub issues here.
