Is vLLM support planned?

#10
by RonanMcGovern - opened

Would be great.

Nice work on this model.

vLLM has supported it since 0.6.4.

Excellent, many thanks

RonanMcGovern changed discussion status to closed

I'm running vLLM 0.6.4.post1 via Docker and getting this error:

ValueError: Model architectures ['Qwen2AudioForConditionalGeneration'] failed to be inspected. Please check the logs for more details.

I'm passing these arguments:

--served-model-name Qwen2-Audio-7B-Instruct --model Qwen/Qwen2-Audio-7B-Instruct --port 8000 --trust_remote_code --gpu_memory_utilization 0.98 --enforce_eager --limit_mm_per_prompt audio=5
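
For reference, those arguments would expose an OpenAI-compatible endpoint on port 8000 once the server actually comes up. A minimal sketch for checking that (assuming the `openai` Python package is installed, the server is on localhost, and the served model name is the one passed above):

```python
# Minimal sketch: confirm the vLLM OpenAI-compatible server responds.
# Assumes it started successfully on localhost:8000 with the
# --served-model-name shown above; any API key is accepted by default.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Plain text request, just to verify the model is being served.
response = client.chat.completions.create(
    model="Qwen2-Audio-7B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```

In my case the server never gets that far because of the inspection error above.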

I also SSH'd into the pod and can confirm that transformers 4.46.2 and vllm 0.6.4.post1+cu124 are installed (I'm running on RunPod on an A40). One-click template is here.
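
This is how I checked the versions from inside the pod (standard library only, no assumptions beyond the package names):

```python
# Print the installed versions of the two packages relevant to this issue.
from importlib.metadata import version

for pkg in ("transformers", "vllm"):
    print(pkg, version(pkg))
# Reports transformers 4.46.2 and vllm 0.6.4.post1+cu124 on this pod.
```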

I also posted this under GitHub issues here.
