OOM with vllm

#48
by willowill5 - opened

OOM even on an A100 80GB when deploying with:

python -m vllm.entrypoints.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1 --dtype half

I have also tried the flags "--max-model-len 8192" and "--gpu-memory-utilization 0.8".

Anyone else run into this? Thanks!!
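For context, a quick back-of-the-envelope sketch suggests the OOM is expected on a single 80GB card: assuming the ~46.7B total parameter count from the Mixtral-8x7B model card, the fp16 weights alone exceed 80 GB before any KV cache is allocated.

```python
# Rough memory estimate (assumption: ~46.7B total parameters,
# per the Mixtral-8x7B model card).
total_params = 46.7e9
bytes_per_param = 2  # --dtype half => fp16, 2 bytes per parameter

weights_gb = total_params * bytes_per_param / 1e9
print(f"fp16 weights alone: ~{weights_gb:.0f} GB")  # ~93 GB > 80 GB
```

So a single A100 80GB cannot hold the fp16 weights; options would be multi-GPU tensor parallelism or a quantized checkpoint.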

willowill5 changed discussion status to closed
