I am running vLLM 0.4.1 on 4 x 24 GB GPUs (A10G) = 96 GB total, with eager mode enabled, and I am still running out of memory. How? The model should fit (it needs roughly 87 GB of VRAM).
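One thing I may be missing (assuming vLLM's default gpu_memory_utilization of 0.9; the thread does not state the actual setting), as a back-of-the-envelope check:

```python
# Rough memory check; assumes vLLM's default gpu_memory_utilization
# of 0.9 and the ~87 GB weight estimate from the question above.
total_vram_gb = 4 * 24                 # 4 x A10G, 24 GB each
usable_gb = total_vram_gb * 0.9        # vLLM's default per-GPU budget, summed
weights_gb = 87                        # estimate from the question
headroom_gb = usable_gb - weights_gb   # what is left for KV cache etc.
print(f"usable: {usable_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
# usable: 86.4 GB, headroom: -0.6 GB -> OOM before any KV cache is allocated
```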

#3
opened by orel12

@jarrelscy
I wonder what to do.

Reduce your max-num-seqs to 1 and your max-model-len to 1024 and see if it still runs out of memory.
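Something like this with vLLM's offline LLM API, as a minimal sketch (the model name is a placeholder; the other parameters are standard vLLM kwargs matching the setup described in the question):

```python
from vllm import LLM

llm = LLM(
    model="some-70b-model",   # placeholder; the thread does not name the model
    tensor_parallel_size=4,   # shard across the 4 x A10G
    enforce_eager=True,       # eager mode, as in the question
    max_num_seqs=1,           # at most one sequence in flight
    max_model_len=1024,       # caps the per-sequence KV-cache reservation
)
```

If that fits, raise max_model_len and max_num_seqs back up gradually to find where it breaks.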
