I am running vLLM 0.4.1 with 4 x 24 GB GPUs (A10G) = 96 GB total, in eager mode, and I am still running out of memory. How? The model should fit (roughly 87 GB of VRAM).
#3 opened by orel12
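For what it's worth, a quick back-of-the-envelope check (a sketch, assuming fp16 weights and vLLM's default gpu_memory_utilization of 0.90) shows why the aggregate numbers can look fine while each individual GPU still overflows:

```python
# Rough per-GPU memory budget check for vLLM with tensor parallelism.
# Hypothetical numbers: ~87 GB of weights (from the post), 4 x 24 GB A10Gs,
# and vLLM's default gpu_memory_utilization of 0.90.

weights_gb_total = 87.0          # total model weights (from the post)
num_gpus = 4                     # tensor_parallel_size
vram_per_gpu_gb = 24.0           # A10G
gpu_memory_utilization = 0.90    # vLLM default; the rest is headroom

weights_per_gpu = weights_gb_total / num_gpus              # ~21.75 GB per GPU
budget_per_gpu = vram_per_gpu_gb * gpu_memory_utilization  # ~21.60 GB per GPU

# vLLM has to fit weights + activations + the pre-allocated KV cache inside
# the budget, so even though 87 GB < 96 GB in aggregate, each GPU is already
# over budget before a single KV-cache block is allocated.
print(f"weights per GPU: {weights_per_gpu:.2f} GB, budget: {budget_per_gpu:.2f} GB")
```

With tensor parallelism the weights are sharded roughly evenly across the cards, and vLLM additionally pre-allocates KV-cache space for sequences up to max_model_len, plus the CUDA context lives outside the utilization budget. That is why "fits in aggregate" is not the same as "fits per GPU".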
@jarrelscy I wonder what to do.
Reduce your max num seqs to 1 and your max model len to 1024 and see if that helps.
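A minimal sketch of those settings, assuming the reply means vLLM's max_num_seqs and max_model_len engine arguments (the model id is a placeholder, and raising gpu_memory_utilization is an optional extra, not part of the original suggestion):

```python
from vllm import LLM

# max_num_seqs=1 caps how many sequences run concurrently;
# max_model_len=1024 shrinks the KV cache vLLM pre-allocates.
llm = LLM(
    model="your-model-id",        # placeholder
    tensor_parallel_size=4,       # 4 x A10G, as in the original setup
    enforce_eager=True,           # eager mode, as in the original setup
    max_num_seqs=1,               # serve one sequence at a time
    max_model_len=1024,           # cap context length -> smaller KV cache
    gpu_memory_utilization=0.95,  # optionally raise from the 0.90 default
)

out = llm.generate("Hello")
print(out[0].outputs[0].text)
```

If this still OOMs, the per-GPU weight shard alone is likely over budget, and no KV-cache setting will fix that; a quantized variant or more/larger GPUs would be the next thing to try.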