out of memory

#3
by LiMuyi - opened

My GPU is an NVIDIA GeForce RTX 4090 with 24GB, but when I load this model it runs out of memory. I already set cache_8bit=True and use Exllamav2_HF as the loader. (My GPU is idle and no other models are running.)

Owner

Hi, I'm not getting OOMs, but responses are sometimes slow because this quant also uses shared GPU memory. For now, you could try reducing the context size to 2k; soon I'll upload a 2.8 bpw quant, which fits perfectly on my 3090.
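To see why a lower bpw and a smaller context help, here is a rough back-of-the-envelope VRAM estimate. All model dimensions below are assumptions for illustration (a hypothetical 70B Llama-style model with 80 layers, 8 KV heads, head dim 128; cache_8bit modeled as 1 byte per element), not figures from this repo:

```python
# Rough VRAM estimate for an EXL2 quant. The model shape used here is a
# HYPOTHETICAL 70B Llama-style config (80 layers, 8 KV heads, head dim 128),
# chosen only to illustrate the arithmetic.

def weight_gib(params: float, bpw: float) -> float:
    """Approximate weight memory in GiB at a given bits-per-weight."""
    return params * bpw / 8 / 2**30

def kv_cache_gib(ctx: int, layers: int = 80, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 1) -> float:
    """KV cache (keys + values) in GiB; bytes_per_elem=1 models cache_8bit."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx / 2**30

# At 2.8 bpw the weights alone are about 22.8 GiB...
print(f"weights @ 2.8 bpw: {weight_gib(70e9, 2.8):.2f} GiB")
# ...so trimming the 8-bit KV cache from 8k to 2k context saves real headroom.
print(f"kv cache @ 8k ctx: {kv_cache_gib(8192):.3f} GiB")
print(f"kv cache @ 2k ctx: {kv_cache_gib(2048):.3f} GiB")
```

Anything that does not fit in the 24 GiB of dedicated VRAM spills into shared GPU memory (system RAM), which is why generation slows down instead of crashing outright.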
