What kind of GPU do I need to run this model locally on-prem?

#8
by eliastick - opened

I'd like to run this model on-premise. What hardware and GPU do I need? Thank you.

@eliastick Without quantization, you would need roughly 16 GB of VRAM, so any GPU with 16 GB of VRAM or more should be fine.
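
As a rough sanity check, that figure follows from parameter count times bytes per parameter. Here is a minimal Python sketch, assuming an ~8B-parameter model (an assumption; check the model card for the actual size). It counts weight memory only, so treat it as a lower bound:

```python
# Back-of-envelope VRAM estimate: parameter count x bytes per parameter.
# Real usage is higher because of activations and the KV cache.
def estimate_vram_gb(num_params: float, bytes_per_param: float) -> float:
    """Weight memory only, in GB."""
    return num_params * bytes_per_param / 1e9

params = 8e9  # assumed ~8B-parameter model; check the model card

print(f"fp16 (2 bytes/param): ~{estimate_vram_gb(params, 2.0):.0f} GB")        # ~16 GB
print(f"~4-bit quant (~0.6 bytes/param): ~{estimate_vram_gb(params, 0.6):.1f} GB")  # ~5 GB
```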

With quantization, you would need roughly 5 GB of VRAM, so any GPU with 6 GB of VRAM or more should work. I would recommend using llama.cpp, and possibly ExLlamaV2.
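
For the llama.cpp route, here is a minimal sketch using llama-cpp-python (the Python bindings for llama.cpp). The GGUF file path is a placeholder; you would first need to download a quantized GGUF build of the model:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./model-q4_k_m.gguf",  # placeholder path to a ~4-bit GGUF file
    n_gpu_layers=-1,                   # offload all layers to the GPU
    n_ctx=4096,                        # context window; larger values use more memory
)

output = llm("Q: What hardware do I need to run this? A:", max_tokens=64)
print(output["choices"][0]["text"])
```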
