Which model should I use with my single 3090 and 32 GB of RAM?
I would run GPTQ Q4 or GGUF Q5.
With the Q5_K_L GGUF I can fit all 59 layers in 24 GB of VRAM with a 16k context size on a 4090.
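For reference, here's roughly what that setup looks like with llama-cpp-python (a minimal sketch: the model path and prompt are placeholders, and it assumes the package was built with CUDA support):

```python
# pip install llama-cpp-python (built with CUDA for GPU offload)
from llama_cpp import Llama

llm = Llama(
    model_path="./model-Q5_K_L.gguf",  # placeholder: your local GGUF file
    n_gpu_layers=-1,  # -1 offloads every layer to the GPU (all 59 here)
    n_ctx=16384,      # 16k context, as reported above
)

# Quick smoke test
out = llm("Q: What is 2 + 2? A:", max_tokens=16)
print(out["choices"][0]["text"])
```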
Nice! Thanks for the report.