FYI: 11GB VRAM for 8k context with ExLlama @ ~29 tokens/s

#3 · opened by gardner
This comment has been hidden

This was meant to be posted in the GPTQ repo.

gardner changed discussion status to closed
