Inference speed extremely slow
#2
by Borko24 - opened
It seems that when I load the 'gptq-4bit-32g-actorder_True' revision, inference speed is very slow. For reference, I am using an A10 GPU with 24 GB of VRAM, and I am running experiments with this model. Is this because I have to install AutoGPTQ from source? I found an open issue from about a year ago about inference being slow when using the pre-built wheels. It should be faster than it currently is; it takes more than an hour to complete the tasks from the HumanEval dataset.
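For reference, this is roughly how I load the model and measure throughput (a minimal sketch; the repo name below is a placeholder, and only the revision name comes from my actual setup):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/<model>-GPTQ"          # placeholder repo name, not my exact model
revision = "gptq-4bit-32g-actorder_True"    # the branch I am loading

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision=revision,
    device_map="auto",                      # single A10, 24 GB VRAM
    torch_dtype=torch.float16,
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Time a single generation and report tokens per second
start = time.time()
output = model.generate(**inputs, max_new_tokens=256)
elapsed = time.time() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.2f} tokens/s")
```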