vLLM only supports the GPTQ and AWQ quantization formats, and GPTQ is much faster than AWQ.
It would be great if a llama-3-70B-instruct-uncensored-gptq-int4 build could be published, thanks a lot.
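If such a checkpoint were published, serving it with vLLM would look roughly like the sketch below. The repo id is the hypothetical one requested above (no such release is confirmed), and the tensor-parallel degree is an assumption for fitting a 70B int4 model across multiple GPUs.

```python
from vllm import LLM, SamplingParams

# Hypothetical repo id from the request above -- not a confirmed release.
MODEL_ID = "llama-3-70B-instruct-uncensored-gptq-int4"

# vLLM auto-detects GPTQ weights from the checkpoint config, but the
# quantization method can also be pinned explicitly.
llm = LLM(
    model=MODEL_ID,
    quantization="gptq",
    tensor_parallel_size=4,  # assumption: split a 70B int4 model over 4 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain GPTQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```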
I'll think about it, thank you.