Quantization not recognized, even when building vLLM from source

#1
by willowill5 - opened

Looking forward to this one getting tested!

Yeah, unfortunately vLLM only supports AWQ with Llama models at the moment.
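
For reference, here's a minimal sketch of what does work today - loading a Llama-family AWQ checkpoint with vLLM's Python API. The model ID and sampling settings are just example placeholders, not this repo:

```python
# Minimal sketch: AWQ in vLLM currently works for Llama-family models.
# The model ID below is an example AWQ checkpoint, not Falcon 180B.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-13B-chat-AWQ", quantization="awq")
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["What is AWQ quantization?"], params)
print(outputs[0].outputs[0].text)
```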

But Hugging Face Text Generation Inference (TGI) recently added AWQ support, so you could try it with that - I think their AWQ support may cover all model architectures.

Or you could try my Falcon 180B GPTQ release with TGI - that definitely works; we tested it when I first released the 180B GPTQs.
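
If it helps, here's a rough sketch of querying a TGI server from Python with the `text_generation` client, assuming the server was launched separately with something like `--model-id TheBloke/Falcon-180B-GPTQ --quantize gptq` (or `--quantize awq` for an AWQ checkpoint). The endpoint URL and generation settings are placeholders:

```python
# Sketch of querying a running TGI endpoint from Python. Assumes the server
# was started separately with --quantize gptq (or awq) on the chosen repo.
# pip install text-generation
from text_generation import Client

client = Client("http://127.0.0.1:8080")  # placeholder endpoint

response = client.generate(
    "Write a haiku about quantization.",
    max_new_tokens=64,
)
print(response.generated_text)
```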

awesome thank you!!
