Quantization not recognized, even when building vLLM from source

#1
by willowill5 - opened

Looking forward to this one getting tested!

Yeah, unfortunately vLLM only supports AWQ with Llama models at the moment.
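
For reference, here's a minimal sketch of what does work today - loading a Llama-family AWQ checkpoint with vLLM's Python API. The model ID and sampling settings are just example placeholders, not this repo:

```python
# Minimal sketch: AWQ in vLLM currently works for Llama-family models.
# The model ID below is an example AWQ checkpoint, not Falcon 180B.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-13B-chat-AWQ", quantization="awq")
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["What is AWQ quantization?"], params)
print(outputs[0].outputs[0].text)
```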

But Hugging Face Text Generation Inference (TGI) recently added AWQ support, so you could try it with that - I think their AWQ support may cover all model architectures.

Or you could try my Falcon 180B GPTQ release with TGI - that definitely works; we tested it when I first released the 180B GPTQs.
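
If it helps, here's a rough sketch of querying a TGI server from Python with the `text_generation` client, assuming the server was launched separately with something like `--model-id TheBloke/Falcon-180B-GPTQ --quantize gptq` (or `--quantize awq` for an AWQ checkpoint). The endpoint URL and generation settings are placeholders:

```python
# Sketch of querying a running TGI endpoint from Python. Assumes the server
# was started separately with --quantize gptq (or awq) on the chosen repo.
# pip install text-generation
from text_generation import Client

client = Client("http://127.0.0.1:8080")  # placeholder endpoint

response = client.generate(
    "Write a haiku about quantization.",
    max_new_tokens=64,
)
print(response.generated_text)
```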

awesome thank you!!
