AWQ script

#1
by prudant - opened

Hi, can you share the script you used to create the AWQ model? I'm trying to quantize another fine-tune of Qwen2. The process completes without errors, but the model doesn't work when I try to use it =/

Owner

Are you using AutoAWQ? https://github.com/casper-hansen/AutoAWQ
What library are you using for inference? I'm using vLLM: https://docs.vllm.ai/

Yes, I'm using AutoAWQ, a patched version (it fixes the scale problems) from one of the Qwen contributors. For the engine I'm using Aphrodite Engine, which is derived from vLLM but has higher throughput with AWQ and GPTQ.

Owner

I'll give Aphrodite a try.
Have you tried vLLM? It runs fast on an NVIDIA A100 (80GB), even though they say that AWQ support is under-optimized at the moment...
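A quick way to check whether a quantized checkpoint loads and generates in vLLM is a short smoke test like the sketch below. It assumes vLLM is installed with AWQ support; the model path and prompt are hypothetical placeholders, and the helper names are made up for illustration rather than taken from this thread.

```python
def awq_engine_kwargs(model_path: str) -> dict:
    """Build keyword arguments for vllm.LLM when loading an AWQ checkpoint."""
    return {
        "model": model_path,       # local directory or Hub repo id (placeholder)
        "quantization": "awq",     # tell vLLM the weights are AWQ-quantized
        "dtype": "half",           # AWQ kernels run in fp16
    }


def smoke_test(model_path: str, prompt: str = "Hello, my name is") -> str:
    """Load the quantized model and return a short greedy completion."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**awq_engine_kwargs(model_path))
    params = SamplingParams(temperature=0.0, max_tokens=32)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `smoke_test("path/to/awq-model")` should return readable text; empty or garbage output usually points at the quantization step rather than the inference engine.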

I haven't tried it because Aphrodite has some features that vLLM doesn't have (yet), but I'll give it a try anyway. The main point is that I've never had a problem with AWQ and Aphrodite until now, which is pretty weird.

@erickbp but back to the original question: did you use https://github.com/casper-hansen/AutoAWQ to quantize Qwen2, or did you have to customize something to get a working quant?

Owner

No, I did not quantize this one myself due to hardware limitations, but yes, I have used AutoAWQ for all the other models that I have quantized. I used the Azure VM "ND96asr A100 v4" for it (8 x A100). Is there any particular error that you are facing? I can give it a try, but I don't want to set up that expensive VM and all that if you can already tell that it is going to fail.
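For reference, a typical AutoAWQ quantization run looks roughly like the sketch below. This is not the script used for this model (none was shared in the thread); it follows the usage pattern from the AutoAWQ README, and the paths, the helper name, and the 4-bit / group-size-128 settings are assumptions chosen for illustration.

```python
# Common AutoAWQ settings: 4-bit weights, group size 128, GEMM kernels.
# These values are an assumption, not something confirmed in this thread.
QUANT_CONFIG = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
}


def quantize_to_awq(model_path: str, quant_path: str) -> None:
    """Quantize a Hugging Face causal LM with AutoAWQ and save the result."""
    # Imported lazily: both packages are heavy and need a GPU environment.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # Runs calibration and replaces the linear layers with quantized ones.
    model.quantize(tokenizer, quant_config=QUANT_CONFIG)

    # Save the quantized weights and the tokenizer side by side.
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

It would be called as `quantize_to_awq("path/to/qwen2-finetune", "path/to/qwen2-finetune-awq")`, where both paths are placeholders. The installed AutoAWQ version has to support the model architecture for the output to be usable.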

Already solved: AutoAWQ did not support Qwen natively, but after the latest commits it now does... that was the problem!

Owner

Glad to hear it! Thanks for letting me know.
