How did you do AWQ quantization?

#1 opened by KnutJaegersberg

I tried it on my llamafied fine-tune, but it didn't work.

Also, is this perhaps actually the 6B model?

> I tried it on my llamafied fine-tune, but it didn't work.

Hi! The maintainer of AutoAWQ added support for Yi two days ago (https://github.com/casper-hansen/AutoAWQ/pull/167). I had tried to do it myself earlier, but ran into some intermittent bugs with Transformers and let it go... Now it should be fine to quantize and load AWQ models with the Yi architecture. I used Transformers v4.35 and everything works stably, except layer fusion: I've recently noticed some issues there related to AutoAWQ's caching, so just don't use layer fusion.
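In case it helps, here's a minimal sketch of that quantize-and-load flow with AutoAWQ. The paths and the `quant_config` values are illustrative assumptions (AutoAWQ's usual 4-bit settings), not necessarily what was used for this repo:

```python
# Assumes transformers==4.35 and an AutoAWQ build that includes PR #167 (Yi support).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "01-ai/Yi-34B-200K"  # placeholder; point this at your own fine-tune
quant_path = "Yi-34B-200K-AWQ"    # placeholder output directory

# Typical 4-bit AWQ settings (group size 128, GEMM kernels).
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration, quantize, and save the packed weights.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Later, load for inference; keep layer fusion off, per the caching issue above.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False)
```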

> Also, is this perhaps actually the 6B model?

Unfortunately (or probably fortunately :-) ), it is exactly the 34B-200K model. :-)
The 6B AWQ model takes about 4 GB, almost five times less than this one.

Here is the 6B model by Casper Hansen (maintainer of AutoAWQ) for comparison: https://huggingface.co/casperhansen/yi-6b-awq

Thanks for sharing!
I guessed 6B because the parameter count on the model card says 5.4B params.

KnutJaegersberg changed discussion status to closed
