Upgrade format of this model?
Hello Andrei, I work for NeuralMagic and I'm adding AQLM support to vLLM in an upcoming PR. Your Llama 2 7B 1x16 and 2x8 models have no custom code and a quantization_config block in the config.json, which is perfect. I'm able to run those models (and a tiny Llama 2 you have as well) end to end with no problems.
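For anyone following along, the newer layout looks roughly like this (abridged sketch; the field names follow the transformers AQLM integration, but the values shown are illustrative for a 1x16 variant rather than copied from the actual checkpoint):

```json
{
  "model_type": "llama",
  "quantization_config": {
    "quant_method": "aqlm",
    "num_codebooks": 1,
    "nbits_per_codebook": 16,
    "in_group_size": 8,
    "out_group_size": 1
  }
}
```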
But this model, and the rest referenced in the README, have what looks like an older format, with a custom aqlm block in the config.json and custom code, making them unreadable by vLLM. I was wondering: do you have plans to update those to the same standard as the first two? Or is that something I could try to do with a PR, if it's just a question of changing the config.json and removing the custom code? A rough sketch of the conversion I have in mind is below.
Thanks, -James
Indeed, I missed this model when updating checkpoints.
I've updated the format.
Thanks!
Thank you!