Error quantizing AWQ?

#1
by Rexe - opened

@TheBloke I'd like some guidance on AWQ quantization. I'm having trouble quantizing a model with AutoAWQForCausalLM from my code, and the error is:

ValueError: WQLinear_GEMM(in_features=14336, out_features=4096, bias=False, w_bit=4, group_size=128) does not have a parameter or a buffer named weight.

This is the code:
!pip install --upgrade transformers
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch
import safetensors
import os

#Rexe/Faradaylab-aria-mistral-merge
model_path = 'Faradaylab/ARIA-7B-V3-mistral-french-v1'
quant_name = model_path.split('/')[-1] + '-AWQ'

quant_path = 'Rexe/' + quant_name
quant_config = { 'zero_point': True, 'q_group_size': 128, 'w_bit': 4 }

# Load model

model = AutoAWQForCausalLM.from_quantized(model_path, device_map='auto', use_safetensors=True, strict=False)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=True)

# Quantize

model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model

model.save_quantized(quant_name, safetensors=True, shard_size='10GB')
tokenizer.save_pretrained(quant_name)
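
For comparison, my understanding of the flow in the AutoAWQ examples is roughly the sketch below: the base model is loaded in full precision with from_pretrained and only then quantized, whereas my script above loads it with from_quantized. I'm not sure whether that difference is what triggers the WQLinear_GEMM error; the model path and output name are just placeholders from my setup.

# Minimal sketch of the AutoAWQ quantization flow as I understand it from the
# AutoAWQ examples -- not verified on this particular model.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'Faradaylab/ARIA-7B-V3-mistral-french-v1'
quant_name = model_path.split('/')[-1] + '-AWQ'
quant_config = { 'zero_point': True, 'q_group_size': 128, 'w_bit': 4 }

# Load the full-precision base model (not an already-quantized checkpoint)
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=True)

# Quantize with the default calibration data
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer
model.save_quantized(quant_name, safetensors=True, shard_size='10GB')
tokenizer.save_pretrained(quant_name)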
