---
license: apache-2.0
---
This repository contains an AWQ 4-bit quantized version of Poro-34B, produced with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).
#### Quantization config

```
{
  "zero_point": true,
  "q_group_size": 128,
  "w_bit": 4,
  "version": "GEMM"
}
```
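For reference, these are the standard AutoAWQ settings: `w_bit: 4` stores weights in 4 bits, `q_group_size: 128` applies one scale/zero-point per group of 128 weights, `zero_point: true` enables asymmetric (zero-point) quantization, and `version: "GEMM"` selects the GEMM kernel used at inference time.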
#### AWQ quantization script

```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'PATH_TO_Poro-34B'  # placeholder: local path or Hugging Face id of the base Poro-34B model
quant_path = 'Poro-34B-AWQ'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load the unquantized model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
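To sanity-check the result, the quantized model can be loaded back with AutoAWQ's `from_quantized`. The snippet below is a minimal sketch rather than part of the original quantization run: it assumes a CUDA GPU, that the quantized weights were saved to `Poro-34B-AWQ` as above, and the prompt string is purely illustrative.

```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = 'Poro-34B-AWQ'

# Load the quantized model; fuse_layers=True enables AutoAWQ's fused kernels for faster inference
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

# Illustrative prompt; move the input ids to the GPU the model was loaded onto
tokens = tokenizer("Suomi on", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```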
#### Work supported by https://datacrunch.io/

##### Quantized by: gradjitta