Magnum-v4-123b HQQ
This repo contains magnum-v4-123b quantized to 4-bit precision with HQQ (Half-Quadratic Quantization). HQQ provides accuracy comparable to AWQ at 4-bit, but requires no calibration data, which keeps quantization fast: this quant was generated on 8x A40 GPUs in only 10 minutes, using the following script:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_path = "anthracite-org/magnum-v4-123b"

# 4-bit HQQ: weights quantized in groups of 128 along axis 1
quant_config = HqqConfig(nbits=4, group_size=128, axis=1)

# Load the full-precision checkpoint and quantize it on the fly
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    cache_dir='.',
    device_map="cuda:0",
    quantization_config=quant_config,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Save the quantized model and tokenizer
output_path = "magnum-v4-123b-hqq-4bit"
model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
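With nbits=4, group_size=128, and axis=1, each weight is stored in 4 bits and scales/zero-points are shared by groups of 128 values along the input dimension, cutting the weight footprint to roughly a quarter of the float16 original.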
Inference
You can run inference directly with transformers, or with aphrodite (the -tp 2 flag below shards the model across two GPUs via tensor parallelism):
pip install aphrodite-engine
aphrodite run alpindale/magnum-v4-123b-hqq-4bit -tp 2
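For inference directly with transformers, a minimal sketch could look like the following; it assumes the saved quantization config is picked up automatically when loading this repo, and the prompt and generation settings are only illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 4-bit HQQ quant (this repo id, or the local output_path from the script above)
model_path = "alpindale/magnum-v4-123b-hqq-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# device_map="auto" spreads the quantized weights across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Explain half-quadratic quantization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))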