Model Information

Quantized version of iGeniusAI/Italia-9B-Instruct-v0.1 using torch.float32 for quantization tuning.

  • 8 bits (INT8)
  • group size = 64
  • Asymmetric quantization
  • Method: WoQ, weight-only quantization (AutoRound format)

Quantization framework: Intel AutoRound v0.4.6
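
To make the settings above concrete, here is a minimal sketch of asymmetric 8-bit quantization with a group size of 64. It uses plain round-to-nearest, which is a simplification: AutoRound additionally tunes the rounding during quantization, and that tuning is not reproduced here. The function names (quantize_group_asym, quantize_row, dequantize_group) and the choice to widen each group's range to include zero are assumptions made for illustration, not part of the AutoRound API.

```python
import math
from typing import List, Tuple


def quantize_group_asym(weights: List[float], bits: int = 8) -> Tuple[List[int], float, int]:
    """Quantize one group to unsigned integer codes plus a scale and a zero point."""
    qmax = (1 << bits) - 1  # 255 for 8 bits
    # Assumption: widen the range to include 0.0 so the zero point fits in [0, qmax].
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / qmax
    if scale == 0.0:  # all-zero group: any positive scale works
        scale = 1.0
    zero_point = int(round(-w_min / scale))
    # Round to nearest, shift by the zero point, clamp to the representable range.
    codes = [min(qmax, max(0, int(round(w / scale)) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]


def quantize_row(weights: List[float], group_size: int = 64, bits: int = 8):
    """Split a flat row of weights into groups; each group gets its own scale and zero point."""
    return [
        quantize_group_asym(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    # Hypothetical weights, used only to exercise the functions.
    row = [0.1 * math.sin(i) + 0.05 for i in range(128)]
    for codes, scale, zero_point in quantize_row(row):
        restored = dequantize_group(codes, scale, zero_point)
        print(f"scale={scale:.6f} zero_point={zero_point} first={restored[0]:.4f}")
```

Because every group of 64 weights carries its own scale and zero point, an outlier only affects the resolution of its own group, and the reconstruction error of each weight stays within one quantization step of that group.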

Note: this INT8 version of Italia-9B-Instruct-v0.1 has been quantized using NVIDIA CUDA libraries.

Replication Recipe

Step 1 Install Requirements

I suggest installing the requirements into a dedicated Python virtualenv or a conda environment.

wget https://github.com/intel/auto-round/archive/refs/tags/v0.4.6.tar.gz
tar -xvzf v0.4.6.tar.gz
cd auto-round-0.4.6
pip install -r requirements.txt --upgrade

Step 2 Build and install Intel AutoRound from source (editable install)

pip install -vvv --no-build-isolation -e .

Step 3 Script for Quantization

  from auto_round import AutoRound
  from transformers import AutoTokenizer, GPTNeoXModel

  # Load the original model and its tokenizer
  model_name = "iGeniusAI/Italia-9B-Instruct-v0.1"
  model = GPTNeoXModel.from_pretrained(model_name, trust_remote_code=True)
  tokenizer = AutoTokenizer.from_pretrained(model_name)

  # INT8, group size 64, asymmetric; amp=False disables mixed precision during tuning
  bits, group_size, sym, device, amp = 8, 64, False, 'auto', False
  autoround = AutoRound(model, tokenizer, nsamples=128, iters=200, seqlen=512, batch_size=4, bits=bits, group_size=group_size, sym=sym, device=device, amp=amp)
  autoround.quantize()

  # Save the quantized weights in AutoRound format
  output_dir = "./AutoRound/iGeniusAI_Italia-9B-Instruct-v0.1-autoround-int8-gs64-auto-asym"
  autoround.save_quantized(output_dir, format='auto_round', inplace=True)

Note: the GPTNeoXSdpaAttention class is deprecated in favor of simply modifying the config._attn_implementation attribute of the GPTNeoXAttention class, so this recipe requires transformers<4.48.
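
As a convenience, the version constraint can be checked before running the quantization script. The following is a minimal sketch using only the Python standard library; the helper names (parse_release, is_supported, installed_version) are hypothetical, and the parsing is deliberately conservative: a pre-release such as 4.48.0.dev0 is treated as 4.48.0 and therefore rejected.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components of a version string."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # stop at the first non-numeric component, e.g. "dev0"
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported(version: str, upper_bound: Tuple[int, ...] = (4, 48)) -> bool:
    """True when the version is strictly below the upper bound (transformers<4.48)."""
    return parse_release(version) < upper_bound


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    elif is_supported(found):
        print(f"transformers {found} satisfies transformers<4.48")
    else:
        print(f"transformers {found} is too new for this recipe")
```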

License

MIT

Disclaimer

This quantized model comes with no warranty. It has been developed only for research purposes.
