This is the 4-bit quantized version created using AutoTrain, but it does not work.

## Error

### GPU

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62de65017e93762b858d3057/M0OoBfV1WC1QcLumyvy0L.png)

### CPU

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62de65017e93762b858d3057/ezLq3jhasIg--M-jJAMSI.png)

## Quantization Process

```py
# Install GPTQ support plus the latest optimum and transformers from source
!pip install auto-gptq
!pip install git+https://github.com/huggingface/optimum.git
!pip install git+https://github.com/huggingface/transformers.git
!pip install --upgrade accelerate
```

```py
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

tokenizer = AutoTokenizer.from_pretrained("inception-mbzuai/jais-13b-chat")

# 4-bit GPTQ quantization, calibrated on the "c4" dataset
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    "inception-mbzuai/jais-13b-chat",
    quantization_config=gptq_config,
    trust_remote_code=True,
)
```
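
## Saving and Loading (untested)

For completeness, below is a minimal sketch of how the quantized weights would normally be saved and reloaded for inference. The local path `jais-13b-chat-4bit` and the test prompt are placeholders, and given the errors above this path has not been verified to work for this model.

```py
# Minimal sketch (unverified, given the errors above).
# "jais-13b-chat-4bit" is a hypothetical local output directory.
quantized_dir = "jais-13b-chat-4bit"
model.save_pretrained(quantized_dir)
tokenizer.save_pretrained(quantized_dir)

# Reload: transformers picks up the GPTQ config saved alongside the weights,
# so no quantization_config argument is needed here.
model = AutoModelForCausalLM.from_pretrained(
    quantized_dir,
    device_map="auto",
    trust_remote_code=True,
)

# Placeholder prompt; jais-13b-chat expects its own chat prompt template.
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```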