---
tags:
- gptq
- 4bit
- gptqmodel
- modelcloud
- llama-3.1
- 8b
- instruct
---
This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel).

- **bits**: 4
- **group_size**: 128
- **desc_act**: true
- **static_groups**: false
- **sym**: true
- **lm_head**: false
- **damp_percent**: 0.01
- **true_sequential**: true
- **model_name_or_path**: ""
- **model_file_base_name**: "model"
- **quant_method**: "gptq"
- **checkpoint_format**: "gptq"
- **meta**:
  - **quantizer**: "gptqmodel:0.9.9-dev0"

**Here is an example:**
```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

model_name = "ModelCloud/Meta-Llama-3.1-8B-Instruct-gptq-4bit"

prompt = [{"role": "user", "content": "I am in Shanghai, preparing to visit the natural history museum. Can you tell me the best way to"}]

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = GPTQModel.from_quantized(model_name)

inputs = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
outputs = model.generate(prompts=inputs, temperature=0.95, max_length=128)
print(outputs[0].outputs[0].text)
```
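
**What the settings mean, roughly:**

The sketch below illustrates what **bits**: 4, **group_size**: 128 and **sym**: true imply for how weights are stored: each run of 128 weights shares one scale, and every weight becomes a signed 4-bit integer. It uses plain round-to-nearest, which is a simplification. It is not the GPTQ algorithm, which also compensates for rounding error, and it is not GPTQModel's implementation. The function names `quantize_group`, `dequantize_group` and `quantize_row` are hypothetical, chosen for this illustration only.

```python
import math
from typing import List, Tuple

BITS = 4          # mirrors "bits: 4" from the settings above
GROUP_SIZE = 128  # mirrors "group_size: 128"


def quantize_group(weights: List[float], bits: int = BITS) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one group (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1      # 7 for 4 bits
    qmin = -(2 ** (bits - 1))       # -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    # One shared scale per group; symmetric means no separate zero offset.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


def quantize_row(row: List[float], group_size: int = GROUP_SIZE) -> List[Tuple[List[int], float]]:
    """Split a weight row into consecutive groups and quantize each one."""
    return [
        quantize_group(row[start:start + group_size])
        for start in range(0, len(row), group_size)
    ]


# Toy "weight row" with 256 entries, so it splits into two groups of 128.
row = [0.1 * math.sin(i) for i in range(256)]
groups = quantize_row(row)

# Worst-case reconstruction error over the whole row.
max_err = 0.0
for g, (q, scale) in enumerate(groups):
    restored = dequantize_group(q, scale)
    original = row[g * GROUP_SIZE:(g + 1) * GROUP_SIZE]
    max_err = max(max_err, max(abs(a - b) for a, b in zip(original, restored)))

print(f"groups: {len(groups)}, max reconstruction error: {max_err:.6f}")
```

With round-to-nearest, the error on any single weight is at most half of its group's scale, so a smaller group size gives tighter scales at the cost of storing more of them.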