This model has been quantized using GPTQModel.

  • bits: 4
  • group_size: 128
  • desc_act: true
  • static_groups: false
  • sym: true
  • lm_head: false
  • damp_percent: 0.0025
  • true_sequential: true
  • model_name_or_path: ""
  • model_file_base_name: "model"
  • quant_method: "gptq"
  • checkpoint_format: "gptq"
  • meta
    • quantizer: "gptqmodel:0.9.9-dev0"
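To give a feel for what `bits: 4` and `group_size: 128` imply for storage, here is a rough back-of-the-envelope sketch. It assumes each group of 128 weights carries one 16-bit scale and one packed 4-bit zero point, which is how GPTQ checkpoints are commonly laid out but is not stated in this card. It ignores the `g_idx` tensor used with `desc_act` and any layers left unquantized (such as `lm_head` here). The helper name `effective_bits_per_weight` is purely illustrative.

```python
# Values copied from the configuration list above.
quant_config = {"bits": 4, "group_size": 128}


def effective_bits_per_weight(bits, group_size, scale_bits=16, zero_bits=None):
    """Approximate storage cost per quantized weight, in bits.

    Assumption: every group stores one scale (scale_bits) and one
    zero point (zero_bits, defaulting to the weight bit width).
    """
    if zero_bits is None:
        zero_bits = bits
    per_group_overhead = scale_bits + zero_bits
    return bits + per_group_overhead / group_size


effective = effective_bits_per_weight(**quant_config)
print(effective)  # 4 + (16 + 4) / 128 = 4.15625
```

Under these assumptions the per-group metadata adds only about 0.16 bits per weight, which is why a group size of 128 is a common trade-off between accuracy and file size.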

The following example loads the quantized model with GPTQModel and generates a reply to a chat prompt:

from transformers import AutoTokenizer
from gptqmodel import GPTQModel

model_name = "ModelCloud/Mistral-Large-Instruct-2407-gptq-4bit"

# Chat-style prompt: a list of {"role", "content"} messages.
prompt = [{"role": "user", "content": "I am in Shanghai, preparing to visit the natural history museum. Can you tell me the best way to"}]

# Load the tokenizer and the quantized weights from the same repository.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = GPTQModel.from_quantized(model_name)

# Apply the chat template and return the prompt as a tensor of token IDs.
input_tensor = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids=input_tensor.to(model.device), max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt prefix.
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)

print(result)
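
The slice `outputs[0][input_tensor.shape[1]:]` in the example above relies on the fact that, for decoder-only models, `generate` returns the prompt tokens followed by the new tokens in a single sequence. A minimal sketch of that slicing, using plain Python lists in place of tensors and made-up token IDs:

```python
# Hypothetical token IDs; real values come from the tokenizer and the model.
prompt_ids = [1, 733, 16289, 28793]
generated_ids = [415, 1489, 1069, 2]

# The returned sequence is the prompt followed by the newly generated tokens.
full_output = prompt_ids + generated_ids

# Slicing from the prompt length onward keeps only the generated part.
new_tokens = full_output[len(prompt_ids):]
print(new_tokens)
```

Without the slice, the decoded text would repeat the user's prompt before the model's reply.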

Model size (Safetensors): 17.1B params. Tensor types: I32, BF16, FP16.