How to load and use the model?

by Q4234 - opened

I tried:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

modelName = "mit-han-lab/opt-13b-smoothquant"  # 8-bit quantized model

# Attempt to load the checkpoint directly as int8 and move it to the GPU
model = AutoModelForCausalLM.from_pretrained(modelName, torch_dtype=torch.int8).cuda()
tokenizer = AutoTokenizer.from_pretrained(modelName, use_fast=False)

but this doesn't work...

MIT HAN Lab org

Hi, please refer to to see how to use those models. We haven't integrated our INT8 kernels into Hugging Face.
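
For context on why setting `torch_dtype=torch.int8` is not enough on its own: an INT8-quantized layer stores integer codes together with scale factors, and its matrix multiply has to accumulate in integer arithmetic and then rescale the result, which is the job of dedicated INT8 kernels. Below is a minimal pure-Python sketch of symmetric per-tensor quantization to illustrate the idea. It is not SmoothQuant's actual implementation, and the names `quantize_int8` and `int8_dot` are hypothetical helpers introduced only for this example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> (int8 codes, scale)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(x_codes, x_scale, w_codes, w_scale):
    """Dot product accumulated in integers, rescaled to float at the end."""
    acc = sum(a * b for a, b in zip(x_codes, w_codes))  # integer accumulation
    return acc * x_scale * w_scale


# Toy data (made up for illustration only)
weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]

w_codes, w_scale = quantize_int8(weights)
x_codes, x_scale = quantize_int8(activations)

approx = int8_dot(x_codes, x_scale, w_codes, w_scale)
exact = sum(a * b for a, b in zip(activations, weights))
print(f"int8 result: {approx:.4f}, float result: {exact:.4f}")
```

The integer codes are meaningless without their scales, so a loader that only casts tensors to a given dtype cannot reproduce this computation; the model code itself has to call quantization-aware layers.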
