How to load and use the model?

#1 opened by Q4234

I tried

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# from https://huggingface.co/facebook/opt-30b

model_name = "mit-han-lab/opt-13b-smoothquant"  # 8-bit quantized model

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.int8).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

but this doesn't work...

MIT HAN Lab org

Hi, please refer to https://github.com/mit-han-lab/smoothquant#smoothquant-int8-inference-for-pytorch to see how to use these models. We haven't integrated our INT8 kernels into Hugging Face Transformers, so the checkpoints cannot be loaded with AutoModelForCausalLM directly.
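
In short, the INT8 checkpoints are loaded with the Int8OPTForCausalLM class defined in that repo (smoothquant/opt.py), not with AutoModelForCausalLM. Below is a minimal sketch, assuming the smoothquant package and its torch-int CUDA kernels are installed as the repo describes; the torch_dtype/device_map arguments mirror the repo's demo and the generation call is illustrative:

import torch
from transformers import AutoTokenizer
from smoothquant.opt import Int8OPTForCausalLM  # from the linked repo, not transformers

# The tokenizer is the stock OPT tokenizer; only the weights are quantized.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-13b", use_fast=False)

# Load the INT8 checkpoint with SmoothQuant's own model class
# (assumption: load arguments follow the repo's demo notebook).
model = Int8OPTForCausalLM.from_pretrained(
    "mit-han-lab/opt-13b-smoothquant",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Illustrative generation call; the INT8 kernels require a CUDA GPU.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
output_ids = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))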
