running the model in Python

#3
by Asaf-Yehudai - opened

What is the minimal piece of code I can run to get results similar to the demo?

I started with the snippet below, but it's still not quite there. Do you have any ideas?

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("eachadea/vicuna-13b-1.1")
model = AutoModelForCausalLM.from_pretrained("eachadea/vicuna-13b-1.1")

# System prompt used by the Vicuna v1.1 conversation template
system = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions."

def text_wrapper(text):
    # Wrap the user message in the USER/ASSISTANT template
    return system + f"\nUSER: {text}\nASSISTANT:"

input_text = "I love reading books"
input_text = text_wrapper(input_text)

input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=1024, do_sample=True, temperature=0.7)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)

Aside from using FastChat directly, you should copy the sampling parameters the demo uses as well as its prompt template.
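
For reference, a minimal sketch of what that could look like with plain transformers. The model id is the one from the post above; the sampling values (temperature 0.7, max_new_tokens 512) and the float16 / device_map loading are assumptions modeled on FastChat-style defaults, not confirmed demo settings:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "eachadea/vicuna-13b-1.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# float16 halves memory; device_map="auto" needs the accelerate package (assumed setup)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

system = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
prompt = system + "\nUSER: I love reading books\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# temperature/max_new_tokens are assumed, demo-like values, not confirmed settings
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    max_new_tokens=512,
)
# decode only the newly generated tokens so the prompt isn't echoed back
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response.strip())

Using max_new_tokens instead of max_length also avoids the prompt length eating into the response budget.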

Okay, thanks.
I've updated the prompt now and get results similar to the demo (see above).
Thanks for uploading the model! :)

Asaf-Yehudai changed discussion status to closed
