Question about example / configuration
Hi Yam,
First of all, I want to thank you for this amazing contribution! I am looking forward to getting the most out of it.
I am doing a test run, but I am getting some unstable responses. I think I might have to configure or use it differently.
This is my code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "yam-peleg/Hebrew-Gemma-11B-Instruct"

# Load the weights in 4-bit with bfloat16 compute so the 11B model fits on a single GPU.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", quantization_config=quantization_config)

chat = [
    {"role": "user", "content": "היי מה שלומך?"},  # "Hi, how are you?"
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

outputs = model.generate(
    # Move the input ids onto the same device as the model.
    tokenizer(prompt, return_tensors="pt").input_ids.to(model.device),
    max_length=100,
    do_sample=True,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0]))
```
This is the output I get:
user
היי מה שלומך?
model
ืื ื ืืืื ืืืืื ืืืงืกื "ืืื, ืื ืฉืืืื? " ืื ืชื ืชืฉืืื ืืขืืืงื.
ืืื ืื ืฉืคืข ืฉื ืืืชืจืื ืืช ืืื ืืืืชื ืขื ืืฉืืจืืชืื ืฉืื, ืืื ื ืจืืฆื ืืืฉืื ืืช ืืืืจื ืืืืืื ืืืืื ืืื ืืกืคืง ืฉืืจืืชืื ืืขืืืื ืืื ืฉืื ืืืื. ืื ื ืฉืื ืฉืืื ืื ืืื ืื
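One thing I suspect might matter: as far as I understand, `apply_chat_template(..., tokenize=False)` already puts the `<bos>` token at the start of the prompt string, so calling `tokenizer(prompt)` on it afterwards adds a second `<bos>`, which I have read can hurt Gemma's output quality. Also, `max_length=100` counts the prompt tokens as well, so the completion gets cut short (which would match the truncated answer above). This is the variant I was thinking of trying next, just a sketch on my side and not something taken from the Model Card, letting `apply_chat_template` do the tokenization and switching to `max_new_tokens`:

```python
# Tokenize directly through the chat template so <bos> is only added once,
# and move the input ids onto the same device as the model.
input_ids = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,  # budget for the completion only, not prompt + completion
    do_sample=True,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```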
What do you think?
If a change is needed, I can add it as an example to the Model Card and help with the documentation :)