Question about example / configuration
Hi Yam,
First of all, I want to thank you for this amazing contribution! I am looking forward to getting the most out of it.
I am doing a test run, but I am getting some unstable responses. I think I might have to configure or use it differently.
This is my code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "yam-peleg/Hebrew-Gemma-11B-Instruct"

# Load the weights in 4-bit with bfloat16 compute so the 11B model fits on a single GPU.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cuda", quantization_config=quantization_config)

chat = [
    {"role": "user", "content": "היי מה שלומך?"},  # "Hi, how are you?"
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

outputs = model.generate(
    # Move the input ids onto the same device as the model.
    tokenizer(prompt, return_tensors="pt").input_ids.to(model.device),
    max_length=100,
    do_sample=True,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0]))
```
This is the output I get:
user
היי מה שלומך?
model
ืื ื ืืืื ืืืืื ืืืงืกื "ืืื, ืื ืฉืืืื? " ืื ืชื ืชืฉืืื ืืขืืืงื.
ืืื ืื ืฉืคืข ืฉื ืืืชืจืื ืืช ืืื ืืืืชื ืขื ืืฉืืจืืชืื ืฉืื, ืืื ื ืจืืฆื ืืืฉืื ืืช ืืืืจื ืืืืืื ืืืืื ืืื ืืกืคืง ืฉืืจืืชืื ืืขืืืื ืืื ืฉืื ืืืื. ืื ื ืฉืื ืฉืืื ืื ืืื ืื
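One thing I suspect might matter: as far as I understand, `apply_chat_template(..., tokenize=False)` already puts the `<bos>` token at the start of the prompt string, so calling `tokenizer(prompt)` on it afterwards adds a second `<bos>`, which I have read can hurt Gemma's output quality. Also, `max_length=100` counts the prompt tokens as well, so the completion gets cut short (which would match the truncated answer above). This is the variant I was thinking of trying next, just a sketch on my side and not something taken from the Model Card, letting `apply_chat_template` do the tokenization and switching to `max_new_tokens`:

```python
# Tokenize directly through the chat template so <bos> is only added once,
# and move the input ids onto the same device as the model.
input_ids = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=100,  # budget for the completion only, not prompt + completion
    do_sample=True,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```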
What do you think?
If a change is needed, I can add it as an example to the Model Card and help with the documentation :)