The model stops after generating one new token

#76
by rajiv-data-chef - opened

The LLM generates just one new token and then stops. It behaves like this repeatedly.
Have I missed something on my side?

# Load model directly
import flash_attn
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
phi_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)

# %%time
device = 'cuda'
phi_model.to(device)

def get_ids_sigIn_password_email(get_ids_sigIn_pass_email_prompt):
    inputs = tokenizer.encode(get_ids_sigIn_pass_email_prompt, return_tensors="pt").to(device)

    # Find the index where the generated tokens start
    input_length = len(tokenizer.encode(get_ids_sigIn_pass_email_prompt))
    print(f"input_length -> {input_length}")

    # Generate a response
    phi_model.eval()
    outputs = phi_model.generate(inputs, max_new_tokens=500)
    print(f"shape of output is {outputs[0].shape}")
    full_decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

    decoded_output = full_decoded_output
    return decoded_output

out_f2 = get_ids_sigIn_password_email("write a paragraph about What is the meaning of life")
print(out_f2)

input_length -> 11
shape of output is torch.Size([13])
write a paragraph about What is the meaning of life?
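
A side note on the function above: it computes input_length but then returns the full decoded string, prompt included. To return only the newly generated text, a minimal sketch reusing the same variables would be:

# Slice off the prompt tokens so only the newly generated ones are decoded
generated_ids = outputs[0][inputs.shape[-1]:]
decoded_output = tokenizer.decode(generated_ids, skip_special_tokens=True)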


I solved it using the prompt style mentioned in the documentation.
All I had to do was use the <|user|> and <|assistant|> tokens like this:
prompt_our_side = f"""<|user|>
{prompt}
<|assistant|>
"""
