
model doesn't predict eos token?

#14
by joejztang - opened

Hi there,
I was loading openlm-research/open_llama_3b_v2 to create a baseline. One thing I observed is that the model seems to refuse to generate the EOS token, so the conversation goes on endlessly.
For example, when I asked "Q: Is apple red?\nA:", I got

<s>Q: Is apple red?
A: No, apple is not red.
Q: Is apple green?
A: No, apple is not green.
Q: Is apple yellow?
A: No, apple is not yellow.
Q: Is apple orange?
A: No, apple is not orange.
Q: Is apple blue?
A: No, apple is not blue.
Q: Is apple pink?
A: No, apple is not pink.
Q: Is apple purple?
A: No, apple is not purple.
Q: Is apple black?
A: No, apple is not black.
Q: Is apple brown?
A: No, apple is not brown.
Q: Is apple white?
A: No, apple is not white.
Q: Is apple red?
A: No, apple is not red.
Q: Is apple green?
A: No, apple is not green.
Q: Is apple yellow?
A: No, apple is not yellow.
Q: Is apple orange?
A: No, apple is not orange.
Q: Is apple blue?
A: No, apple is not blue.
Q: Is apple pink?
A: No

What I expected was this (despite the answer itself being factually wrong):

<s>Q: Is apple red?
A: No, apple is not red.

What can I do to make it happen?


code details:

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM


model_path = 'openlm-research/open_llama_3b_v2'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Tokenize the prompt and move it to the model's device
# (device_map="auto" decides placement, so use model.device rather than an undefined `device`)
prompt = 'Q: Is apple red?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Greedy decoding, up to 256 new tokens
generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=256
)
# print(generation_output)
print(tokenizer.decode(generation_output[0]))
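Not part of the original post, but before changing anything it is worth a quick sanity check that the tokenizer and the generation config actually agree on an EOS token (the attribute names below are standard transformers API; the expected values are an assumption for LLaMA-style tokenizers):

# Optional sanity check: confirm which EOS token generate() is waiting for
print(tokenizer.eos_token, tokenizer.eos_token_id)   # typically '</s>' and 2 for LLaMA-style tokenizers
print(model.generation_config.eos_token_id)          # generation stops when this id is produced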

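A likely explanation is that this is a plain base model: its pretraining text rarely ends a short Q/A exchange with an EOS token, so the most probable continuation is simply the next "Q:" line. One workaround, sketched here as my own suggestion rather than anything from the model card, is to stop generation as soon as a new "Q:" turn appears. The StopOnNextQuestion class below is a hypothetical helper built on transformers' StoppingCriteria; it reuses tokenizer, model, and input_ids from the snippet above and assumes batch size 1:

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnNextQuestion(StoppingCriteria):
    """Stop once the newly generated text contains the stop string (here: the start of a new 'Q:' turn)."""
    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len  # number of prompt tokens to skip when decoding

    def __call__(self, input_ids, scores, **kwargs):
        new_text = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        return self.stop_string in new_text

stopping = StoppingCriteriaList([StopOnNextQuestion(tokenizer, "\nQ:", input_ids.shape[1])])
generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=256, stopping_criteria=stopping
)
# The stop string itself is still in the output, so trim it off before printing
text = tokenizer.decode(generation_output[0])
print(text.split("\nQ:", 1)[0])

Recent transformers releases also accept stop_strings=["\nQ:"] together with tokenizer=tokenizer directly in generate(), which achieves the same thing with less code; and an instruction-tuned variant is generally more willing to emit EOS on its own than a base checkpoint like this one.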