Model used for RAG generates extra questions and answers instead of just answering the user's query
#151
by myke11j
New to building RAG, so maybe a beginner's question.
I'm using Llama-3.1-8B-Instruct for RAG over my API data in JSON format (12 chunks). When I ask a very simple question that it can answer from the JSON, it gives the answer but then keeps generating extra conversation: follow-up questions and answers the user never asked for. I'm wondering why, because I have tested the same application with other models (Mistral etc.) and they all just end with a concise answer. I'm using the same config and prompt for every model I tested.
My pipeline looks like this:

from transformers import pipeline

# use a distinct name so the imported `pipeline` function isn't shadowed
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=540,
    temperature=0.03,   # note: temperature/top_p only take effect when do_sample=True
    top_p=0.95,
    repetition_penalty=1.15,
    streamer=streamer,
)
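One thing I suspect is stop tokens: Llama-3.1 marks the end of an assistant turn with <|eot_id|>, and if generation doesn't stop there the model just keeps writing both sides of the conversation. This is only a sketch of what I'm planning to try (the terminators pattern is from the Llama 3 model card; I haven't confirmed it fixes my case):

# stop at both the default EOS token and Llama-3.1's end-of-turn token
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

output = pipe(
    prompt,
    max_new_tokens=540,
    eos_token_id=terminators,
)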
and the system prompt clearly says:
....
Answer concisely in 200-400 characters, or 5-10 words when appropriate.
Provide a single, clear response.
Do not add additional questions after giving the answer to the query.
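I'm not sure whether my prompt formatting is part of the problem, but for reference this is my understanding of the recommended way to build a Llama-3.1 prompt with transformers (a sketch; system_prompt and user_question are placeholder names, not my actual variables):

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_question},
]
# apply_chat_template wraps each turn in Llama-3.1's special tokens
# (<|start_header_id|>, <|eot_id|>, ...) so the model sees the format it was trained on
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)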
This is what the response looks like when I ask a single question (I'm replacing the actual questions and answers with placeholders):
<<USR>>
{Q1}
[/USR] <<INST>]>
{Ans 1}. Would you like more info?
[/INST] <<USR>>
{Q2}
[/USR] <<INST>]>
{Ans 2}. Let me know if you need further assistance!
[/INST] <<USR>>
{Q3}
[/USR] <<INST>]>
{Ans 3}
[/INST]
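For now I'm considering a crude workaround: cutting the decoded output at the first fabricated turn marker. A rough sketch (truncate_at_markers is a hypothetical helper; the marker strings are just what appears in my outputs above):

# keep only the text before the first fabricated turn marker
def truncate_at_markers(text, markers=("[/INST]", "<<USR>>")):
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()

answer = truncate_at_markers(generated_text)  # generated_text = decoded pipeline output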
Happy to share more information if needed.