Few-shot learning with the model
#19 by eitangreenb
I tried using the model with few-shot prompting, but I got bad results.
My task is to generate descriptions for images from a specific domain. To help the model understand the domain, I give it pairs of images and textual descriptions, and finally an image with a query asking it to describe it.
It does return a description, but it doesn't seem to understand the connection between the image and the text in each example; it treats them as just general descriptions and images to borrow phrases from.
As far as I can tell, the API doesn't have any way of passing the examples as "instructions", similar to what GPT allows. What is the correct way to do this?
Here is the code I tried:
def build_query_messages_for_qwen(context, examples_lst, query_dict):
    # ------ Build context
    context_dict = [{"text": context}]

    # ------ Build examples: each example is an image followed by its description
    examples_messages_lst = []
    for ii, example_dict in enumerate(examples_lst):
        examples_messages_lst += [
            {"text": f"Example {ii + 1} image:"},
            {"image": example_dict["pic_path"]},
            {"text": f"Example {ii + 1} description:"},
            {"text": example_dict["golden_response"]},
            {"text": "\n"},
        ]

    # ------ Build query: the image to describe plus the query text
    query_message = [
        {"image": query_dict["pic_path"]},
        {"text": query_dict["query_text"]},
    ]

    messages = context_dict + examples_messages_lst + query_message
    return messages
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True, bf16=True
).eval()

# context, examples_lst and query_dict are built elsewhere for my domain
messages = build_query_messages_for_qwen(context, examples_lst, query_dict)
query = tokenizer.from_list_format(messages)
response_txt, _ = model.chat(tokenizer, query=query, history=None)
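For reference, the alternative I was considering is passing each example as a completed (query, response) turn through the history argument of model.chat, so the model sees each image paired with its description as a finished exchange rather than one long prompt. This is only a sketch: I'm assuming history takes (query, response) string pairs, the "Describe this image." prompt is a placeholder I made up, and examples_lst / query_dict are the same variables as above.

# Sketch only: pack the few-shot examples into the chat history instead of the prompt.
history = []
for example_dict in examples_lst:
    example_query = tokenizer.from_list_format([
        {"image": example_dict["pic_path"]},
        {"text": "Describe this image."},  # placeholder prompt for the example turn
    ])
    history.append((example_query, example_dict["golden_response"]))

final_query = tokenizer.from_list_format([
    {"image": query_dict["pic_path"]},
    {"text": query_dict["query_text"]},
])
response_txt, _ = model.chat(tokenizer, query=final_query, history=history)

If this is the right direction, I'd also like to know whether the domain context should go into the system prompt rather than into the first text chunk of the query.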