Lack of <tool_call> XML tags in response
Hi there!
I've been experimenting with this model with regard to function calling, and I've been having trouble reproducing tool call responses in the format given on the model card.
No matter how I adjust the prompt or how many extra instructions I add about the XML tags, the model just isn't using them.
Using the example given on the model card, I get the following response from the model:
Raw input text
<|start_header_id|>system<|end_header_id|>
You are a function calling AI model. You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. For each function call return a json object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"name": <function-name>,"arguments": <args-dict>}
</tool_call>
Here are the available tools:
<tools> {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
        "properties": {
            "location": {
                "description": "The city and state, e.g. San Francisco, CA",
                "type": "string"
            },
            "unit": {
                "enum": [
                    "celsius",
                    "fahrenheit"
                ],
                "type": "string"
            }
        },
        "required": [
            "location"
        ],
        "type": "object"
    }
} </tools><|eot_id|><|start_header_id|>user<|end_header_id|>
What is the weather like in San Francisco in celcius?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Generated response
\n{\"id\": 0, \"name\": \"get_current_weather\", \"arguments\": {\"location\": \"San Francisco, CA\", \"unit\": \"celsius\"}}\n
The model is running on vLLM inside Triton Inference Server 24.06, with the following configuration:
config.pbtxt
backend: "vllm"
instance_group [
    {
        count: 1,
        kind: KIND_MODEL
    }
]
model.json
{
    "model": "/llama-3-groq-8b-tool-use-hf/",
    "disable_log_requests": "true",
    "gpu_memory_utilization": 0.9,
    "enforce_eager": "false",
    "tensor_parallel_size": 2,
    "disable_custom_all_reduce": "true"
}
I've also tested this with the Transformers Python library in the following manner (note: instruct.txt contains the same system instructions as above):
import torch
import transformers


class Llama3:
    def __init__(self, model_path):
        self.model_id = model_path
        self.pipeline = transformers.pipeline(
            "text-generation",
            model=self.model_id,
            model_kwargs={"torch_dtype": torch.float16},
            device=5,
        )
        # <|eot_id|> is the end-of-turn token used by the Llama 3 chat template
        self.terminators = [
            self.pipeline.tokenizer.eos_token_id,
            self.pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
        ]

    def get_response(
        self, query, message_history=[], max_tokens=4096, temperature=0.6, top_p=0.9
    ):
        user_prompt = message_history + [{"role": "user", "content": query}]
        prompt = self.pipeline.tokenizer.apply_chat_template(
            user_prompt, tokenize=False, add_generation_prompt=True
        )
        outputs = self.pipeline(
            prompt,
            max_new_tokens=max_tokens,
            eos_token_id=self.terminators[0],
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
        )
        # Strip the prompt so only the newly generated text is returned
        response = outputs[0]["generated_text"][len(prompt):]
        return response, user_prompt + [{"role": "assistant", "content": response}]

    def chatbot(self, system_instructions=""):
        conversation = [{"role": "system", "content": system_instructions}]
        while True:
            user_input = input("User: ")
            if user_input.lower() in ["exit", "quit"]:
                print("Exiting the chatbot. Goodbye!")
                break
            response, conversation = self.get_response(user_input, conversation)
            print(f"Assistant: {response}")


if __name__ == "__main__":
    with open('instruct.txt', 'r') as file:
        data = file.read().replace('\n', '')
    bot = Llama3("/llama-3-groq-8b-tool-use-hf/")
    bot.chatbot(system_instructions=data)
Upon running the script with the example input, I get the same response as above:
User: What is the weather like in San Francisco in celcius?
Assistant: {"id": 0, "name": "get_current_weather", "arguments": {"location": "San Francisco, CA", "unit": "celsius"}}
You might be missing a configuration option that makes the generation call output the special tool-use tokens. In vLLM it's skip_special_tokens, and that needs to be set to False. The tool-related XML tags are in the vocabulary as dedicated tokens.
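For reference, here's a minimal sketch of passing that flag when calling vLLM directly from Python (the model path and sampling values are copied from the setup above; everything else is illustrative):

from vllm import LLM, SamplingParams

llm = LLM(model="/llama-3-groq-8b-tool-use-hf/", tensor_parallel_size=2)

# skip_special_tokens defaults to True, which strips <tool_call>/</tool_call>
# from the decoded text; setting it to False keeps the tags in the output.
params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=4096,
    skip_special_tokens=False,
)

outputs = llm.generate([prompt], params)  # prompt = the raw input text shown above
print(outputs[0].outputs[0].text)

The Transformers pipeline likewise strips special tokens when it decodes the output, so if the same bare JSON shows up there, the fix is to decode the generated ids yourself (e.g. model.generate plus tokenizer.decode(..., skip_special_tokens=False)) instead of relying on the pipeline's default decoding.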
Ahhh, that's it! Thank you so much. And an even bigger thanks for such an awesome model!