How to make the model decide when not to use tools/call functions and provide a normal chat response?

#1
by jeril - opened

I deployed your model using TGI (Hugging Face). The model is able to provide responses when the question is related to tool calling. The following code was used:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="functionary")

response = client.chat.completions.create(
    model="meetkai/functionary-small-v2.5",
    messages=[{"role": "user",
            "content": "What is the weather for Istanbul?"}
    ],
    tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        }],
    tool_choice="auto"
)

print(response)

For this request, I get the following output:

ChatCompletion(id='', choices=[Choice(finish_reason='eos_token', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments={'location': 'Istanbul'}, name='openweathermap', description=None), type='function')]))], created=1721757239, model='meetkai/functionary-small-v2.5', object='chat.completion', service_tier=None, system_fingerprint='2.1.2-dev0-sha-9935720', usage=CompletionUsage(completion_tokens=23, prompt_tokens=19, total_tokens=42))

But when I changed the question to:

What is 4 + 4?

I was expecting the answer 8, but I got the following error:

Traceback (most recent call last):
  File "/eph/nvme0/azureml/cr/j/5fab296a307947099e421be9e45eb265/exe/wd/logs/test2.py", line 5, in <module>
    response = client.chat.completions.create(
  File "/opt/conda/lib/python3.10/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 646, in create
    return self._post(
  File "/opt/conda/lib/python3.10/site-packages/openai/_base_client.py", line 1266, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/opt/conda/lib/python3.10/site-packages/openai/_base_client.py", line 942, in request
    return self._request(
  File "/opt/conda/lib/python3.10/site-packages/openai/_base_client.py", line 1046, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'error': 'Tool error: No function found in generated text', 'error_type': 'tool_error'}

Could you please help me understand how to make the model decide when not to use tools/call functions and instead provide a normal chat response?

MeetKai org

Hi, can you provide more details of your dependencies so that I can reproduce this error? I tried "What is 4 + 4?" and it worked fine:

ChatCompletion(
    id='cmpl-fbcff334c54f406983371d64ab9bbe8f',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='4 + 4 = 8',
                role='assistant',
                function_call=None,
                tool_calls=None,
                tool_call_id=None,
                name=None
            )
        )
    ],
    created=1721962674,
    model='meetkai/functionary-small-v2.5',
    object='chat.completion',
    service_tier=None,
    system_fingerprint=None,
    usage=CompletionUsage(completion_tokens=8, prompt_tokens=118, total_tokens=126)
)

Hi,

Thank you for the reply.
I used the following docker image:

ghcr.io/huggingface/text-generation-inference:sha-9935720

Then I started their inference server using the following command:

text-generation-launcher --model-id meetkai/functionary-small-v2.5 --num-shard 1 --port 8080

Then I tried the script that was initially shared. It worked fine for "What is the weather for Istanbul?", but gave the previously mentioned error for "What is 4 + 4?".
How did you call the model for the question "What is 4 + 4?"? Did you pass the tools and tool_choice parameters? When I call the model without tools and tool_choice, I get the same response as you, but I get the mentioned error when passing the tools and tool_choice parameters. If you don't mind, could you please share the script that you used for testing?

The following is the output of my pip freeze:

accelerate==0.29.3
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
archspec @ file:///home/conda/feedstock_root/build_artifacts/archspec_1708969572489/work
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.43.1
boltons @ file:///home/conda/feedstock_root/build_artifacts/boltons_1711936407380/work
Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1695989787169/work
certifi==2024.7.4
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1696001684923/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1698833585322/work
click==8.1.7
cloudpickle==3.0.0
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1666700638685/work
conda @ file:///home/conda/feedstock_root/build_artifacts/conda_1715631919917/work
conda-libmamba-solver @ file:///home/conda/feedstock_root/build_artifacts/conda-libmamba-solver_1706566000184/work/src
conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1691048088238/work
conda_package_streaming @ file:///home/conda/feedstock_root/build_artifacts/conda-package-streaming_1691009212940/work
datasets==2.20.0
Deprecated==1.2.14
dill==0.3.8
diskcache==5.6.3
distro @ file:///home/conda/feedstock_root/build_artifacts/distro_1704321475663/work
einops==0.6.1
exceptiongroup==1.2.2
filelock @ file:///home/conda/feedstock_root/build_artifacts/filelock_1719088281970/work
frozendict @ file:///home/conda/feedstock_root/build_artifacts/frozendict_1715092766944/work
frozenlist==1.4.1
fsspec==2024.5.0
gmpy2 @ file:///home/conda/feedstock_root/build_artifacts/gmpy2_1715527283764/work
googleapis-common-protos==1.63.2
grpc-interceptor==0.15.4
grpcio==1.65.1
grpcio-reflection==1.62.2
grpcio-status==1.62.2
grpcio-tools==1.62.2
h11==0.14.0
hf_transfer==0.1.6
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.23.5
idna==3.7
importlib_metadata==7.1.0
interegular==0.3.3
Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1715127149914/work
joblib==1.4.2
jsonpatch @ file:///home/conda/feedstock_root/build_artifacts/jsonpatch_1695536281965/work
jsonpointer @ file:///home/conda/feedstock_root/build_artifacts/jsonpointer_1695397238043/work
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
lark==1.1.9
libmambapy @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1711394305528/work/libmambapy
llvmlite==0.43.0
loguru==0.6.0
mamba @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1711394305528/work/mamba
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1706899921127/work
menuinst @ file:///home/conda/feedstock_root/build_artifacts/menuinst_1705068285533/work
mpmath @ file:///home/conda/feedstock_root/build_artifacts/mpmath_1678228039184/work
multidict==6.0.5
multiprocess==0.70.16
mypy-protobuf==3.6.0
nest-asyncio==1.6.0
networkx @ file:///home/conda/feedstock_root/build_artifacts/networkx_1712540363324/work
numba==0.60.0
numpy==1.26.4
nvidia-nccl-cu12==2.22.3
openai==1.37.1
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-exporter-otlp-proto-http==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-grpc==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
outlines==0.0.34
packaging==24.1
pandas==2.2.2
peft==0.10.0
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1706713388748/work
pluggy @ file:///home/conda/feedstock_root/build_artifacts/pluggy_1706116770704/work
prometheus_client==0.20.0
protobuf==4.25.3
psutil==6.0.0
py-cpuinfo==9.0.0
pyarrow==17.0.0
pyarrow-hotfix==0.6
pycosat @ file:///home/conda/feedstock_root/build_artifacts/pycosat_1696355758174/work
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1711811537435/work
pydantic==2.8.2
pydantic_core==2.20.1
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1695373428874/work
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rpds-py==0.19.0
ruamel.yaml @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml_1707298115475/work
ruamel.yaml.clib @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml.clib_1707314473442/work
safetensors==0.4.3
scipy==1.13.1
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.1
sympy @ file:///home/conda/feedstock_root/build_artifacts/sympy_1718625539893/work
text-generation-server @ file:///usr/src/server
texttable==1.7.0
tokenizers==0.19.1
torch==2.3.0
tqdm==4.66.4
transformers==4.42.4
triton==2.3.0
truststore @ file:///home/conda/feedstock_root/build_artifacts/truststore_1694154605758/work
typer==0.6.1
types-protobuf==5.27.0.20240626
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.1
urllib3==2.2.2
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.19.2
zstandard==0.22.0
MeetKai org

@jeril I've managed to reproduce your error. It happens because the Functionary TGI server is not running. Since TGI itself does not support tool use tailored to Functionary models' prompt template format, we have to run a separate Functionary TGI server on top of the TGI docker container running the model. The Functionary TGI server forms the input prompt and parses the raw model responses into OpenAI-compatible API responses. If you make API requests directly to the TGI docker container, errors like the one you encountered will appear.

You can do the following:

  1. Run the TGI docker container, which loads the model:
export volume=$PWD/data
docker run --gpus all --shm-size 64g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-9935720 --model-id meetkai/functionary-small-v2.5
  2. Run the Functionary TGI server, which will connect to the TGI docker container endpoint:
python3 server_tgi.py --model meetkai/functionary-small-v2.5 --endpoint http://127.0.0.1:8080 --port 8000
  3. Make API requests to the Functionary TGI server endpoint (see the example below).

Our Functionary TGI server also supports starting a TGI docker container automatically at startup if no existing endpoint is detected. For more details, you can refer to the "Text-Generation-Inference" section here.
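
For step 3, here is a minimal sketch of the client call, adapted from the script in your original post. The only assumption is that the Functionary TGI server from step 2 is listening on port 8000; the api_key value is just a placeholder, as before. With tools and tool_choice="auto" still passed, the model itself decides whether to call the function or answer in plain text:

from openai import OpenAI

# Point the client at the Functionary TGI server (port 8000),
# not directly at the TGI docker container (port 8080).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")

response = client.chat.completions.create(
    model="meetkai/functionary-small-v2.5",
    messages=[{"role": "user", "content": "What is 4 + 4?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto"
)

print(response)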

Hope this helps. Do let me know if you encounter any other problems.

Thank you so much for clarifying this.

jeril changed discussion status to closed

Hope this helps someone!
The following are the steps that worked for me:

  1. Run the TGI docker container, which loads the model:
export volume=$PWD/data
docker run --gpus all --shm-size 64g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-9935720 --model-id meetkai/functionary-small-v2.5
  2. Clone the functionary repo and install the dependencies:
git clone https://github.com/MeetKai/functionary.git
cd functionary
pip install -r requirements.txt
  3. Run the Functionary TGI server, which will connect to the TGI docker container endpoint:
python3 server_tgi.py --model meetkai/functionary-small-v2.5 --endpoint http://127.0.0.1:8080 --port 8000
  4. Script to test the model response:
import requests
from pprint import pprint

data = {
    "model": "meetkai/functionary-small-v2.5",
    "messages": [{"role": "user", "content": "What is the weather in Riyadh ?"}],
    "stop": ["<|end_of_text|>", "<|eot_id|>"],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json=data,
    headers={"Content-Type": "application/json", "Authorization": "Bearer xxxx"},
)

data = response.json()
pprint(data)

Output:

{'choices': [{'finish_reason': 'tool_calls',
              'index': 0,
              'message': {'content': None,
                          'function_call': None,
                          'name': None,
                          'role': 'assistant',
                          'tool_call_id': None,
                          'tool_calls': [{'function': {'arguments': '{"location": '
                                                                    '"Riyadh"}',
                                                       'name': 'get_current_weather'},
                                          'id': 'call_uR5Aq68teXsXpOnH1AdE7nx4',
                                          'index': None,
                                          'type': 'function'}]}}],
 'created': 1722519859,
 'id': 'cmpl-bab8c09ab4b34c84b9e43dcdf37e396e',
 'model': 'meetkai/functionary-small-v2.5',
 'object': 'chat.completion',
 'usage': {'completion_tokens': 14, 'prompt_tokens': 117, 'total_tokens': 131}}
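
To check that the model also falls back to a normal chat response when no tool is needed, the same request can be sent with only the user message changed (a sketch reusing the payload above; the tools and tool_choice fields stay in place so the model itself decides, and the expected reply carries plain text content with no tool_calls, similar to the maintainer's earlier "What is 4 + 4?" example):

import requests
from pprint import pprint

# Same payload as the script above; only the user message differs.
# The tools list is still passed, so the model decides whether to call a function.
data = {
    "model": "meetkai/functionary-small-v2.5",
    "messages": [{"role": "user", "content": "What is 4 + 4?"}],
    "stop": ["<|end_of_text|>", "<|eot_id|>"],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json=data,
    headers={"Content-Type": "application/json", "Authorization": "Bearer xxxx"},
)

pprint(response.json())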
