How to make the model decide when not to use tools/call functions and provide a normal chat response?

#1
by jeril - opened

I deployed your model using TGI (Hugging Face). The model is able to provide responses when the question is related to tool calling. The following code was used:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="functionary")

response = client.chat.completions.create(
    model="meetkai/functionary-small-v2.5",
    messages=[{"role": "user",
            "content": "What is the weather for Istanbul?"}
    ],
    tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        }],
    tool_choice="auto"
)

print(response)

For this request, I get the following output:

ChatCompletion(id='', choices=[Choice(finish_reason='eos_token', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments={'location': 'Istanbul'}, name='openweathermap', description=None), type='function')]))], created=1721757239, model='meetkai/functionary-small-v2.5', object='chat.completion', service_tier=None, system_fingerprint='2.1.2-dev0-sha-9935720', usage=CompletionUsage(completion_tokens=23, prompt_tokens=19, total_tokens=42))

But when I changed the question to:

What is 4 + 4?

I was expecting the answer 8, but I got the following error:

Traceback (most recent call last):
  File "/eph/nvme0/azureml/cr/j/5fab296a307947099e421be9e45eb265/exe/wd/logs/test2.py", line 5, in <module>
    response = client.chat.completions.create(
  File "/opt/conda/lib/python3.10/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 646, in create
    return self._post(
  File "/opt/conda/lib/python3.10/site-packages/openai/_base_client.py", line 1266, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/opt/conda/lib/python3.10/site-packages/openai/_base_client.py", line 942, in request
    return self._request(
  File "/opt/conda/lib/python3.10/site-packages/openai/_base_client.py", line 1046, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'error': 'Tool error: No function found in generated text', 'error_type': 'tool_error'}

Could you please help me understand how to make the model decide when not to use tools/call functions and instead provide a normal chat response?

MeetKai org

Hi, can you provide more details of your dependencies so that I can reproduce this error? I tried "What is 4 + 4?" and it worked fine:

ChatCompletion(
    id='cmpl-fbcff334c54f406983371d64ab9bbe8f',
    choices=[
        Choice(
            finish_reason='stop',
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content='4 + 4 = 8',
                role='assistant',
                function_call=None,
                tool_calls=None,
                tool_call_id=None,
                name=None
            )
        )
    ],
    created=1721962674,
    model='meetkai/functionary-small-v2.5',
    object='chat.completion',
    service_tier=None,
    system_fingerprint=None,
    usage=CompletionUsage(completion_tokens=8, prompt_tokens=118, total_tokens=126)
)

Hi,

Thank you for the reply.
I used the following docker image:

ghcr.io/huggingface/text-generation-inference:sha-9935720

Then I started their inference server using the following command:

text-generation-launcher --model-id meetkai/functionary-small-v2.5 --num-shard 1 --port 8080

Then I tried the script that was initially shared. It worked fine for "What is the weather for Istanbul?", but gave the previously mentioned error for "What is 4 + 4?".
How did you call the model for the question "What is 4 + 4?"? Did you pass the tools and tool_choice parameters? When I call the model without tools and tool_choice, I get the same response as you, but I get the mentioned error when passing the tools and tool_choice parameters. If you don't mind, could you please share the script that you used for testing?

The following is the output of my pip freeze:

accelerate==0.29.3
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
archspec @ file:///home/conda/feedstock_root/build_artifacts/archspec_1708969572489/work
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.43.1
boltons @ file:///home/conda/feedstock_root/build_artifacts/boltons_1711936407380/work
Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1695989787169/work
certifi==2024.7.4
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1696001684923/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1698833585322/work
click==8.1.7
cloudpickle==3.0.0
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1666700638685/work
conda @ file:///home/conda/feedstock_root/build_artifacts/conda_1715631919917/work
conda-libmamba-solver @ file:///home/conda/feedstock_root/build_artifacts/conda-libmamba-solver_1706566000184/work/src
conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1691048088238/work
conda_package_streaming @ file:///home/conda/feedstock_root/build_artifacts/conda-package-streaming_1691009212940/work
datasets==2.20.0
Deprecated==1.2.14
dill==0.3.8
diskcache==5.6.3
distro @ file:///home/conda/feedstock_root/build_artifacts/distro_1704321475663/work
einops==0.6.1
exceptiongroup==1.2.2
filelock @ file:///home/conda/feedstock_root/build_artifacts/filelock_1719088281970/work
frozendict @ file:///home/conda/feedstock_root/build_artifacts/frozendict_1715092766944/work
frozenlist==1.4.1
fsspec==2024.5.0
gmpy2 @ file:///home/conda/feedstock_root/build_artifacts/gmpy2_1715527283764/work
googleapis-common-protos==1.63.2
grpc-interceptor==0.15.4
grpcio==1.65.1
grpcio-reflection==1.62.2
grpcio-status==1.62.2
grpcio-tools==1.62.2
h11==0.14.0
hf_transfer==0.1.6
httpcore==1.0.5
httpx==0.27.0
huggingface-hub==0.23.5
idna==3.7
importlib_metadata==7.1.0
interegular==0.3.3
Jinja2 @ file:///home/conda/feedstock_root/build_artifacts/jinja2_1715127149914/work
joblib==1.4.2
jsonpatch @ file:///home/conda/feedstock_root/build_artifacts/jsonpatch_1695536281965/work
jsonpointer @ file:///home/conda/feedstock_root/build_artifacts/jsonpointer_1695397238043/work
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
lark==1.1.9
libmambapy @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1711394305528/work/libmambapy
llvmlite==0.43.0
loguru==0.6.0
mamba @ file:///home/conda/feedstock_root/build_artifacts/mamba-split_1711394305528/work/mamba
MarkupSafe @ file:///home/conda/feedstock_root/build_artifacts/markupsafe_1706899921127/work
menuinst @ file:///home/conda/feedstock_root/build_artifacts/menuinst_1705068285533/work
mpmath @ file:///home/conda/feedstock_root/build_artifacts/mpmath_1678228039184/work
multidict==6.0.5
multiprocess==0.70.16
mypy-protobuf==3.6.0
nest-asyncio==1.6.0
networkx @ file:///home/conda/feedstock_root/build_artifacts/networkx_1712540363324/work
numba==0.60.0
numpy==1.26.4
nvidia-nccl-cu12==2.22.3
openai==1.37.1
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-exporter-otlp-proto-http==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-grpc==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
outlines==0.0.34
packaging==24.1
pandas==2.2.2
peft==0.10.0
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1706713388748/work
pluggy @ file:///home/conda/feedstock_root/build_artifacts/pluggy_1706116770704/work
prometheus_client==0.20.0
protobuf==4.25.3
psutil==6.0.0
py-cpuinfo==9.0.0
pyarrow==17.0.0
pyarrow-hotfix==0.6
pycosat @ file:///home/conda/feedstock_root/build_artifacts/pycosat_1696355758174/work
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1711811537435/work
pydantic==2.8.2
pydantic_core==2.20.1
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML @ file:///home/conda/feedstock_root/build_artifacts/pyyaml_1695373428874/work
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rpds-py==0.19.0
ruamel.yaml @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml_1707298115475/work
ruamel.yaml.clib @ file:///home/conda/feedstock_root/build_artifacts/ruamel.yaml.clib_1707314473442/work
safetensors==0.4.3
scipy==1.13.1
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.1
sympy @ file:///home/conda/feedstock_root/build_artifacts/sympy_1718625539893/work
text-generation-server @ file:///usr/src/server
texttable==1.7.0
tokenizers==0.19.1
torch==2.3.0
tqdm==4.66.4
transformers==4.42.4
triton==2.3.0
truststore @ file:///home/conda/feedstock_root/build_artifacts/truststore_1694154605758/work
typer==0.6.1
types-protobuf==5.27.0.20240626
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.1
urllib3==2.2.2
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.19.2
zstandard==0.22.0
MeetKai org

@jeril I've managed to reproduce your error. It happens because the Functionary TGI server is not running. Since TGI itself does not support tool use tailored to Functionary models' prompt template format, we have to run a separate Functionary TGI server on top of the TGI docker container running the model. The Functionary TGI server forms the input prompt and parses the raw model responses into OpenAI-compatible API responses. If you make API requests directly to the TGI docker container, errors like the one you encountered will appear.

You can do the following:

  1. Run the TGI docker container, which loads the model:
export volume=$PWD/data
docker run --gpus all --shm-size 64g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-9935720 --model-id meetkai/functionary-small-v2.5
  2. Run the Functionary TGI server, which will connect to the TGI docker container endpoint:
python3 server_tgi.py --model meetkai/functionary-small-v2.5 --endpoint http://127.0.0.1:8080 --port 8000
  3. Make API requests to the Functionary TGI server endpoint (see the example below).

Our Functionary TGI server also supports starting a TGI docker container automatically at startup if no existing endpoint is detected. For more details, you can refer to the "Text-Generation-Inference" section here.
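
For step 3, here is a minimal sketch of the client call, adapted from the script in your original post. The only assumption is that the Functionary TGI server from step 2 is listening on port 8000; the api_key value is just a placeholder, as before. With tools and tool_choice="auto" still passed, the model itself decides whether to call the function or answer in plain text:

from openai import OpenAI

# Point the client at the Functionary TGI server (port 8000),
# not directly at the TGI docker container (port 8080).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")

response = client.chat.completions.create(
    model="meetkai/functionary-small-v2.5",
    messages=[{"role": "user", "content": "What is 4 + 4?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto"
)

print(response)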

Hope this helps. Do let me know if you encounter any other problems.

Thank you so much for clarifying this.

jeril changed discussion status to closed

Hope this helps someone!
The following are the steps that worked for me:

  1. Run the TGI docker container, which loads the model:
export volume=$PWD/data
docker run --gpus all --shm-size 64g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-9935720 --model-id meetkai/functionary-small-v2.5
  2. Clone the functionary repo and install the dependencies:
git clone https://github.com/MeetKai/functionary.git
cd functionary
pip install -r requirements.txt
  3. Run the Functionary TGI server, which will connect to the TGI docker container endpoint:
python3 server_tgi.py --model meetkai/functionary-small-v2.5 --endpoint http://127.0.0.1:8080 --port 8000
  4. Script to test the model response:
import requests
from pprint import pprint

data = {
    "model": "meetkai/functionary-small-v2.5",
    "messages": [{"role": "user", "content": "What is the weather in Riyadh ?"}],
    "stop": ["<|end_of_text|>", "<|eot_id|>"],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json=data,
    headers={"Content-Type": "application/json", "Authorization": "Bearer xxxx"},
)

data = response.json()
pprint(data)

Output:

{'choices': [{'finish_reason': 'tool_calls',
              'index': 0,
              'message': {'content': None,
                          'function_call': None,
                          'name': None,
                          'role': 'assistant',
                          'tool_call_id': None,
                          'tool_calls': [{'function': {'arguments': '{"location": '
                                                                    '"Riyadh"}',
                                                       'name': 'get_current_weather'},
                                          'id': 'call_uR5Aq68teXsXpOnH1AdE7nx4',
                                          'index': None,
                                          'type': 'function'}]}}],
 'created': 1722519859,
 'id': 'cmpl-bab8c09ab4b34c84b9e43dcdf37e396e',
 'model': 'meetkai/functionary-small-v2.5',
 'object': 'chat.completion',
 'usage': {'completion_tokens': 14, 'prompt_tokens': 117, 'total_tokens': 131}}
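
To check that the model also falls back to a normal chat response when no tool is needed, the same request can be sent with only the user message changed (a sketch reusing the payload above; the tools and tool_choice fields stay in place so the model itself decides, and the expected reply carries plain text content with no tool_calls, similar to the maintainer's earlier "What is 4 + 4?" example):

import requests
from pprint import pprint

# Same payload as the script above; only the user message differs.
# The tools list is still passed, so the model decides whether to call a function.
data = {
    "model": "meetkai/functionary-small-v2.5",
    "messages": [{"role": "user", "content": "What is 4 + 4?"}],
    "stop": ["<|end_of_text|>", "<|eot_id|>"],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

response = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json=data,
    headers={"Content-Type": "application/json", "Authorization": "Bearer xxxx"},
)

pprint(response.json())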
