Calls that default to the conversational task fail with a 404

#82
by ccozad - opened

tl;dr: The HuggingFaceInferenceAPI class in LlamaIndex calls the conversational API, which causes a 404. A workaround is to pass a task="text-generation" parameter to force the library to use a valid task name. This may be a problem in other areas that default to the conversational task.

I ran into an issue in the "Components of LlamaIndex" notebook, in the section that has you create a VectorStoreIndex and then use it (index.as_query_engine(...) followed by query_engine.query(...)), which then throws a 404 Not Found exception like:

huggingface_hub.errors.HfHubHTTPError: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/Qwen/Qwen2.5-Coder-32B-Instruct/v1/chat/completions
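
For context, here is a minimal sketch of the failing flow. The setup names (documents, hf_token) and the query text are illustrative stand-ins for the notebook's own cells:

# Minimal sketch of the failing flow; documents and hf_token are assumed
# to come from earlier notebook cells.
from llama_index.core import VectorStoreIndex
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    token=hf_token,
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    # no task parameter, so chat calls default to the conversational task
)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is LlamaIndex?")  # raises HfHubHTTPError: 404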

Based on web searches, the conversational task was deprecated in 2024. It looks like the task may have finally been removed recently.

The relevant class docstring in LlamaIndex states the following:

class HuggingFaceInferenceAPI(FunctionCallingLLM):
    """
    Wrapper on the Hugging Face's Inference API.

    Overview of the design:
    - Synchronous uses InferenceClient, asynchronous uses AsyncInferenceClient
    - chat uses the conversational task: https://huggingface.co/tasks/conversational
    - complete uses the text generation task: https://huggingface.co/tasks/text-generation

    Note: some models that support the text generation task can leverage Hugging
    Face's optimized deployment toolkit called text-generation-inference (TGI).
    Use InferenceClient.get_model_status to check if TGI is being used.

    Relevant links:
    - General Docs: https://huggingface.co/docs/api-inference/index
    - API Docs: https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client
    - Source: https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub/inference
    """

The HuggingFaceInferenceAPI class accepts a task parameter, so passing task="text-generation" works around the issue by forcing the library to use a valid task name.

The full call should look like the following:

from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    token=hf_token,
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    task="text-generation",  # force a valid task instead of the conversational default
)

or like so if using a notebook with the HF token set earlier:

llm = HuggingFaceInferenceAPI(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",  # token resolved from HF_TOKEN / an earlier login
    task="text-generation",
)
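
With the task forced, a quick smoke test (the prompt here is arbitrary) confirms the 404 is gone, and the fixed LLM can be wired back into the failing step:

print(llm.complete("Hello, who are you?"))

# rebuild the query engine from the failing step with the fixed LLM
query_engine = index.as_query_engine(llm=llm)
print(query_engine.query("What is LlamaIndex?"))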

Relevant issue on the LlamaIndex side: https://github.com/run-llama/llama_index/issues/18547
