Model throws an error when used with Inference Endpoints

#18
by pzmudzinski - opened

I deployed it using Inference Endpoints, but whenever I send a POST request to the deployed endpoint with an image as the body, it throws this error:

{
    "error": "'Image' object is not subscriptable"
}

Should it be used differently there? (My guess would be to encode it as base64 and send it as JSON.)

I'm trying to get the same response as from the Inference API, but from my own deployment so there won't be downtime:

[
    {
        "score": 1.0,
        "label": "Background",
        "mask": ...
    },
    {
        "score": 1.0,
        "label": "Hair",
        "mask": ...
    },
...
]

If you take a look in the "Files and versions" tab of the model, you'll see a "handler.py" file, which I think is what Inference Endpoints uses. It expects a dictionary in which the image is stored under the key "image", so my guess is that the issue is not sending the image in a dictionary format. Judging from the code, the image should also be base64-encoded. But do look at the handler code https://huggingface.co/mattmdjaga/segformer_b2_clothes/blob/main/handler.py to see what's going on.
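A minimal client-side sketch of building such a request body, assuming (from the description above) that handler.py reads a base64 string stored under the "image" key. Wrapping it in an "inputs" key follows the usual Inference Endpoints request convention; whether this particular handler needs that wrapper or accepts a bare {"image": ...} dictionary is an assumption to verify in handler.py. The function name build_payload is hypothetical.

```python
import base64


def build_payload(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in the JSON shape the handler is assumed to expect.

    Assumption: handler.py base64-decodes inputs["image"]; check its source.
    """
    # base64 produces ASCII-safe text, so the bytes can travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"inputs": {"image": encoded}}
```

The resulting dictionary would then be sent with something like requests.post(API_URL, headers=headers, json=payload), where API_URL and headers are your endpoint URL and auth header.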

Also, feel free to fork the model and change the handler.py file to suit your needs. I initially made it over a year ago for a work project, so it might not be the best fit for all use cases.

Thanks for the quick response. After trying it out, the response I get is just a huge array of arrays of numbers:

[
    [
        0,
        0,
        0,
        0,
       ...

How can I convert it into the format the prototype API returns:

    {
        "score": 1.0,
        "label": "Background",
        "mask": ...
    },

So this thread https://huggingface.co/mattmdjaga/segformer_b2_clothes/discussions/17 should help you get everything except the mask. I don't actually know what the mask encoding is. You'll also need to convert the list to a tensor to follow the thread.

This is the response format:
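One way to do that conversion, sketched with a NumPy array rather than a torch tensor (the idea is the same): the nested list is a 2-D map holding one class id per pixel, so np.unique gives the classes present. The label list is copied from the id2label mapping posted later in this thread; present_labels is a hypothetical helper name.

```python
import numpy as np

# Class names indexed by id, copied from the id2label mapping in this thread.
LABELS = [
    "Background", "Hat", "Hair", "Sunglasses", "Upper-clothes", "Skirt",
    "Pants", "Dress", "Belt", "Left-shoe", "Right-shoe", "Face",
    "Left-leg", "Right-leg", "Left-arm", "Right-arm", "Bag", "Scarf",
]


def present_labels(response_json):
    """Return (class_id, label) pairs for every class present in the map."""
    # The endpoint output is a nested list: one class id per pixel.
    seg = np.asarray(response_json)
    if seg.ndim != 2:
        raise ValueError(f"expected a 2-D segmentation map, got shape {seg.shape}")
    # np.unique returns the distinct ids in sorted order.
    return [(int(i), LABELS[int(i)]) for i in np.unique(seg)]
```

For example, present_labels([[0, 0, 2], [2, 4, 0]]) returns [(0, 'Background'), (2, 'Hair'), (4, 'Upper-clothes')].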

label	The label for the class (model specific) of a segment.
score	A float that represents how likely it is that the segment belongs to the given class.
mask	A str (base64 str of a single channel black-and-white img) representing the mask of a segment.

So is there no way to use whatever Hugging Face uses to implement that behavior?

Oh, if that's the case, then you could loop over every label present in the prediction and encode array[array==label_int]. Does that make sense? So something like:

id2label = model.config.id2label
pred_ids = pred_seg.unique()
output = []
for id in pred_ids:
    mask = pred_seg[pred_seg==id]
    output.append({
        "score": 1,
        "label": id2label[id.item()],
        "mask": encode.base64(mask)
    })
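One detail worth checking in the snippet above: for NumPy arrays and torch tensors alike, pred_seg[pred_seg==id] is boolean indexing, which returns only the matching values as a flat 1-D array rather than an image-shaped mask. A small NumPy illustration (the 2x3 array is made up):

```python
import numpy as np

pred_seg = np.array([[0, 0, 2],
                     [2, 2, 0]])

# Boolean *indexing*: keeps only the matching values, flattened to 1-D.
selected = pred_seg[pred_seg == 2]
# Boolean *comparison*: keeps the image shape, True where the pixel is class 2.
mask = pred_seg == 2

print(selected.shape)  # (3,)
print(mask.shape)      # (2, 3)
```

So for a per-segment mask image, the comparison pred_seg == id on its own (a 2-D boolean array) is what should be encoded.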

OK, so you're saying I should:

  1. download this repository,
  2. make those changes in handler.py (I assume those changes should replace line 39),
  3. create my own model repository on Hugging Face,
  4. push the changed codebase,
  5. redeploy it using Inference Endpoints.

Is that correct, or am I missing something?

Yes. You can also test the handler locally before deploying to make sure it runs correctly; there's HF documentation on that. Or you could just use the current handler and add those steps as post-processing in your own code. They don't require a GPU, so adding them as post-processing might be quicker than forking everything.

Let me try the second option; I'll let you know how it goes. Thanks for the help!
I also reached out to HF support to ask whether there is an option to get the same behavior as their prototyping API.

One more thing: what did you mean by encoding the mask tensor into base64 with encode.base64(mask)?
What is "encode.base64"?
Shouldn't it look something like this?
https://stackoverflow.com/questions/75244472/how-to-convert-torch-tensor-to-base64-image

Yes, that Stack Overflow answer is what I'm talking about. You should double-check that you can encode and then decode a test image, and that the decoded image is identical to the input image.

This Stack Overflow example throws an error:
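A minimal sketch of that round-trip check, assuming NumPy and Pillow are available and using PNG as the image format (the same choice as the proxy code later in the thread); the helper names are hypothetical. PNG is lossless, so the decoded array should match the input exactly.

```python
import base64
from io import BytesIO

import numpy as np
from PIL import Image


def encode_png_base64(array: np.ndarray) -> str:
    """Encode a 2-D uint8 array as a single-channel PNG, then as base64 text."""
    buffer = BytesIO()
    Image.fromarray(array).save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def decode_png_base64(text: str) -> np.ndarray:
    """Reverse encode_png_base64: base64 text -> PNG bytes -> uint8 array."""
    return np.array(Image.open(BytesIO(base64.b64decode(text))))


# Synthetic test mask: 0 for background, 255 for the segment.
test_mask = np.zeros((4, 6), dtype=np.uint8)
test_mask[1:3, 2:5] = 255

round_tripped = decode_png_base64(encode_png_base64(test_mask))
print(np.array_equal(test_mask, round_tripped))  # True
```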

    pil_image = transform(mask)
                ^^^^^^^^^^^^^^^
    return F.to_pil_image(pic, self.mode)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    raise ValueError(f"pic should be 2/3 dimensional. Got {pic.ndimension()} dimensions.")

I assume the mask tensor has one dimension. Any idea how to convert it to 3 dimensions?

You can avoid using torch transforms by turning the tensor into a NumPy array with tensor.numpy(), then turning the array into a PIL image with Image.fromarray(np_array).

OK, I came up with something like this (it would serve as a Flask proxy that converts the HF endpoint's output into the same format as the prototyping API):
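A short sketch of that conversion for a 2-D boolean mask (made-up values). Scaling True/False to 255/0 as uint8 gives an 8-bit single-channel ("L") image, matching the "single channel black-and-white img" in the response format. Note that PIL reports size as (width, height) while NumPy shape is (height, width). For a torch tensor, call .numpy() first (after .cpu() if the tensor lives on a GPU).

```python
import numpy as np
from PIL import Image

# Made-up 2-D boolean mask with shape (height=2, width=3).
mask = np.array([[True, False, False],
                 [True, True, False]])

# Map True/False to 255/0 as uint8 so Pillow builds an 8-bit grayscale image.
image = Image.fromarray(mask.astype(np.uint8) * 255)

print(image.mode)  # L
print(image.size)  # (3, 2)
```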

from flask import Flask, jsonify, request
from PIL import Image
import requests
import base64
import torch
from io import BytesIO
import numpy as np
import os

app = Flask(__name__)

API_URL = os.environ.get("API_URL")
headers = {
    "Authorization": f"Bearer {os.environ.get('API_TOKEN')}",
}

id2label = {
    "0": "Background",
    "1": "Hat",
    "2": "Hair",
    "3": "Sunglasses",
    "4": "Upper-clothes",
    "5": "Skirt",
    "6": "Pants",
    "7": "Dress",
    "8": "Belt",
    "9": "Left-shoe",
    "10": "Right-shoe",
    "11": "Face",
    "12": "Left-leg",
    "13": "Right-leg",
    "14": "Left-arm",
    "15": "Right-arm",
    "16": "Bag",
    "17": "Scarf"
}


@app.post("/classify")
def classify():
    payload = request.json
    # Forward the client's JSON unchanged to the Inference Endpoint.
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    # The handler returns a nested list: one class id per pixel.
    pred_seg = torch.tensor(response.json())
    output = []
    for class_id in pred_seg.unique():
        # 2-D boolean mask: True where the pixel belongs to this class.
        mask = (pred_seg == class_id)
        pil_image = Image.fromarray((mask * 255).numpy().astype(np.uint8))
        output.append({
            "score": 1,
            "label": id2label[str(class_id.item())],
            "mask": image_to_base_64(pil_image)
        })
    return jsonify(output)

def image_to_base_64(image):
    # Serialize the PIL image as PNG in memory, then base64-encode it as text.
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue())
    return img_str.decode('utf-8')

Does this look good? It still doesn't require a GPU, right?

Yeah, I think that's fine, though I can't tell without running it. This should be fine without a GPU.

I published a fully working example (deployable on AWS Lambda) here in case anyone needs it in the future.
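One way to test the post-processing without deploying anything is to move the loop into a plain function and feed it a tiny fake endpoint response. This is a sketch under the assumption that the endpoint returns a nested list of class ids; it uses NumPy in place of torch, and to_prototype_format is a hypothetical name.

```python
import base64
from io import BytesIO

import numpy as np
from PIL import Image


def to_prototype_format(segmentation, id2label):
    """Convert a nested-list class map into [{"score", "label", "mask"}, ...]."""
    seg = np.asarray(segmentation)
    output = []
    for class_id in np.unique(seg):
        # 2-D mask for this class, scaled to 0/255 for an 8-bit grayscale PNG.
        mask = ((seg == class_id) * 255).astype(np.uint8)
        buffer = BytesIO()
        Image.fromarray(mask).save(buffer, format="PNG")
        output.append({
            "score": 1.0,
            "label": id2label[str(int(class_id))],
            "mask": base64.b64encode(buffer.getvalue()).decode("utf-8"),
        })
    return output


# Fake endpoint response: a 2x3 image with classes 0 (Background) and 2 (Hair).
fake_response = [[0, 2, 2],
                 [0, 0, 2]]
result = to_prototype_format(fake_response, {"0": "Background", "2": "Hair"})
print([entry["label"] for entry in result])  # ['Background', 'Hair']
```

Decoding result[1]["mask"] back into an array should give 255 exactly where the fake response holds class 2.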
