Clarification on Box Coordinates Scaling

#1
by demitsuki - opened

Hi Ferret Team,

Thanks for sharing this checkpoint!

I ran the following example on your demo, using the Ferret-UI-Llama8b model with default parameters:
{
  "id": 0,
  "image": "appstore_reminders.png",
  "image_h": 2532,
  "image_w": 1170,
  "conversations": [
    {
      "from": "human",
      "value": "\nWhere is the Games Tab located?"
    }
  ]
}

The response returned: Games Tab [[0, 906, 256, 965]]. However, this box doesn’t seem to align with the "Games Tab" in the image, whether scaled or unscaled.

Could you clarify the scaling logic applied to the box and how I should interpret it?

Thanks!
Demi

Hi @demitsuki, sorry for the late reply!
You can check the scaling logic here: https://github.com/apple/ml-ferret/blob/main/ferretui/ferretui/eval/model_UI.py

In short, the relevant part is:

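# Ratio between the model's coordinate space and the image size, per axis
ratio_w = VOCAB_IMAGE_W * 1.0 / image_width
ratio_h = VOCAB_IMAGE_H * 1.0 / image_height

def get_bbox_coor(box, ratio_w, ratio_h):
    # Scale a box [x1, y1, x2, y2] by the per-axis ratios
    return box[0] * ratio_w, box[1] * ratio_h, box[2] * ratio_w, box[3] * ratio_h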
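The snippet above scales pixel coordinates into the model's coordinate space, so reading a predicted box back in pixels would presumably mean dividing by the same ratios. Below is a minimal sketch of that inverse mapping, applied to the box from the question. It assumes VOCAB_IMAGE_W and VOCAB_IMAGE_H are both 1000 and that the demo returns boxes in that same space; neither is stated in this thread, so check model_UI.py before relying on it. The helper name to_pixel_box is hypothetical and not part of the Ferret code.

```python
# Assumption: both constants are 1000. Verify against model_UI.py.
VOCAB_IMAGE_W = 1000
VOCAB_IMAGE_H = 1000


def to_pixel_box(box, image_w, image_h):
    """Hypothetical helper: map [x1, y1, x2, y2] from the assumed
    0..VOCAB_IMAGE_* space back to pixel coordinates."""
    ratio_w = VOCAB_IMAGE_W / image_w
    ratio_h = VOCAB_IMAGE_H / image_h
    x1, y1, x2, y2 = box
    # Inverse of get_bbox_coor: divide by the ratios instead of multiplying
    return x1 / ratio_w, y1 / ratio_h, x2 / ratio_w, y2 / ratio_h


# Box and image size taken from the question above
pixel_box = to_pixel_box([0, 906, 256, 965], image_w=1170, image_h=2532)
print(pixel_box)
```

Under these assumptions the box lands at roughly x 0 to 300 and y 2294 to 2443 on the 1170 x 2532 screenshot, which is near the bottom edge of the image.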
