Can this grounding model output bounding boxes?

#3
by xxheyu - opened

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    # Drop the prompt tokens so only the newly generated text is decoded.
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))

It only outputs a description of the image. How can I get the bounding box?

Thanks in advance!

I just ran into the same issue.

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

The bounding box is in the output text, in the form [[x1,y1,x2,y2]].
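
If the decoded text does contain boxes in that form, something like the following could pull them out. This is only a rough sketch: the helper name `extract_boxes` and the sample string are made up for illustration, and it assumes the coordinates are plain non-negative integers written exactly as [[x1,y1,x2,y2]]. It makes no assumption about what scale the coordinates are in.

```python
import re

# Matches one box written as [[x1,y1,x2,y2]] with integer coordinates.
# Optional whitespace around the numbers is tolerated.
BOX_PATTERN = re.compile(
    r"\[\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\]"
)


def extract_boxes(text):
    """Return every [[x1,y1,x2,y2]] found in text as a tuple of four ints.

    Hypothetical helper, not part of the model's or tokenizer's API.
    """
    return [tuple(int(v) for v in m.groups()) for m in BOX_PATTERN.finditer(text)]


# Made-up example string, standing in for tokenizer.decode(outputs[0]).
sample = "A dog [[010,020,300,400]] next to a cat [[1,2,3,4]]"
boxes = extract_boxes(sample)
print(boxes)
```

An empty list here means the model did not emit any coordinates in the text, which matches what the posts above describe.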

I tried several prompts, but the HF version of the model never output box coordinates for me.
I then switched to the SAT version of the model, and that one does generate the box coordinates.
