How to use visual grounding with this model?
#25
by r4hul77
The documentation says this model supports visual grounding (object detection and segmentation). What is the best way to use that capability with this model, given that (as I understand it) Llama only outputs text tokens?
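For context, grounding-capable vision-language models usually do express detections through text tokens: the box (or mask polygon) coordinates are serialized into the generated string, and you parse them out afterwards. Below is a minimal sketch assuming a Hugging Face `transformers` checkpoint and a hypothetical output convention of `<box>x1,y1,x2,y2</box>` with coordinates normalized to 0-1000; the model ID, prompt format, and coordinate scheme here are placeholders, so check the model card for the actual ones.

```python
import re
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

# Hypothetical checkpoint name -- substitute the actual model ID.
MODEL_ID = "org/model-with-grounding"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("street.jpg")
# Prompt format is model-specific; many grounding VLMs use a
# detect/ground-style instruction like this.
prompt = "Detect all cars in the image."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Keep special tokens: grounding formats often rely on them.
text = processor.batch_decode(output_ids, skip_special_tokens=False)[0]

# Assumed serialization: <box>x1,y1,x2,y2</box>, coordinates in [0, 1000].
# Rescale each parsed box to pixel coordinates of the input image.
width, height = image.size
boxes = []
for m in re.finditer(r"<box>(\d+),(\d+),(\d+),(\d+)</box>", text):
    x1, y1, x2, y2 = (int(v) for v in m.groups())
    boxes.append((x1 / 1000 * width, y1 / 1000 * height,
                  x2 / 1000 * width, y2 / 1000 * height))
print(boxes)
```

The same pattern applies to segmentation: models that ground masks typically emit a sequence of polygon points or special mask tokens in the text, which you decode with whatever scheme the model card documents.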