The new Qwen2-VL models seem to perform quite well in object detection. You can prompt them to respond with bounding boxes in a reference frame of 1k x 1k pixels and then scale those boxes to the original image size. You can try it out with my Space maxiw/Qwen2-VL-Detection.
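The rescaling step mentioned in the post can be sketched roughly as follows. This is a minimal illustration, not the code used in the Space: it assumes the model returns each box as (x1, y1, x2, y2) corner coordinates in a 1000 x 1000 reference frame, and the function name `scale_box` and the `ref_size` parameter are hypothetical names chosen for this example.

```python
from typing import Tuple

# (x1, y1, x2, y2) corner coordinates; this layout is an assumption
Box = Tuple[float, float, float, float]


def scale_box(box: Box, image_width: int, image_height: int,
              ref_size: int = 1000) -> Box:
    """Map a box from a ref_size x ref_size frame to image pixel coordinates.

    ref_size=1000 reflects the "1k x 1k" reference frame described in the
    post; the actual range used by a given model may differ.
    """
    x1, y1, x2, y2 = box
    # Horizontal coordinates scale with the width, vertical with the height.
    return (
        x1 * image_width / ref_size,
        y1 * image_height / ref_size,
        x2 * image_width / ref_size,
        y2 * image_height / ref_size,
    )


# Example: a box covering the lower middle of a 1920 x 1080 image.
scaled = scale_box((250, 500, 750, 1000), image_width=1920, image_height=1080)
print(scaled)  # (480.0, 540.0, 1440.0, 1080.0)
```

Because width and height are scaled independently, the same function works for non-square images; only the parsing of the model's text output into coordinate tuples, which is not shown here, depends on the prompt format.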
Wolf: Captioning Everything with a World Summarization Framework. Paper • 2407.18908 • Published Jul 26, 2024