ChatRex: Taming Multimodal LLM for Joint Perception and Understanding Paper • 2411.18363 • Published 23 days ago • 9
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published May 16 • 26
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection Paper • 2303.05499 • Published Mar 9, 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models Paper • 2305.15023 • Published May 24, 2023
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy Paper • 2403.14610 • Published Mar 21 • 3
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Paper • 2401.14159 • Published Jan 25 • 1
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 11
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 48