Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Paper • 2412.03565 • Published Dec 4, 2024 • 11
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding Paper • 2312.00081 • Published Nov 30, 2023 • 2
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation Paper • 2311.14671 • Published Nov 24, 2023
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs Paper • 2406.04334 • Published Jun 6, 2024
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning Paper • 2311.07574 • Published Nov 13, 2023 • 14