MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published 6 days ago • 18
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published 25 days ago • 69
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published 11 days ago • 33
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published 11 days ago • 27