LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 10 days ago
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation Paper • 2411.04999 • Published 18 days ago
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14