LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper 2411.10440 • Published 10 days ago • 99
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation • Paper 2411.04999 • Published 18 days ago • 16
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models • Paper 2410.10139 • Published Oct 14 • 50
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese • Paper 2408.12480 • Published Aug 22 • 17