LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 2025
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation Paper • 2411.17636 • Published Nov 26, 2024
MALT: Improving Reasoning with Multi-Agent LLM Training Paper • 2412.01928 • Published Dec 2, 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 7, 2024
PALO: A Polyglot Large Multimodal Model for 5B People Paper • 2402.14818 • Published Feb 22, 2024
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models Paper • 2311.13435 • Published Nov 22, 2023
TokenFlow: Consistent Diffusion Features for Consistent Video Editing Paper • 2307.10373 • Published Jul 19, 2023