MotionLLM: Understanding Human Behaviors from Human Motions and Videos • Paper • 2405.20340 • Published 30 days ago • 19
Paint by Inpaint: Learning to Add Image Objects by Removing Them First • Paper • 2404.18212 • Published Apr 28 • 25
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report • Paper • 2405.00732 • Published Apr 29 • 116
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding • Paper • 2405.08344 • Published May 14 • 10
FIFO-Diffusion: Generating Infinite Videos from Text without Training • Paper • 2405.11473 • Published May 19 • 53
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots • Paper • 2406.02523 • Published 25 days ago • 8
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis • Paper • 2406.06216 • Published 19 days ago • 16
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning • Paper • 2406.06469 • Published 19 days ago • 22
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs • Paper • 2406.16860 • Published 5 days ago • 45
VideoLLM-online: Online Video Large Language Model for Streaming Video • Paper • 2406.11816 • Published 12 days ago • 20
Octo-planner: On-device Language Model for Planner-Action Agents • Paper • 2406.18082 • Published 3 days ago • 42
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels • Paper • 2406.09415 • Published 16 days ago • 47
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models • Paper • 2406.09403 • Published 16 days ago • 17
MotionClone: Training-Free Motion Cloning for Controllable Video Generation • Paper • 2406.05338 • Published 22 days ago • 39
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs • Paper • 2406.07476 • Published 18 days ago • 30
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion • Paper • 2406.04338 • Published 23 days ago • 32
The Prompt Report: A Systematic Survey of Prompting Techniques • Paper • 2406.06608 • Published 23 days ago • 47
An Image is Worth 32 Tokens for Reconstruction and Generation • Paper • 2406.07550 • Published 18 days ago • 53
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models • Paper • 2406.07472 • Published 18 days ago • 10
Mixture-of-Agents Enhances Large Language Model Capabilities • Paper • 2406.04692 • Published 22 days ago • 49
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration • Paper • 2406.01014 • Published 26 days ago • 29
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models • Paper • 2406.02430 • Published 25 days ago • 27