PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 10 days ago
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published 9 days ago
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Paper • 2412.00493 • Published 15 days ago
Mimir: Improving Video Diffusion Models for Precise Text Understanding Paper • 2412.03085 • Published 11 days ago
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training Paper • 2412.02030 • Published 12 days ago
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion Paper • 2412.03515 • Published 10 days ago
Trajectory Attention for Fine-grained Video Motion Control Paper • 2411.19324 • Published 16 days ago
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model Paper • 2411.19108 • Published 17 days ago
On Domain-Specific Post-Training for Multimodal Large Language Models Paper • 2411.19930 • Published 15 days ago
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS Paper • 2411.18478 • Published 17 days ago
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27