An Image is Worth 32 Tokens for Reconstruction and Generation • Paper • 2406.07550 • Published 20 days ago • 53
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs • Paper • 2406.07476 • Published 20 days ago • 30
What If We Recaption Billions of Web Images with LLaMA-3? • Paper • 2406.08478 • Published 19 days ago • 38
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos • Paper • 2406.08407 • Published 19 days ago • 23
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • Paper • 2406.05132 • Published 24 days ago • 27
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters • Paper • 2406.05955 • Published 22 days ago • 21
PowerInfer-2: Fast Large Language Model Inference on a Smartphone • Paper • 2406.06282 • Published 21 days ago • 35