What If We Recaption Billions of Web Images with LLaMA-3? • Paper • 2406.08478 • Published 19 days ago • 38
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation • Paper • 2406.08392 • Published 19 days ago • 17
Hierarchical Patch Diffusion Models for High-Resolution Video Generation • Paper • 2406.07792 • Published 20 days ago • 13
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing • Paper • 2406.08464 • Published 19 days ago • 47
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination • Paper • 2406.05132 • Published 24 days ago • 27
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos • Paper • 2406.08407 • Published 19 days ago • 23
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs • Paper • 2406.07476 • Published 20 days ago • 30
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion • Paper • 2406.04338 • Published 25 days ago • 32
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing • Paper • 2406.06523 • Published 21 days ago • 48
MotionClone: Training-Free Motion Cloning for Controllable Video Generation • Paper • 2406.05338 • Published 23 days ago • 39