How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published 22 days ago • 33
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published 21 days ago • 25
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published 14 days ago • 24
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published 13 days ago • 25
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published 19 days ago • 35
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published 22 days ago • 22
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14 • 52