Reangle-A-Video: 4D Video Generation as Video-to-Video Translation Paper • 2503.09151 • Published 1 day ago • 26
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published 2 days ago • 53
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning Paper • 2503.04812 • Published 10 days ago • 12
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer Paper • 2503.07027 • Published 4 days ago • 23
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 8 days ago • 77
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published 14 days ago • 29
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published 18 days ago • 51
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 22 days ago • 179
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 21 days ago • 129
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 26
Diffusion Adversarial Post-Training for One-Step Video Generation Paper • 2501.08316 • Published Jan 14 • 33
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published Jan 9 • 88