Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers • Paper • 2402.19479 • Published Feb 29, 2024 • 34
Vision-Language Models Can Self-Improve Reasoning via Reflection • Paper • 2411.00855 • Published Oct 30, 2024 • 5
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models • Paper • 2502.02492 • Published Feb 4, 2025 • 62
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • Paper • 2503.10460 • Published 2 days ago • 12