OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation • Paper • 2407.02371 • Published 4 days ago • 41
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation • Paper • 2407.00468 • Published 7 days ago • 34
Benchmarking Mental State Representations in Language Models • Paper • 2406.17513 • Published 11 days ago • 3
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale • Paper • 2406.17557 • Published 11 days ago • 73
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework • Paper • 2403.13248 • Published Mar 20 • 74
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities • Paper • 2406.11768 • Published 19 days ago • 20
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning • Paper • 2406.12742 • Published 18 days ago • 14
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch • Paper • 2406.14563 • Published 16 days ago • 30
mDPO: Conditional Preference Optimization for Multimodal Large Language Models • Paper • 2406.11839 • Published 19 days ago • 36
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • Paper • 2402.17177 • Published Feb 27 • 88