MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Paper • 2501.02955 • Published Jan 6, 2025 • 40
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation Paper • 2412.21059 • Published Dec 30, 2024 • 18
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Paper • 2412.15204 • Published Dec 19, 2024 • 33
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published Dec 16, 2024 • 18
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31, 2024 • 49
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA Paper • 2409.02897 • Published Sep 4, 2024 • 47
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29, 2024 • 57
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13, 2024 • 66
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12, 2024 • 17
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12, 2024 • 39