Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions Paper • 2503.13369 • Published 3 days ago • 6
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models Paper • 2503.12545 • Published 4 days ago • 5
Aligning Multimodal LLM with Human Preference: A Survey Paper • 2503.14504 • Published 2 days ago • 20
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification Paper • 2503.12505 • Published 4 days ago • 9
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published 4 days ago • 26
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published 2 days ago • 41
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published 3 days ago • 24
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning Paper • 2503.13444 • Published 3 days ago • 12