Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published 5 days ago • 39
Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems Paper • 2210.15037 • Published Oct 26, 2022 • 1
TLDR: Token-Level Detective Reward Model for Large Vision Language Models Paper • 2410.04734 • Published Oct 7 • 16
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published Oct 14 • 37