VisionArena: 230K Real World User-VLM Conversations with Preference Labels • Paper • 2412.08687 • Published 11 days ago • 11
VideoAgent: Long-form Video Understanding with Large Language Model as Agent • Paper • 2403.10517 • Published Mar 15 • 32
Describing Differences in Image Sets with Natural Language • Paper • 2312.02974 • Published Dec 5, 2023 • 13