TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models Paper • 2410.23266 • Published 6 days ago • 19
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet' Paper • 2410.21647 • Published 8 days ago • 11
RARe: Retrieval Augmented Retrieval with In-Context Examples Paper • 2410.20088 • Published 11 days ago • 5
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback Paper • 2410.21242 • Published 8 days ago • 6
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization Paper • 2410.19609 • Published 11 days ago • 14
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation Paper • 2410.23090 • Published 6 days ago • 51
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published 13 days ago • 193
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines Paper • 2410.21220 • Published 8 days ago • 8
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published 12 days ago • 11
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data Paper • 2410.18558 • Published 13 days ago • 17
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published 14 days ago • 86
Teach Multimodal LLMs to Comprehend Electrocardiographic Images Paper • 2410.19008 • Published 15 days ago • 22
JudgeBench: A Benchmark for Evaluating LLM-based Judges Paper • 2410.12784 • Published 20 days ago • 40
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks Paper • 2410.12381 • Published 21 days ago • 41
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published 20 days ago • 30