Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions Paper • 2406.10638 • Published Jun 15, 2024
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published 17 days ago • 52
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding Paper • 2409.14485 • Published Sep 22, 2024
Generative Multimodal Models are In-Context Learners Paper • 2312.13286 • Published Dec 20, 2023 • 34
Efficient Multimodal Learning from Data-centric Perspective Paper • 2402.11530 • Published Feb 18, 2024 • 1
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception Paper • 2407.08303 • Published Jul 11, 2024 • 17
MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding Paper • 2406.04264 • Published Jun 6, 2024 • 1
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval Paper • 2406.04292 • Published Jun 6, 2024 • 1
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation Paper • 2402.03216 • Published Feb 5, 2024 • 4
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder Paper • 2205.12035 • Published May 24, 2022