-
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Paper • 2407.07053 • Published • 41 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 33 -
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Paper • 2407.11691 • Published • 13 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 60
Collections
Discover the best community collections!
Collections including paper arxiv:2408.02718
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 21 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 80 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 143 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
EVLM: An Efficient Vision-Language Model for Visual Understanding
Paper • 2407.14177 • Published • 42 -
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
Paper • 2407.04172 • Published • 22 -
facebook/chameleon-7b
Image-Text-to-Text • Updated • 19.1k • 164 -
vidore/colpali
Updated • 35.5k • 365
-
E5-V: Universal Embeddings with Multimodal Large Language Models
Paper • 2407.12580 • Published • 39 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 33 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 60
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 12 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 30
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 13 -
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published -
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 4