A curated list of papers on vision-language models, with the most influential ones at the top.
-
Improved Baselines with Visual Instruction Tuning
Paper β’ 2310.03744 β’ Published β’ 37 -
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper β’ 2403.05525 β’ Published β’ 40 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper β’ 2308.12966 β’ Published β’ 7 -
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper β’ 2404.01331 β’ Published β’ 25