PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published about 1 month ago
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture • Paper • 2406.11030 • Published Jun 16, 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning • Paper • 2406.02265 • Published Jun 4, 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings • Paper • 2404.16820 • Published Apr 25, 2024
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models • Paper • 2311.07022 • Published Nov 13, 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding • Paper • 2305.07558 • Published May 12, 2023