Model                                   Human Metric   Auto Metric   Identify (Binary Accuracy)
Humans                                  95             –             92
Ground-truth Caption → GPT3 (Oracle)    68             62            74
BLIP2 FlanT5-XXL (Fine-tuned)           27             24            73
BLIP2 FlanT5-XL (Fine-tuned)            15             18            60
Predicted Caption → GPT3                33             42            59
BLIP2 FlanT5-XXL (Zero-shot)            0              0             50