nitzanguetta committed on
Commit 4dd5ec2 · verified · 1 Parent(s): 318fc5f

Upload Visual-Riddles-Leaderboard.tsv

Files changed (1)
  1. Visual-Riddles-Leaderboard.tsv +14 -12
Visual-Riddles-Leaderboard.tsv CHANGED
@@ -1,12 +1,14 @@
1
- Model Image Captioning Visual Question Answering Image-Text Matching Human Metric - Explanation of Violation Auto Metric - Explanation of Violation Identify - Explanation of Violation
2
- Humans 95 92
3
- Ground-truth Caption _ GPT3 (Oracle) 68 62 74
4
- BLIP2 FlanT5-XXL (Fine-tuned) 177 57 84 27 24 73
5
- BLIP2 FlanT5-XL (Fine-tuned) 174 55 81 15 18 60
6
- Predicted Caption _ GPT3 33 42 59
7
- BLIP2 FlanT5-XXL (Zero-shot) 120 55 71 0 0 50
8
- CLIP ViT-L/14 (Zero-shot) 70
9
- OFA Large (Zero-shot) 0 38
10
- CoCa ViT-L-14 MSCOCO (Zero-shot) 102 72
11
- BLIP Large (Zero-shot) 65 39 77
12
- BLIP2 FlanT5-XXL (Text only FT) 2 24 94
 
 
 
1
+ Model Open Ended VQA: % Human Rating Multiple Choice VQA: % Accuracy Hints-Multiple Choice VQA: % Accuracy Attributions-Multiple Choice VQA: % Accuracy Refernce Based-Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings Refernce Free-Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings Automatic Evaluation: % Auto-Rater Ratings Hints-Automatic Evaluation: % Auto-Rater Ratings Attributions-Automatic Evaluation: % Auto-Rater Ratings
2
+ Humans 82 78
3
+ Gemini Pro 1.5 40 38 66 72 87 52 53 62 29
4
+ Gemini Pro Vision 30 41 62 75 38 34 47
5
+ GPT4 34 45 69 82 86 51 38 61 25
6
+ LlaVA-1.6-34B 15 24 30 76 43 21 16
7
+ LlaVA-1.5-7B 13 17 29 70 35 19 30
8
+ InstructBlip 13 20 28
9
+ Gemini Pro 1.5 Caption _ Gemini Pro 1.5 23
10
+ Human (Oracle) Caption _ Gemini Pro 1.5 50
11
+ Claude 3.5 Sonnet 46 45 39
12
+ GPT4o 55 83 50
13
+ Qwen-VL-Max 35 53 26
14
+ Molmo-7B 34 42 36