zwgao committed
Commit 2dce269
1 Parent(s): f4c1979

Update README.md

Files changed (1)
  1. README.md +64 -40
README.md CHANGED
@@ -52,52 +52,76 @@ InternVL 2.5 is a multimodal large language model series, featuring models of va
 
 ### Image Benchmarks
 
-
- | Benchmark | InternVL-Chat-V1.5 | InternVL2-26B | InternVL2.5-26B | Cambrian-34B | VILA-1.5-40B | InternVL2-40B | InternVL2.5-38B |
- |---------------------|------------------- |-------------- |---------------- |---------------|-------------- |----------------|----------------|
- | MMMU (val) | 46.8 | 51.2 | 60.0 | 49.7 | 55.1 | 55.2 | 63.9 |
- | MMMU (test) | 41.0 | 43.8 | 51.8 | - | 46.9 | 49.3 | 57.6 |
- | MMMU-PRO (overall) | 24.7 | 30.0 | 37.1 | - | 25.0 | 34.2 | 46.0 |
- | MathVista (mini) | 53.5 | 59.4 | 67.7 | 53.2 | 49.5 | 63.7 | 71.9 |
- | MathVision (mini) | 15.8 | 23.4 | 28.0 | - | - | 21.4 | 32.2 |
- | MathVision (full) | 15.0 | 17.0 | 23.1 | - | - | 16.9 | 31.8 |
- | MathVerse (mini) | 28.4 | 31.1 | 40.1 | - | - | 36.3 | 49.4 |
- | Olympiad Bench | 0.6 | 3.5 | 8.8 | - | - | 3.9 | 12.1 |
- | AI2D (w / wo M) | 80.7 / 89.8 | 84.5 / 92.5 | 86.4 / 94.4 | 79.5 / - | 69.9 / - | 86.6 / 94.5 | 87.6 / 95.1 |
- | ChartQA (test avg.) |83.8 | 84.9 | 87.2 | 75.6 | 67.2 | 86.2 | 88.2 |
- | TextVQA (val) | 80.6 | 82.3 | 82.4 | 76.7 | 73.6 | 83.0 | 82.7 |
- | DocVQA (test) |90.9 | 92.9 | 94.0 | 75.5 | - | 93.9 | 95.3 |
- | InfoVQA (test) | 72.5 | 75.9 | 79.8 | 46.0 | - | 78.7 | 83.6 |
- | OCR-Bench |724 | 825 | 852 | 600 | 460 | 837 | 842 |
- | SEED-2 Plus | 66.3 | 67.6 | 70.8 | - | - | 69.2 | 71.2 |
- | CharXiv (RQ / DQ) | 29.2 / 58.5 | 33.4 / 62.4 | 35.9 / 73.5 | 27.3 / 59.7 | 24.0 / 38.7 | 32.3 / 66.0 | 42.4 / 79.6 |
- | VCR-EN-Easy (EM / Jaccard) |14.7 / 51.4 | 74.5 / 86.7 | 94.4 / 98.0 | 79.7 / 89.3 | - | 84.7 / 92.6 | 94.7 / 98.2 |
- | BLINK (val) |46.6 | 56.2 | 61.8 | - | - | 57.2 | 63.2 |
- | Mantis Eval | 66.8 | 69.6 | 75.6 | - | - |71.4 | 78.3 |
- | MMIU | 37.4 | 42.6 | 49.4 | - | - | 47.9 | 55.3 |
- | Muir Bench | 38.5 | 50.6 | 61.1 | - | - | 54.4 | 62.7 |
- | MMT (val) | 58.0 | 60.6 | 66.9 | - | - |66.2 | 70.0 |
- | MIRB (avg.) | 50.3 | 53.7 | 55.7 | - | - |55.2 | 61.2 |
- | RealWorld QA | 66.0 | 68.3 | 74.5 | 67.8 | - | 71.8 | 73.5 |
- | MME-RW (EN) | 49.4 | 58.7 | 61.8 | 44.1 | - |61.8 | 64.0 |
- | WildVision (win rate)|56.6 | 62.2 | 65.2 | - | - | 63.2 | 66.4 |
- | R-Bench | 67.9 | 70.1 | 72.9 | - | - | 73.3 | 72.1 |
- | MME (sum) | 2194.2 | 2260.7 | 2373.3 | - | - | 2307.5 | 2455.8 |
- | MMB (EN / CN) |82.2 / 82.0 | 83.4 / 82.0 | 85.4 / 85.5 | 80.4 / 79.2 | - | 86.8 / 86.5 | 86.5 / 86.3 |
- | MMBv1.1 (EN) | 80.3 | 81.5 | 84.2 | 78.3 | - | 85.1 | 85.5 |
- | MMVet (turbo) | 61.5 | 62.1 | 65.0 | 53.2 | - | 65.5 | 68.8 |
- | MMVetv2 (0613) |51.5 | 57.2 | 60.8 | - | - | 63.8 | 62.1 |
- | MMStar | 57.3 | 61.2 | 66.5 | 54.2 |- | 65.4 | 67.9 |
- | HallBench (avg.) | 50.3 | 50.7 | 55.0 | 41.6 | - | 56.9 | 56.8 |
- | MMHal (score) | 3.11 | 3.55 | 3.70 | - |- | 3.75 | 3.71 |
- | CRPE (relation) | 75.4 | 75.6 | 79.1 | - | - | 77.6 | 78.3 |
- | POPE (avg.) |88.4 | 88.0 | 90.6 | - |- | 88.4 | 90.7 |
+ | Benchmark | InternVL2.5-26B | Cambrian-34B | VILA-1.5-40B | InternVL2.5-38B |
+ |----------------------------|-----------------|--------------|--------------|-----------------|
+ | MMMU (val) | 60.0 | 49.7 | 55.1 | 63.9 |
+ | MMMU (test) | 51.8 | - | 46.9 | 57.6 |
+ | MMMU-PRO (overall) | 37.1 | - | 25.0 | 46.0 |
+ | MathVista (mini) | 67.7 | 53.2 | 49.5 | 71.9 |
+ | MathVision (mini) | 28.0 | - | - | 32.2 |
+ | MathVision (full) | 23.1 | - | - | 31.8 |
+ | MathVerse (mini) | 40.1 | - | - | 49.4 |
+ | Olympiad Bench | 8.8 | - | - | 12.1 |
+ | AI2D (w / wo M) | 86.4 / 94.4 | 79.5 / - | 69.9 / - | 87.6 / 95.1 |
+ | ChartQA (test avg.) | 87.2 | 75.6 | 67.2 | 88.2 |
+ | TextVQA (val) | 82.4 | 76.7 | 73.6 | 82.7 |
+ | DocVQA (test) | 94.0 | 75.5 | - | 95.3 |
+ | InfoVQA (test) | 79.8 | 46.0 | - | 83.6 |
+ | OCR-Bench | 852 | 600 | 460 | 842 |
+ | SEED-2 Plus | 70.8 | - | - | 71.2 |
+ | CharXiv (RQ / DQ) | 35.9 / 73.5 | 27.3 / 59.7 | 24.0 / 38.7 | 42.4 / 79.6 |
+ | VCR-EN-Easy (EM / Jaccard) | 94.4 / 98.0 | 79.7 / 89.3 | - | 94.7 / 98.2 |
+ | BLINK (val) | 61.8 | - | - | 63.2 |
+ | Mantis Eval | 75.6 | - | - | 78.3 |
+ | MMIU | 49.4 | - | - | 55.3 |
+ | Muir Bench | 61.1 | - | - | 62.7 |
+ | MMT (val) | 66.9 | - | - | 70.0 |
+ | MIRB (avg.) | 55.7 | - | - | 61.2 |
+ | RealWorld QA | 74.5 | 67.8 | - | 73.5 |
+ | MME-RW (EN) | 61.8 | 44.1 | - | 64.0 |
+ | WildVision (win rate) | 65.2 | - | - | 66.4 |
+ | R-Bench | 72.9 | - | - | 72.1 |
+ | MME (sum) | 2373.3 | - | - | 2455.8 |
+ | MMB (EN / CN) | 85.4 / 85.5 | 80.4 / 79.2 | - | 86.5 / 86.3 |
+ | MMBv1.1 (EN) | 84.2 | 78.3 | - | 85.5 |
+ | MMVet (turbo) | 65.0 | 53.2 | - | 68.8 |
+ | MMVetv2 (0613) | 60.8 | - | - | 62.1 |
+ | MMStar | 66.5 | 54.2 | - | 67.9 |
+ | HallBench (avg.) | 55.0 | 41.6 | - | 56.8 |
+ | MMHal (score) | 3.70 | - | - | 3.71 |
+ | CRPE (relation) | 79.1 | - | - | 78.3 |
+ | POPE (avg.) | 90.6 | - | - | 90.7 |
 
 
 
 
 ### Video Benchmarks
 
+ | Model Name | Video-MME (wo / w sub) | MVBench | MMBench-Video (val) | MLVU (M-Avg) | LongVideoBench (val total) | CG-Bench v1.1 (long / clue acc.) |
+ |---------------------------------------------|-------------|------|-------|-------|------|-------------|
+ | **InternVL2.5-1B** | 50.3 / 52.3 | 64.3 | 1.36 | 57.3 | 47.9 | - |
+ | Qwen2-VL-2B | 55.6 / 60.4 | 63.2 | - | - | - | - |
+ | **InternVL2.5-2B** | 51.9 / 54.1 | 68.8 | 1.44 | 61.4 | 52.0 | - |
+ | **InternVL2.5-4B** | 62.3 / 63.6 | 71.6 | 1.73 | 68.3 | 55.2 | - |
+ | VideoChat2-HD | 45.3 / 55.7 | 62.3 | 1.22 | 47.9 | - | - |
+ | MiniCPM-V-2.6 | 60.9 / 63.6 | - | 1.70 | - | 54.9 | - |
+ | LLaVA-OneVision-7B | 58.2 / - | 56.7 | - | - | - | - |
+ | Qwen2-VL-7B | 63.3 / 69.0 | 67.0 | 1.44 | - | 55.6 | - |
+ | **InternVL2.5-8B** | 64.2 / 66.9 | 72.0 | 1.68 | 68.9 | 60.0 | - |
+ | **InternVL2.5-26B** | 66.9 / 69.2 | 75.2 | 1.86 | 72.3 | 59.9 | - |
+ | Oryx-1.5-32B | 67.3 / 74.9 | 70.1 | 1.52 | 72.3 | - | - |
+ | VILA-1.5-40B | 60.1 / 61.1 | - | 1.61 | 56.7 | - | - |
+ | **InternVL2.5-38B** | 70.7 / 73.1 | 74.4 | 1.82 | 75.3 | 63.3 | - |
+ | GPT-4V/4T | 59.9 / 63.3 | 43.7 | 1.53 | 49.2 | 59.1 | - |
+ | GPT-4o-20240513 | 71.9 / 77.2 | - | 1.63 | 64.6 | 66.7 | - |
+ | GPT-4o-20240806 | - | - | 1.87 | - | - | - |
+ | Gemini-1.5-Pro | 75.0 / 81.3 | - | 1.30 | - | 64.0 | - |
+ | VideoLLaMA2-72B | 61.4 / 63.1 | 62.0 | - | - | - | - |
+ | LLaVA-OneVision-72B | 66.2 / 69.5 | 59.4 | - | 66.4 | 61.3 | - |
+ | Qwen2-VL-72B | 71.2 / 77.8 | 73.6 | 1.70 | - | - | 41.3 / 56.2 |
+ | InternVL2-Llama3-76B | 64.7 / 67.8 | 69.6 | 1.71 | 69.9 | 61.1 | - |
+ | **InternVL2.5-78B** | 72.1 / 74.0 | 76.4 | 1.97 | 75.7 | 63.6 | 42.2 / 58.5 |
+
 ### Multimodal Multilingual Understanding
 <table style="width: 100%; font-size: 8px; border-collapse: collapse; text-align: center;">
 <thead>