Model,Language Model,Open Source,Regular Text,Irregular Text,Artistic Text,Handwriting,Digit string,Non-semantic Text,ALL,Link
Qwen-VL-Max,-,No,49,50,49,27,36,43,254,https://github.com/QwenLM/Qwen-VL
Qwen-VL-Plus,-,No,49,49,48,36,23,43,248,https://github.com/QwenLM/Qwen-VL
BlueLM-VL,-,No,46,39,42,20,27,35,209,-
Gemini,-,No,47,35,45,31,25,32,215,https://deepmind.google/technologies/gemini/
GPT4V,-,No,39,37,41,11,1,38,167,https://openai.com/
mPLUG-DocOwl1.5,LLaMA-2 7B,Yes,45,39,39,22,15,22,182,https://arxiv.org/abs/2403.12895
TextMonkey,Qwen-7B,Yes,45,35,39,15,9,26,169,https://export.arxiv.org/abs/2403.04473
InternVL-Chat-Chinese,LLaMA2-13B,Yes,49,46,46,28,27,32,228,https://arxiv.org/abs/2312.14238
Monkey,Qwen-7B,Yes,44,37,40,14,11,28,174,https://arxiv.org/abs/2311.06607
InternLM-XComposer2,InternLM2-7B,Yes,45,37,37,12,7,22,160,https://arxiv.org/abs/2401.16420
QwenVL,Qwen-7B,Yes,46,39,42,14,10,28,179,https://arxiv.org/abs/2308.12966
mPLUG-Owl2,LLaMA2-7B,Yes,43,37,40,12,4,17,153,https://arxiv.org/abs/2311.04257
LLaVAR,LLaMA-13B,Yes,48,42,43,28,12,13,186,https://arxiv.org/abs/2306.17107
LLaVA1.5-13B,Vicuna-v1.5-13B,Yes,48,44,43,30,7,4,176,https://arxiv.org/abs/2310.03744
InternLM-XComposer,InternLM-7B,Yes,49,44,46,23,13,17,192,https://arxiv.org/abs/2309.15112
LLaVA1.5-7B,Vicuna-v1.5-7B,Yes,43,40,41,26,5,5,160,https://arxiv.org/abs/2310.03744
mPLUG-Owl,LLaMA-2 7B,Yes,44,42,44,13,9,20,172,https://arxiv.org/abs/2304.14178
BLIVA,Vicuna-7B,Yes,48,42,42,24,5,4,165,https://arxiv.org/abs/2308.09936
InstructBLIP,Vicuna-7b,Yes,46,43,44,19,8,8,168,https://arxiv.org/abs/2305.06500
BLIP2-6.7B,OPT-6.7B,Yes,47,41,44,15,1,6,154,https://arxiv.org/abs/2301.12597
MiniGPT4V2,LLaMA2-13B,Yes,35,37,36,13,1,2,124,https://arxiv.org/abs/2310.09478