CogVLM2 Movie Caption LoRA
👋 Wechat · 💡Online Demo · 🎈Github Page · 📑 Paper
📍Experience the larger-scale CogVLM model on the ZhipuAI Open Platform.
Model introduction
We are releasing the new-generation CogVLM2 series and open-sourcing two models built on Meta-Llama-3-8B-Instruct. Compared with the previous generation of open source CogVLM models, the CogVLM2 open source models offer the following improvements:
- Significant improvements in many benchmarks such as TextVQA, DocVQA.
- Support 8K content length.
- Support image resolution up to 1344 * 1344.
- Provide an open source model version that supports both Chinese and English.
You can see the details of the CogVLM2 family of open source models in the table below:
Model name | cogvlm2-llama3-chat-19B | cogvlm2-llama3-chinese-chat-19B |
---|---|---|
Base Model | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct |
Language | English | Chinese, English |
Model size | 19B | 19B |
Task | Image understanding, dialogue model | Image understanding, dialogue model |
Text length | 8K | 8K |
Image resolution | 1344 * 1344 | 1344 * 1344 |
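
As a rough illustration of the 1344 * 1344 image resolution limit listed above, the sketch below computes the size an image would have if it were scaled down to fit within that limit while keeping its aspect ratio. The helper `fit_within` is hypothetical and not part of any CogVLM2 API; whether the model's own preprocessing already rescales inputs is not stated here, so treat this as an assumption-laden convenience rather than a required step.

```python
MAX_SIDE = 1344  # maximum supported width and height, taken from the table above


def fit_within(width, height, max_side=MAX_SIDE):
    """Return (w, h) scaled down to fit inside max_side x max_side.

    The aspect ratio is preserved and images that already fit are left
    unchanged. This is a hypothetical helper, not a CogVLM2 function.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never upscale: the scale factor is capped at 1.0.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4032, 3024))  # a large photo gets scaled down
print(fit_within(800, 600))    # a small image is returned unchanged
```

The resulting dimensions could then be passed to any image library's resize routine before the image is handed to the model.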
Benchmark
Our open source models achieve good results on many benchmarks compared with the previous generation of CogVLM open source models. Their performance is competitive with some non-open source models, as shown in the table below:
Model | Open Source | LLM Size | TextVQA | DocVQA | ChartQA | OCRbench | VCR_EASY | VCR_HARD | MMMU | MMVet | MMBench |
---|---|---|---|---|---|---|---|---|---|---|---|
CogVLM1.1 | ✅ | 7B | 69.7 | - | 68.3 | 590 | 73.9 | 34.6 | 37.3 | 52.0 | 65.8 |
LLaVA-1.5 | ✅ | 13B | 61.3 | - | - | 337 | - | - | 37.0 | 35.4 | 67.7 |
Mini-Gemini | ✅ | 34B | 74.1 | - | - | - | - | - | 48.0 | 59.3 | 80.6 |
LLaVA-NeXT-LLaMA3 | ✅ | 8B | - | 78.2 | 69.5 | - | - | - | 41.7 | - | 72.1 |
LLaVA-NeXT-110B | ✅ | 110B | - | 85.7 | 79.7 | - | - | - | 49.1 | - | 80.5 |
InternVL-1.5 | ✅ | 20B | 80.6 | 90.9 | 83.8 | 720 | 14.7 | 2.0 | 46.8 | 55.4 | 82.3 |
QwenVL-Plus | ❌ | - | 78.9 | 91.4 | 78.1 | 726 | - | - | 51.4 | 55.7 | 67.0 |
Claude3-Opus | ❌ | - | - | 89.3 | 80.8 | 694 | 63.85 | 37.8 | 59.4 | 51.7 | 63.3 |
Gemini Pro 1.5 | ❌ | - | 73.5 | 86.5 | 81.3 | - | 62.73 | 28.1 | 58.5 | - | - |
GPT-4V | ❌ | - | 78.0 | 88.4 | 78.5 | 656 | 52.04 | 25.8 | 56.8 | 67.7 | 75.0 |
CogVLM2-LLaMA3 | ✅ | 8B | 84.2 | 92.3 | 81.0 | 756 | 83.3 | 38.0 | 44.3 | 60.4 | 80.5 |
CogVLM2-LLaMA3-Chinese | ✅ | 8B | 85.0 | 88.4 | 74.7 | 780 | 79.9 | 25.1 | 42.8 | 60.5 | 78.9 |
All evaluation results were obtained without using any external OCR tools ("pixel only").
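
To make the comparison with non-open source models easier to read, the following minimal sketch computes per-benchmark score differences between two rows of the table above. The scores are copied from the table, with `None` standing in for entries shown as "-". The names `BENCHMARKS`, `SCORES`, and `compare` are illustrative only and do not come from the CogVLM2 codebase.

```python
# Column order matches the benchmark table above.
BENCHMARKS = [
    "TextVQA", "DocVQA", "ChartQA", "OCRbench", "VCR_EASY",
    "VCR_HARD", "MMMU", "MMVet", "MMBench",
]

# Scores copied from the table; None marks an entry reported as "-".
SCORES = {
    "Gemini Pro 1.5": [73.5, 86.5, 81.3, None, 62.73, 28.1, 58.5, None, None],
    "GPT-4V": [78.0, 88.4, 78.5, 656, 52.04, 25.8, 56.8, 67.7, 75.0],
    "CogVLM2-LLaMA3": [84.2, 92.3, 81.0, 756, 83.3, 38.0, 44.3, 60.4, 80.5],
}


def compare(model_a, model_b):
    """Return {benchmark: score_a - score_b} for benchmarks both models report."""
    deltas = {}
    for name, a, b in zip(BENCHMARKS, SCORES[model_a], SCORES[model_b]):
        if a is not None and b is not None:
            deltas[name] = round(a - b, 2)
    return deltas


deltas = compare("CogVLM2-LLaMA3", "GPT-4V")
wins = [name for name, diff in deltas.items() if diff > 0]

for name, diff in deltas.items():
    print(f"{name:10s} {diff:+.2f}")
print(f"CogVLM2-LLaMA3 leads on {len(wins)} of {len(deltas)} shared benchmarks")
```

With the table values as given, CogVLM2-LLaMA3 scores higher than GPT-4V on 7 of the 9 benchmarks and lower on MMMU and MMVet. Note that OCRbench uses a different scale from the other columns, so the differences should be read per benchmark rather than summed.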