czczup committed
Commit 93a63ac • Parent(s): 1352871

Update README.md

Files changed (1):
1. README.md +9 -6
README.md CHANGED
@@ -18,7 +18,10 @@ pipeline_tag: visual-question-answering
 
 > _Two interns holding hands, symbolizing the integration of InternViT and InternLM._
 
-\[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\] \[[CVPR Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)\]
+[\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
+
+[\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#model-usage) [\[🌐 Community-hosted API\]](https://rapidapi.com/adushar1320/api/internvl-chat) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/675877376)
+
 
 We introduce InternVL 1.5, an open-source multimodal large language model (MLLM) that bridges the capability gap between open-source and proprietary commercial models in multimodal understanding.
 We introduce three simple designs:
@@ -47,10 +50,10 @@ We introduce three simple designs:
 
 | Model | Vision Foundation Model | Release Date | Note |
 | :---: | :---: | :---: | :--- |
-| InternVL-Chat-V1.5 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)) | InternViT-6B-448px-V1-5 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)) | 2024.04.18 | supports 4K images; very strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (🔥 new) |
-| InternVL-Chat-V1.2-Plus (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)) | InternViT-6B-448px-V1-2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.21 | more SFT data and stronger performance |
-| InternVL-Chat-V1.2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)) | InternViT-6B-448px-V1-2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.11 | scales the LLM up to 34B |
-| InternVL-Chat-V1.1 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)) | InternViT-6B-448px-V1-0 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)) | 2024.01.24 | supports Chinese and stronger OCR |
+| InternVL-Chat-V1-5 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)) | InternViT-6B-448px-V1-5 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)) | 2024.04.18 | supports 4K images; very strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (🔥 new) |
+| InternVL-Chat-V1-2-Plus (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)) | InternViT-6B-448px-V1-2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.21 | more SFT data and stronger performance |
+| InternVL-Chat-V1-2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)) | InternViT-6B-448px-V1-2 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.11 | scales the LLM up to 34B |
+| InternVL-Chat-V1-1 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)) | InternViT-6B-448px-V1-0 (🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)) | 2024.01.24 | supports Chinese and stronger OCR |
 
 ## Architecture
 
@@ -73,7 +76,7 @@ We introduce three simple designs:
 
 ## Model Usage
 
-We provide example code to run InternVL-Chat-V1.5 using `transformers`.
+We provide example code to run InternVL-Chat-V1-5 using `transformers`.
 
 You can also use our [online demo](https://internvl.opengvlab.com/) for a quick experience of this model.
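
For readers landing here from the commit, below is a minimal sketch of the usage the README refers to. It is not the model card's full example: it assumes the repo's custom code (loaded via `trust_remote_code=True`) exposes the `chat` method, simplifies the card's dynamic-tiling preprocessing to a single 448×448 crop with ImageNet normalization, and uses a hypothetical local image path `example.jpg`.

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, AutoTokenizer

path = "OpenGVLab/InternVL-Chat-V1-5"

# The checkpoint ships custom modeling code, so trust_remote_code is required.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# Simplified single-crop preprocessing: the model card tiles large images
# into several 448x448 crops, but one crop is enough for a smoke test.
preprocess = transforms.Compose([
    transforms.Resize((448, 448), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open("example.jpg").convert("RGB")  # hypothetical image path
pixel_values = preprocess(image).unsqueeze(0).to(torch.bfloat16).cuda()

# `chat` comes from the repo's custom code, not from transformers itself.
generation_config = dict(num_beams=1, max_new_tokens=512, do_sample=False)
question = "Please describe the image in detail."
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)
```

The official README additionally splits high-resolution inputs into multiple 448×448 tiles before stacking them into `pixel_values`, which is what enables the 4K-image support noted in the table above.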