OpenGVLab/InternVL-Chat-V1-1

@@ -10,14 +10,14 @@ datasets:
 pipeline_tag: visual-question-answering
 ---
-# Model Card for InternVL-Chat-V1.1
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/4IG0h_KJ2cvpp9Kdm0Jf7.webp" alt="Image Description" width="300" height="300">
 </p>
 \[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\]  \[[Paper](https://arxiv.org/abs/2312.14238)\]  \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读 (Chinese explainer)](https://zhuanlan.zhihu.com/p/675877376)\]
-We released InternVL-Chat-V1.1, featuring a structure similar to LLaVA, including a ViT, an MLP projector, and an LLM. In this version, we explored increasing the resolution to 448x448, enhancing OCR capabilities, and improving support for Chinese conversations.
 ## Model Details
 - **Model Type:** multimodal large language model (MLLM)
@@ -40,26 +40,26 @@ We released InternVL-Chat-V1.1, featuring a structure similar to LLaVA, includin
 ### Vision Foundation model
 | Model                   | Date       | Download                                                               | Note                             |
 | ----------------------- | ---------- | ---------------------------------------------------------------------- | -------------------------------- |
-| InternViT-6B-448px-V1.5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | support dynamic resolution, super strong OCR (🔥new) |
-| InternViT-6B-448px-V1.2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2) | 448 resolution                   |
-| InternViT-6B-448px-V1.0 | 2024.01.30 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0) | 448 resolution                   |
 | InternViT-6B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px)      | vision foundation model          |
 | InternVL-14B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px)      | vision-language foundation model |
 ### Multimodal Large Language Model (MLLM)
 | Model                   | Date       | Download                                                                    | Note                               |
 | ----------------------- | ---------- | --------------------------------------------------------------------------- | ---------------------------------- |
-| InternVL-Chat-V1.5      | 2024.04.18 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)            | support 4K image; super strong OCR; Approaching the performance of GPT-4V and Gemini Pro on various benchmarks like MMMU, DocVQA, ChartQA, MathVista, etc. (🔥new)|
-| InternVL-Chat-V1.2-Plus | 2024.02.21 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)       | more SFT data and stronger  |
-| InternVL-Chat-V1.2      | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)            | scaling up LLM to 34B       |
-| InternVL-Chat-V1.1      | 2024.01.24 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)            | support Chinese and stronger OCR   |
 ## Model Usage
-We provide an example code to run InternVL-Chat-V1.1 using `transformers`.
 You can also use our [online demo](https://internvl.opengvlab.com/) for a quick experience of this model.

 pipeline_tag: visual-question-answering
 ---
+# Model Card for InternVL-Chat-V1-1
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/4IG0h_KJ2cvpp9Kdm0Jf7.webp" alt="Image Description" width="300" height="300">
 </p>
 \[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\]  \[[Paper](https://arxiv.org/abs/2312.14238)\]  \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读 (Chinese explainer)](https://zhuanlan.zhihu.com/p/675877376)\]
+We released InternVL-Chat-V1-1, featuring a structure similar to LLaVA, including a ViT, an MLP projector, and an LLM. In this version, we explored increasing the resolution to 448x448, enhancing OCR capabilities, and improving support for Chinese conversations.
 ## Model Details
 - **Model Type:** multimodal large language model (MLLM)
 ### Vision Foundation model
 | Model                   | Date       | Download                                                               | Note                             |
 | ----------------------- | ---------- | ---------------------------------------------------------------------- | -------------------------------- |
+| InternViT-6B-448px-V1-5 | 2024.04.20 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5) | support dynamic resolution, super strong OCR (🔥new) |
+| InternViT-6B-448px-V1-2 | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2) | 448 resolution                   |
+| InternViT-6B-448px-V1-0 | 2024.01.30 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0) | 448 resolution                   |
 | InternViT-6B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-224px)      | vision foundation model          |
 | InternVL-14B-224px      | 2023.12.22 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-14B-224px)      | vision-language foundation model |
 ### Multimodal Large Language Model (MLLM)
 | Model                   | Date       | Download                                                                    | Note                               |
 | ----------------------- | ---------- | --------------------------------------------------------------------------- | ---------------------------------- |
+| InternVL-Chat-V1-5      | 2024.04.18 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)            | support 4K image; super strong OCR; Approaching the performance of GPT-4V and Gemini Pro on various benchmarks like MMMU, DocVQA, ChartQA, MathVista, etc. (🔥new)|
+| InternVL-Chat-V1-2-Plus | 2024.02.21 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus)       | more SFT data and stronger  |
+| InternVL-Chat-V1-2      | 2024.02.11 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2)            | scaling up LLM to 34B       |
+| InternVL-Chat-V1-1      | 2024.01.24 | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)            | support Chinese and stronger OCR   |
 ## Model Usage
+We provide an example code to run InternVL-Chat-V1-1 using `transformers`.
 You can also use our [online demo](https://internvl.opengvlab.com/) for a quick experience of this model.
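
The renames in this diff (for example `InternVL-Chat-V1.1` to `InternVL-Chat-V1-1`) bring the displayed model names in line with the repository names used in the Download links. A minimal sketch of that mapping follows. The helper names `to_repo_id` and `to_url` are hypothetical and not part of any library, and the rule "replace a dot between two digits with a hyphen" is an assumption inferred from the table rows above, not a documented Hugging Face convention.

```python
import re

# Organization name as it appears in the Download links of the tables.
HF_ORG = "OpenGVLab"


def to_repo_id(display_name: str, org: str = HF_ORG) -> str:
    """Map a dotted display name to an '<org>/<name>' repository ID.

    Hypothetical helper: only a dot that sits between two digits
    (i.e. inside a version suffix such as 'V1.1') is replaced.
    """
    slug = re.sub(r"(?<=\d)\.(?=\d)", "-", display_name)
    return f"{org}/{slug}"


def to_url(display_name: str) -> str:
    """Build the model page URL in the same form as the table links."""
    return f"https://huggingface.co/{to_repo_id(display_name)}"


if __name__ == "__main__":
    for name in ("InternVL-Chat-V1.1", "InternViT-6B-448px-V1.5", "InternViT-6B-224px"):
        print(name, "->", to_url(name))
```

Names without a dotted version, such as `InternViT-6B-224px`, pass through unchanged, which matches the rows this diff leaves untouched.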