czczup committed

Commit 64252c4 • Parent: 4037281

Update README.md

Files changed (1): README.md (+10 -3)
README.md CHANGED
@@ -13,9 +13,9 @@ new_version: OpenGVLab/InternViT-300M-448px-V2_5
 
 # InternViT-300M-448px
 
-[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 Mini-InternVL\]](https://arxiv.org/abs/2410.16261)
+[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 InternVL 2.5\]](https://github.com/OpenGVLab/InternVL/blob/main/InternVL2_5_report.pdf)
 
-[\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
+[\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
 
 <div align="center">
 <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
@@ -32,7 +32,10 @@ This update primarily focuses on enhancing the efficiency of the vision foundati
 - **Pretrain Dataset:** LAION-en, LAION-zh, COYO, GRIT, COCO, TextCaps, Objects365, OpenImages, All-Seeing, Wukong-OCR, LaionCOCO-OCR, and other OCR-related datasets.
 To enhance the OCR capability of the model, we have incorporated additional OCR data alongside the general caption datasets. Specifically, we utilized PaddleOCR to perform Chinese OCR on images from Wukong and English OCR on images from LAION-COCO.
 
-## Model Usage (Image Embeddings)
+## Quick Start
+
+> \[!Warning\]
+> 🚨 Note: In our experience, the InternViT V2.5 series is better suited for building MLLMs than traditional computer vision tasks.
 
 ```python
 import torch
@@ -55,6 +58,10 @@ pixel_values = pixel_values.to(torch.bfloat16).cuda()
 outputs = model(pixel_values)
 ```
 
+## License
+
+This project is released under the MIT License.
+
 ## Citation
 
 If you find this project useful in your research, please consider citing:
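
For reference, the diff shows only the edges of the Quick Start snippet; the unchanged middle is elided as diff context. Below is a minimal end-to-end sketch of the embedding extraction it describes, assuming the standard `transformers` remote-code loading pattern for this checkpoint; the image path is a placeholder. The `import torch`, `pixel_values = pixel_values.to(torch.bfloat16).cuda()`, and `outputs = model(pixel_values)` lines are taken verbatim from the diff.

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Load InternViT-300M-448px; trust_remote_code pulls in the custom InternViT modeling code.
model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-300M-448px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

# Placeholder image path; any RGB image works.
image = Image.open('./examples/image1.jpg').convert('RGB')

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-300M-448px')

# Preprocess to the model's 448x448 input, then match the model's dtype and device
# (this line appears verbatim in the diff's hunk context).
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# Forward pass returns the image embeddings.
outputs = model(pixel_values)
```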
 
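The pretraining notes in the diff also mention using PaddleOCR for Chinese OCR on Wukong images and English OCR on LAION-COCO images. A minimal sketch of that kind of offline OCR pass using PaddleOCR's Python API; the engine settings, confidence threshold, and file names are illustrative assumptions, not the authors' actual pipeline.

```python
from paddleocr import PaddleOCR

# One engine per language, mirroring the README's description.
ocr_zh = PaddleOCR(lang='ch', use_angle_cls=True)  # Chinese OCR for Wukong images
ocr_en = PaddleOCR(lang='en', use_angle_cls=True)  # English OCR for LAION-COCO images

def extract_text(engine, image_path, min_conf=0.7):
    """Run OCR on one image and join recognized lines into a caption-style string."""
    result = engine.ocr(image_path, cls=True)
    lines = result[0] or []  # result[0] is None when no text is detected
    # Each line is (box, (text, confidence)); keep reasonably confident text only.
    return ' '.join(text for _, (text, conf) in lines if conf >= min_conf)

# Placeholder file names.
print(extract_text(ocr_zh, 'wukong_000001.jpg'))
print(extract_text(ocr_en, 'laioncoco_000001.jpg'))
```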