czczup committed (verified)
Commit da8e588 · Parent(s): cdbc309

Update README.md

Files changed (1): README.md (+6 -2)
README.md CHANGED
@@ -8,14 +8,19 @@ datasets:
 - conceptual_captions
 - wanng/wukong100m
 pipeline_tag: image-feature-extraction
+new_version: OpenGVLab/InternViT-300M-448px-V2_5
 ---
 
 # InternViT-300M-448px
 
-[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[📜 Mini-InternVL Report\]](https://arxiv.org/abs/2410.16261)
+[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 Mini-InternVL\]](https://arxiv.org/abs/2410.16261)
 
 [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)
 
+<div align="center">
+<img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
+</div>
+
 This update primarily focuses on enhancing the efficiency of the vision foundation model. We developed InternViT-300M-448px by distilling knowledge from the robust vision foundation model, [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5). Like its predecessor, InternViT-300M-448px features a dynamic input resolution of 448×448, with a basic tile size of 448×448. During training, it allows for 1 to 12 tiles, and expands to 1 to 40 tiles during testing. Additionally, it inherits the powerful robustness, OCR capability, and high-resolution processing capacity from InternViT-6B-448px-V1-5.
 
 ## Model Details
@@ -72,5 +77,4 @@ If you find this project useful in your research, please consider citing:
 journal={arXiv preprint arXiv:2404.16821},
 year={2024}
 }
-
 ```
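The dynamic-resolution scheme the model card describes (a 448×448 base tile, 1 to 12 tiles during training, up to 40 at test time) can be sketched as follows. This is a minimal illustration of how a tile grid might be chosen for an input image; the function name `pick_tile_grid` and its tie-breaking rule are assumptions for illustration, not the repository's actual preprocessing API.

```python
# Minimal sketch of dynamic-resolution tiling as described in the model
# card: an image is resized and split into a grid of 448x448 tiles, with
# the total tile count capped (1-12 during training, up to 40 at test time).
# `pick_tile_grid` and its tie-breaking rule are illustrative assumptions,
# not the actual InternVL preprocessing code.

def pick_tile_grid(width, height, min_num=1, max_num=12, tile=448):
    """Return (cols, rows, resized_size) for the tile grid whose aspect
    ratio best matches the image, using at most `max_num` tiles."""
    aspect = width / height
    # Every grid whose total tile count lies within the allowed budget.
    candidates = [
        (c, r)
        for c in range(1, max_num + 1)
        for r in range(1, max_num + 1)
        if min_num <= c * r <= max_num
    ]
    # Minimize aspect-ratio mismatch; on ties, prefer fewer tiles.
    cols, rows = min(
        candidates,
        key=lambda g: (abs(aspect - g[0] / g[1]), g[0] * g[1]),
    )
    return cols, rows, (cols * tile, rows * tile)

# A wide 1344x448 image maps to a 3x1 grid, i.e. three 448x448 tiles.
print(pick_tile_grid(1344, 448))  # (3, 1, (1344, 448))
```

At test time one would raise the budget (`max_num=40`) to match the card's description; the image would then be resized to `resized_size` and cut into `cols * rows` square tiles before being fed to the vision encoder.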