Update README.md
README.md
CHANGED
@@ -17,7 +17,7 @@ pipeline_tag: image-feature-extraction
 
 \[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\] \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)\]
 
-We
+We developed InternViT-300M-448px by leveraging knowledge distillation from the strong vision foundation model, [InternViT-6B-448px-V1-5](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5). This update focuses primarily on improving the efficiency of the vision foundation model. Like its predecessor, this model supports dynamic input resolution: the basic tile size is 448×448, and the number of tiles ranges from 1 to 12 during training. It also inherits the strong robustness, OCR capability, and high-resolution processing capability of InternViT-6B-448px-V1-5.
 
 ## Model Details
 - **Model Type:** vision foundation model, feature backbone
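
The updated description presents the model as a feature backbone with a 448×448 tile size. For reference, below is a minimal usage sketch assuming the `AutoModel` + `CLIPImageProcessor` loading path used by other InternViT checkpoints; the repo id, example image path, dtype, and output field names are illustrative assumptions, not taken from this commit.

```python
# Minimal sketch (not from this commit): extract image features with InternViT-300M-448px,
# assuming the AutoModel + CLIPImageProcessor path used by other InternViT checkpoints.
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

model_id = 'OpenGVLab/InternViT-300M-448px'  # assumed repo id for this card

model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

image_processor = CLIPImageProcessor.from_pretrained(model_id)

# The processor resizes the input to the basic 448x448 tile resolution.
image = Image.open('example.jpg').convert('RGB')  # placeholder image path
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

with torch.no_grad():
    outputs = model(pixel_values)

# Exact output fields depend on the model's remote code; typically the patch
# features live in outputs.last_hidden_state.
features = outputs.last_hidden_state
print(features.shape)
```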