Update README.md
README.md CHANGED
````diff
@@ -13,9 +13,9 @@ new_version: OpenGVLab/InternViT-300M-448px-V2_5

 # InternViT-300M-448px

-[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821)
+[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5\]](https://arxiv.org/abs/2404.16821) [\[📜 InternVL 2.5\]](https://github.com/OpenGVLab/InternVL/blob/main/InternVL2_5_report.pdf)

-[\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖
+[\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 Documents\]](https://internvl.readthedocs.io/en/latest/)

 <div align="center">
 <img width="500" alt="image" src="https://cdn-uploads.huggingface.co/production/uploads/64006c09330a45b03605bba3/zJsd2hqd3EevgXo6fNgC-.png">
@@ -32,7 +32,10 @@ This update primarily focuses on enhancing the efficiency of the vision foundati
 - **Pretrain Dataset:** LAION-en, LAION-zh, COYO, GRIT, COCO, TextCaps, Objects365, OpenImages, All-Seeing, Wukong-OCR, LaionCOCO-OCR, and other OCR-related datasets.
 To enhance the OCR capability of the model, we have incorporated additional OCR data alongside the general caption datasets. Specifically, we utilized PaddleOCR to perform Chinese OCR on images from Wukong and English OCR on images from LAION-COCO.

-##
+## Quick Start
+
+> \[!Warning\]
+> 🚨 Note: In our experience, the InternViT V2.5 series is better suited for building MLLMs than traditional computer vision tasks.

 ```python
 import torch
@@ -55,6 +58,10 @@ pixel_values = pixel_values.to(torch.bfloat16).cuda()
 outputs = model(pixel_values)
 ```

+## License
+
+This project is released under the MIT License.
+
 ## Citation

 If you find this project useful in your research, please consider citing:
````
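The diff shows only the edges of the README's quick-start snippet (`import torch` at the top, the bfloat16 cast and forward pass at the bottom); the loading code in between is elided by the hunk boundaries. As a rough reconstruction, the usual `transformers` pattern for OpenGVLab vision-encoder cards looks like the sketch below. The `AutoModel`/`CLIPImageProcessor` boilerplate and the example image path are assumptions filled in around the visible lines, not part of the diff:

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Load the vision encoder in bfloat16 on the GPU; trust_remote_code is
# needed because the repo ships its own modeling code.
model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-300M-448px',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

# Placeholder image path; any RGB image works.
image = Image.open('./examples/image1.jpg').convert('RGB')

image_processor = CLIPImageProcessor.from_pretrained('OpenGVLab/InternViT-300M-448px')

# Preprocess to 448x448 pixel values, then match the model's dtype/device
# (these two lines appear verbatim in the diff).
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

outputs = model(pixel_values)
```

The forward pass should return the standard encoder outputs (`last_hidden_state`, plus a pooled feature), which is what an MLLM would consume as visual tokens.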
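One aside on the pretraining note: the card credits PaddleOCR for the Chinese OCR on Wukong images and the English OCR on LAION-COCO, but the annotation pipeline itself is not part of the README. A minimal sketch of what one such pass could look like, assuming the stock PaddleOCR API; the file name and post-processing are illustrative only:

```python
# Rough illustration of the OCR-annotation idea described in the card;
# paths and post-processing are assumptions, not the authors' pipeline.
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='ch')  # lang='en' for LAION-COCO images

result = ocr.ocr('wukong_sample.jpg', cls=True)

# PaddleOCR returns one list of detections per input image;
# each detection is [bounding_box, (text, confidence)].
texts = [text for _box, (text, _conf) in result[0]]
print(' '.join(texts))  # recognized text, usable as an OCR-augmented caption
```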