czczup commited on
Commit
61d9457
1 Parent(s): 6b0fb3b

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +43 -0
README.md CHANGED
@@ -1,3 +1,46 @@
1
  ---
2
  license: mit
 
 
 
3
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: mit
3
+ metrics:
4
+ - accuracy
5
+ - mean_iou
6
  ---
7
+
8
+ # Model Card for InternVL
9
+
10
+ This repository contains the PyTorch version of the InternVL model weights.
11
+
12
+ # What is InternVL?
13
+
14
+ \[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\]
15
+
16
+ InternVL scales up the ViT to _**6B parameters**_ and aligns it with LLM.
17
+
18
+ It is _**the largest open-source vision/vision-language foundation model (14B)**_ to date, achieving _**32 state-of-the-art**_ performances on a wide range of tasks such as visual perception, cross-modal retrieval, multimodal dialogue, etc.
19
+
20
+ # Model Zoo
21
+
22
+ ## Pretrained Weights
23
+
24
+ | model name | type | download | size |
25
+ | ----------------------- | ------- | ---------------------------------------------------------------------------------------------- | :-----: |
26
+ | InternViT-6B-224px | pytorch | 🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL/blob/main/intern_vit_6b_224px.pth) | 12 GB |
27
+
28
+ ## Linear-Probe Image Classification
29
+
30
+ | model name | IN-1K | IN-ReaL | IN-V2 | IN-A | IN-R | IN-Sketch | download |
31
+ | ------------------ | :---: | :-----: | :---: | :--: | :--: | :-------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
32
+ | InternViT-6B-224px | 88.2 | 90.4 | 79.9 | 77.5 | 89.8 | 69.1 | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth) \| [log](https://github.com/OpenGVLab/InternVL/blob/main/classification/work_dirs/intern_vit_6b_1k_224/log_rank0.txt) |
33
+
34
+
35
+ ## Semantic Segmentation
36
+
37
+ | type | backbone | head | mIoU | config | download |
38
+ | --------------- | --------------------- | :-----: | :--: | :--------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
39
+ | few-shot (1/16) | InternViT-6B | Linear | 46.5 | [config](./configs/intern_vit_6b/few_shot/linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/linear_intern_vit_6b_504_5k_ade20k_bs16_lr4e-5_1of16.log) |
40
+ | few-shot (1/8) | InternViT-6B | Linear | 50.0 | [config](./configs/intern_vit_6b/few_shot/linear_intern_vit_6b_504_10k_ade20k_bs16_lr4e-5_1of8.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/linear_intern_vit_6b_504_10k_ade20k_bs16_lr4e-5_1of8.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/linear_intern_vit_6b_504_10k_ade20k_bs16_lr4e-5_1of8.log) |
41
+ | few-shot (1/4) | InternViT-6B | Linear | 53.3 | [config](./configs/intern_vit_6b/few_shot/linear_intern_vit_6b_504_20k_ade20k_bs16_lr4e-5_1of4.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/linear_intern_vit_6b_504_20k_ade20k_bs16_lr4e-5_1of4.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/linear_intern_vit_6b_504_20k_ade20k_bs16_lr4e-5_1of4.log) |
42
+ | few-shot (1/2) | InternViT-6B | Linear | 55.8 | [config](./configs/intern_vit_6b/few_shot/linear_intern_vit_6b_504_40k_ade20k_bs16_lr4e-5_1of2.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/linear_intern_vit_6b_504_40k_ade20k_bs16_lr4e-5_1of2.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/linear_intern_vit_6b_504_40k_ade20k_bs16_lr4e-5_1of2.log) |
43
+ | few-shot (1/1) | InternViT-6B | Linear | 57.2 | [config](./configs/intern_vit_6b/few_shot/linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_1of1.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_1of1.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_1of1.log) |
44
+ | linear probing | InternViT-6B (frozen) | Linear | 47.2 | [config](./configs/intern_vit_6b/linear_probing/linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/linear_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.log) |
45
+ | head tuning | InternViT-6B (frozen) | UperNet | 54.9 | [config](./configs/intern_vit_6b/head_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5_frozen.log) |
46
+ | full tuning | InternViT-6B | UperNet | 58.9 | [config](./configs/intern_vit_6b/full_tuning/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.py) | [ckpt](https://huggingface.co/OpenGVLab/InternVL/resolve/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.pth) \| [log](https://huggingface.co/OpenGVLab/InternVL/raw/main/upernet_intern_vit_6b_504_80k_ade20k_bs16_lr4e-5.log) |