Update README.md
LLaVA-InternLM-7B | 69.0 | 68.5 | 66.7 | 63.8 | 37.3
LLaVA-InternLM2-7B | 73.3 | 74.6 | 71.7 | 72.0 | 42.5
llava-dinov2-internlm2-7b-v1 | 64.0 | 65.2 | 62.9 | 61.6 | 45.3

## Installation

```
git clone https://github.com/InternLM/xtuner
pip install -e ./xtuner[deepspeed]
```
```
xtuner chat internlm/internlm2-chat-7b \
...
--prompt-template internlm2_chat \
--image $IMAGE_PATH
```
## Common Errors

1.
```
...
pip install protobuf
```
## Data preparation

1. File structure

```
./data/llava_data
...
└── VG_100K_2
```
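Before downloading anything, it can help to verify that the folders end up where the training configs expect them. A minimal sketch, assuming a `./data/llava_data` root and hypothetical subfolder names (adjust them to the full tree above; they are not taken verbatim from this README):

```shell
#!/bin/bash
# Sanity-check the dataset layout before training.
# DATA_ROOT and the subdirectory names are assumptions; edit them to match
# the actual tree shown above.
data_root=${DATA_ROOT:-./data/llava_data}

missing=""
for d in LLaVA-Pretrain LLaVA-Instruct-150K llava_images; do
  [ -d "$data_root/$d" ] || missing="$missing $d"
done

if [ -n "$missing" ]; then
  echo "missing under $data_root:$missing"
else
  echo "layout ok"
fi
```

Running it after each download step makes a forgotten archive obvious before a multi-hour training job fails.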
2. Pretrain Data

LLaVA-Pretrain

```shell
git lfs install
git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain --depth=1
```
3. Finetune Data

3.1 Text data

LLaVA-Instruct-150K

```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K --depth=1
```
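Both datasets above are stored with Git LFS; cloning without it silently leaves small pointer files where the JSON archives should be. A small pre-flight check (a sketch, not part of the original instructions):

```shell
#!/bin/bash
# Warn early if git-lfs is absent, before cloning LFS-backed HF datasets.
if command -v git-lfs >/dev/null 2>&1; then
  lfs_status="ok"
else
  lfs_status="missing"
fi
echo "git-lfs: $lfs_status"
# If "missing", install it from https://git-lfs.com and rerun `git lfs install`.
```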
3.2 Image data

3.2.1 COCO (coco): [train2017](http://images.cocodataset.org/zips/train2017.zip)

3.2.2 GQA (gqa): [images](https://downloads.cs.stanford.edu/nlp/data/gqa/images.zip)

3.2.3 OCR-VQA (ocr_vqa): [download script](https://drive.google.com/drive/folders/1_GYPY5UkUy7HIcR0zq3ZCFgeZN7BAfm_?usp=sharing)

⚠️⚠️⚠️ Rename OCR-VQA's images so that every file keeps the `.jpg` extension!

```shell
#!/bin/bash
...
done
```
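The body of the rename loop above is elided in this excerpt. One self-contained way to force the `.jpg` extension (demonstrated here on a scratch directory rather than the real `ocr_vqa` folder, so the paths are illustrative only) could look like:

```shell
#!/bin/bash
# Demo on a throwaway directory; point `dir` at the real OCR-VQA image folder
# instead. Note: if foo.png and foo.jpg both exist, the rename overwrites foo.jpg.
dir=$(mktemp -d)
touch "$dir/a.png" "$dir/b.gif" "$dir/c.jpg"

for f in "$dir"/*; do
  name=${f##*/}                              # basename only, so dots in $dir are safe
  case "$name" in
    *.jpg) ;;                                # already has the right extension
    *.*)   mv "$f" "$dir/${name%.*}.jpg" ;;  # swap the extension for .jpg
  esac
done

ls "$dir"
```

Files without any extension are deliberately left untouched; only the suffix is changed, never the basename.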
3.2.4 TextVQA (textvqa): [train_val_images](https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip)

3.2.5 VisualGenome (VG): [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip), [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip)

## Cheers! Now train your own model!

1. Alignment module pretraining

```
NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_7b_dinov2_e1_gpu8_pretrain.py --deepspeed deepspeed_zero2
```