FantasticGNU committed
Commit 0f39c5e
1 Parent(s): 115d930

Upload 4 files

Files changed (5)
  1. .gitattributes +3 -0
  2. README.md +416 -3
  3. images/AnomalyGPT.png +3 -0
  4. images/compare.png +3 -0
  5. images/logo.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ images/AnomalyGPT.png filter=lfs diff=lfs merge=lfs -text
+ images/compare.png filter=lfs diff=lfs merge=lfs -text
+ images/logo.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,416 @@
- ---
- license: apache-2.0
- ---
<p align="center" width="100%">
<img src="./images/logo.png" alt="AnomalyGPT_logo" style="width: 40%; min-width: 300px; display: block; margin: auto;" />
</p>

# AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models

![License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)

<p align="left">
🌐 <a href="https://anomalygpt.github.io" target="_blank">Project Page</a> • 🤗 <a href="" target="_blank">Online Demo</a> • 📃 <a href="" target="_blank">Paper</a> • 🤖 <a href="https://huggingface.co/FantasticGNU/AnomalyGPT" target="_blank">Model</a> • 📹 <a href="https://www.youtube.com/watch?v=lcxBfy0YnNA" target="_blank">Video</a>
</p>

Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang

****

<span id='all_catelogue'/>

## Catalogue:

* <a href='#introduction'>1. Introduction</a>
* <a href='#environment'>2. Running AnomalyGPT Demo</a>
    * <a href='#install_environment'>2.1. Environment Installation</a>
    * <a href='#download_imagebind_model'>2.2. Prepare ImageBind Checkpoint</a>
    * <a href='#download_vicuna_model'>2.3. Prepare Vicuna Checkpoint</a>
    * <a href='#download_anomalygpt'>2.4. Prepare Delta Weights of AnomalyGPT</a>
    * <a href='#running_demo'>2.5. Deploying Demo</a>
* <a href='#train_anomalygpt'>3. Train Your Own AnomalyGPT</a>
    * <a href='#data_preparation'>3.1. Data Preparation</a>
    * <a href='#training_configurations'>3.2. Training Configurations</a>
    * <a href='#model_training'>3.3. Training AnomalyGPT</a>
<!-- * <a href='#results'>4. Results</a> -->
* <a href='#license'>License</a>
* <a href='#citation'>Citation</a>
* <a href='#acknowledgments'>Acknowledgments</a>

****

<span id='introduction'/>

### 1. Introduction: <a href='#all_catelogue'>[Back to Top]</a>

<p align="center" width="100%">
<img src="./images/compare.png" alt="AnomalyGPT_logo" style="width: 80%; min-width: 400px; display: block; margin: auto;" />
</p>

**AnomalyGPT** is the first Large Vision-Language Model (LVLM) based Industrial Anomaly Detection (IAD) method that can detect anomalies in industrial images without the need for manually specified thresholds. Existing IAD methods only provide anomaly scores and require manually set thresholds, while existing LVLMs cannot detect anomalies in images. AnomalyGPT can not only indicate the presence and location of anomalies but also provide information about the image.

<img src="./images/AnomalyGPT.png" alt="AnomalyGPT" style="zoom:100%;" />

We leverage a pre-trained image encoder and a Large Language Model (LLM) to align IAD images and their corresponding textual descriptions via simulated anomaly data. We employ a lightweight, visual-textual feature-matching-based image decoder to obtain localization results, and design a prompt learner that provides fine-grained semantics to the LLM, fine-tuning the LVLM with prompt embeddings. Our method can also detect anomalies in previously unseen items with only a few normal samples provided.

****

<span id='environment'/>

### 2. Running AnomalyGPT Demo <a href='#all_catelogue'>[Back to Top]</a>

<span id='install_environment'/>

#### 2.1. Environment Installation

Clone the repository locally:

```bash
git clone https://github.com/CASIA-IVA-Lab/AnomalyGPT.git
```

Install the required packages:

```bash
pip install -r requirements.txt
```

<span id='download_imagebind_model'/>

#### 2.2. Prepare ImageBind Checkpoint:

You can download the pre-trained ImageBind model using [this link](https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth). After downloading, put the file (imagebind_huge.pth) in the [[./pretrained_ckpt/imagebind_ckpt/]](./pretrained_ckpt/imagebind_ckpt/) directory.
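If you prefer the command line, a minimal download sketch (assuming `wget` is available and that you run it from the repository root) could look like:

```bash
# Create the expected directory and fetch the ImageBind checkpoint into it.
mkdir -p ./pretrained_ckpt/imagebind_ckpt
wget -O ./pretrained_ckpt/imagebind_ckpt/imagebind_huge.pth \
  https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth
```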

<span id='download_vicuna_model'/>

#### 2.3. Prepare Vicuna Checkpoint:

To prepare the pre-trained Vicuna model, please follow the instructions provided [[here]](./pretrained_ckpt#1-prepare-vicuna-checkpoint).

<span id='download_anomalygpt'/>

#### 2.4. Prepare Delta Weights of AnomalyGPT:

We use the pre-trained parameters from [PandaGPT](https://github.com/yxuansu/PandaGPT) to initialize our model. You can get the weights of PandaGPT trained with different strategies from the table below. In our experiments and online demo, we use Vicuna-7B and `openllmplayground/pandagpt_7b_max_len_1024` due to computational resource limitations. Better results are expected when switching to Vicuna-13B.

| **Base Language Model** | **Maximum Sequence Length** | **Huggingface Delta Weights Address** |
| :---------------------: | :-------------------------: | :----------------------------------------------------------: |
| Vicuna-7B (version 0) | 512 | [openllmplayground/pandagpt_7b_max_len_512](https://huggingface.co/openllmplayground/pandagpt_7b_max_len_512) |
| Vicuna-7B (version 0) | 1024 | [openllmplayground/pandagpt_7b_max_len_1024](https://huggingface.co/openllmplayground/pandagpt_7b_max_len_1024) |
| Vicuna-13B (version 0) | 256 | [openllmplayground/pandagpt_13b_max_len_256](https://huggingface.co/openllmplayground/pandagpt_13b_max_len_256) |
| Vicuna-13B (version 0) | 400 | [openllmplayground/pandagpt_13b_max_len_400](https://huggingface.co/openllmplayground/pandagpt_13b_max_len_400) |

Please put the downloaded 7B/13B delta weights file (pytorch_model.pt) in the [./pretrained_ckpt/pandagpt_ckpt/7b/](./pretrained_ckpt/pandagpt_ckpt/7b/) or [./pretrained_ckpt/pandagpt_ckpt/13b/](./pretrained_ckpt/pandagpt_ckpt/13b/) directory.
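For example, a minimal sketch for the 7B (max_len=1024) delta weights, assuming `wget` and Hugging Face's standard `resolve/main` download URLs (adjust the repository name for the other variants):

```bash
# Download the PandaGPT delta weights into the directory the code expects.
mkdir -p ./pretrained_ckpt/pandagpt_ckpt/7b
wget -O ./pretrained_ckpt/pandagpt_ckpt/7b/pytorch_model.pt \
  https://huggingface.co/openllmplayground/pandagpt_7b_max_len_1024/resolve/main/pytorch_model.pt
```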

After that, you can download AnomalyGPT weights from the table below.

| Setup and Datasets | Weights Address |
| :---------------------------------------------------------: | :-------------------------------: |
| Unsupervised on MVTec-AD | [AnomalyGPT/train_mvtec](https://huggingface.co/FantasticGNU/AnomalyGPT/blob/main/train_mvtec/pytorch_model.pt) |
| Unsupervised on VisA | [AnomalyGPT/train_visa](https://huggingface.co/FantasticGNU/AnomalyGPT/blob/main/train_visa/pytorch_model.pt) |
| Supervised on MVTec-AD, VisA, MVTec-LOCO-AD and CrackForest | [AnomalyGPT/train_supervised](https://huggingface.co/FantasticGNU/AnomalyGPT/blob/main/train_supervised/pytorch_model.pt) |

After downloading, put the AnomalyGPT weights in the [./code/ckpt/](./code/ckpt/) directory.
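As an illustration, the supervised checkpoint could be fetched like this (again a sketch assuming the `resolve/main` URL pattern; swap `train_supervised` for `train_mvtec` or `train_visa` as needed):

```bash
# Place the AnomalyGPT delta weights where the demo looks for them.
mkdir -p ./code/ckpt
wget -O ./code/ckpt/pytorch_model.pt \
  https://huggingface.co/FantasticGNU/AnomalyGPT/resolve/main/train_supervised/pytorch_model.pt
```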

In our [online demo](), we use the supervised setting as our default model for a better user experience. You can also try other weights locally.

<span id='running_demo'/>

#### 2.5. Deploying Demo

Upon completion of the previous steps, you can run the demo locally as follows:
```bash
cd ./code/
python web_demo.py
```

****

<span id='train_anomalygpt'/>

### 3. Train Your Own AnomalyGPT <a href='#all_catelogue'>[Back to Top]</a>

**Prerequisites:** Before training the model, make sure the environment is properly installed and that the ImageBind, Vicuna, and PandaGPT checkpoints have been downloaded.

<span id='data_preparation'/>

#### 3.1. Data Preparation:

You can download the MVTec-AD dataset from [[this link]](https://www.mvtec.com/company/research/datasets/mvtec-ad/downloads) and VisA from [[this link]](https://github.com/amazon-science/spot-diff). You can also download the pre-training data of PandaGPT from [[here]](https://huggingface.co/datasets/openllmplayground/pandagpt_visual_instruction_dataset/tree/main). After downloading, put the data in the [[./data]](./data/) directory.

The directory of [[./data]](./data/) should look like:

```
data
|---pandagpt4_visual_instruction_data.json
|---images
|-----|-- ...
|---mvtec_anomaly_detection
|-----|-- bottle
|-----|-----|----- ground_truth
|-----|-----|----- test
|-----|-----|----- train
|-----|-- capsule
|-----|-- ...
|----VisA
|-----|-- split_csv
|-----|-----|--- 1cls.csv
|-----|-----|--- ...
|-----|-- candle
|-----|-----|--- Data
|-----|-----|-----|----- Images
|-----|-----|-----|--------|------ Anomaly
|-----|-----|-----|--------|------ Normal
|-----|-----|-----|----- Masks
|-----|-----|-----|--------|------ Anomaly
|-----|-----|--- image_anno.csv
|-----|-- capsules
|-----|-----|----- ...
```
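For reference, a minimal sketch of unpacking the datasets into `./data` (the archive file names below are assumptions; use the names of the files you actually downloaded):

```bash
# Unpack MVTec-AD and VisA into ./data (archive names are placeholders).
mkdir -p ./data/mvtec_anomaly_detection ./data/VisA
tar -xf mvtec_anomaly_detection.tar.xz -C ./data/mvtec_anomaly_detection
tar -xf VisA_20220922.tar -C ./data/VisA
```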

<span id='training_configurations'/>

#### 3.2. Training Configurations

The table below shows the training hyperparameters used in our experiments. The hyperparameters were selected based on the constraints of our computational resources, i.e., 2 x RTX 3090 GPUs.

| **Base Language Model** | **Epoch Number** | **Batch Size** | **Learning Rate** | **Maximum Length** |
| :---------------------: | :--------------: | :------------: | :---------------: | :----------------: |
| Vicuna-7B | 50 | 16 | 1e-3 | 1024 |

<span id='model_training'/>

#### 3.3. Training AnomalyGPT

To train AnomalyGPT on the MVTec-AD dataset, please run the following commands:
```bash
cd ./code
bash ./scripts/train_mvtec.sh
```

The key arguments of the training script are as follows (see the sketch after this list):
* `--data_path`: The path to the JSON file `pandagpt4_visual_instruction_data.json`.
* `--image_root_path`: The root path of the PandaGPT training images.
* `--imagebind_ckpt_path`: The path to the ImageBind checkpoint.
* `--vicuna_ckpt_path`: The directory containing the pre-trained Vicuna checkpoints.
* `--max_tgt_len`: The maximum sequence length of training instances.
* `--save_path`: The directory where the trained delta weights are saved. This directory will be created automatically.
* `--log_path`: The directory where the logs are saved. This directory will be created automatically.
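The script already wires these flags together; purely as an illustration of how they combine, a hypothetical manual invocation might look like the sketch below (the entry-point name and every value here are assumptions; check `./scripts/train_mvtec.sh` for the real command):

```bash
# Hypothetical invocation sketch; mirrors the flags described above, not the actual script contents.
python train.py \
  --data_path ../data/pandagpt4_visual_instruction_data.json \
  --image_root_path ../data/images/ \
  --imagebind_ckpt_path ../pretrained_ckpt/imagebind_ckpt/ \
  --vicuna_ckpt_path ../pretrained_ckpt/vicuna_ckpt/7b_v0/ \
  --max_tgt_len 1024 \
  --save_path ./ckpt/train_mvtec/ \
  --log_path ./ckpt/train_mvtec/log/
```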

Note that the number of epochs can be set via the `epochs` field in the [./code/config/openllama_peft.yaml](./code/config/openllama_peft.yaml) file, and the learning rate can be set in [./code/dsconfig/openllama_peft_stage_1.json](./code/dsconfig/openllama_peft_stage_1.json).

<!-- ****

<span id='results'/>

### Results

<style>
th, td {
  text-align: center;
}
</style>

<table>
<tr>
<th rowspan="2">Setup</th>
<th rowspan="2">Method</th>
<th colspan="3">MVTec-AD</th>
<th colspan="3">VisA</th>
</tr>
<tr>
<td>Image-AUC</td>
<td>Pixel-AUC</td>
<td>Accuracy</td>
<td>Image-AUC</td>
<td>Pixel-AUC</td>
<td>Accuracy</td>
</tr>
<tr>
<td rowspan=5>1-shot</td>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>76.6 ± 3.1</td>
<td>89.3 ± 0.9</td>
<td>-</td>
<td>62.8 ± 5.4</td>
<td>89.9 ± 0.8</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td rowspan=5>2-shot</td>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td rowspan=5>4-shot</td>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
<tr>
<td>SPADE</td>
<td>81.0 ± 2.0</td>
<td>91.2 ± 0.4</td>
<td>-</td>
<td>79.5 ± 4.0</td>
<td>95.6 ± 0.4</td>
<td>-</td>
</tr>
</table> -->

****

<span id='license'/>

### License

AnomalyGPT is licensed under the [Apache 2.0 license](./LICENSE).

****

<span id='citation'/>

### Citation:

If you find AnomalyGPT useful in your research or applications, please cite it using the following BibTeX:
```
@article{gu2023anomalyagpt,
  title={AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models},
  author={Gu, Zhaopeng and Zhu, Bingke and Zhu, Guibo and Chen, Yingying and Tang, Ming and Wang, Jinqiao},
  journal={arXiv preprint arXiv:},
  year={2023}
}
```

****

<span id='acknowledgments'/>

### Acknowledgments:

This repo benefits from [PandaGPT](https://github.com/yxuansu/PandaGPT), [APRIL-GAN](https://github.com/ByChelsea/VAND-APRIL-GAN), and [WinCLIP](https://arxiv.org/abs/2303.14814). Thanks for their wonderful works!

images/AnomalyGPT.png ADDED

Git LFS Details

  • SHA256: bc261a8010d32d06c08813b8cff924f72d767cc67b07528b50591556845c50e8
  • Pointer size: 132 Bytes
  • Size of remote file: 6.85 MB
images/compare.png ADDED

Git LFS Details

  • SHA256: 2337ac9cf854ef40bf5c4de02251a2058b785dd6ac2d1eef435a97541c918a29
  • Pointer size: 132 Bytes
  • Size of remote file: 5.55 MB
images/logo.png ADDED

Git LFS Details

  • SHA256: c515057e09d17eb10d9da3976d3c93ada3d444b1bb0bcb13c65d9d5d51997b30
  • Pointer size: 132 Bytes
  • Size of remote file: 1.48 MB