BUAADreamer committed · Commit 2b8204f · verified · 1 Parent(s): 926e2dc

Update README.md

Files changed (1)
  1. README.md +531 -1
README.md CHANGED
@@ -12,4 +12,534 @@ tags:
  - composed image retrieval
  - image retrieval
  - acmmm2024
---

# SPN4CIR: Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives (ACM MM 2024)

[![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)](https://github.com/BUAADreamer/CCRK/blob/main/licence)
[![arxiv badge](https://img.shields.io/badge/arxiv-2404.11317-red)](https://arxiv.org/abs/2404.11317)
[![Pytorch](https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?e&logo=PyTorch&logoColor=white)](https://pytorch.org/)
[![GitHub Repo stars](https://img.shields.io/github/stars/BUAADreamer/SPN4CIR?style=social)](https://github.com/BUAADreamer/SPN4CIR/stargazers)
[![HF Model](https://img.shields.io/badge/🤗-Checkpoints%20and%20Data%20in%20HF-blue)](https://huggingface.co/BUAADreamer/SPN4CIR)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/improving-composed-image-retrieval-via/image-retrieval-on-fashion-iq)](https://paperswithcode.com/sota/image-retrieval-on-fashion-iq?p=improving-composed-image-retrieval-via)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/improving-composed-image-retrieval-via/image-retrieval-on-cirr)](https://paperswithcode.com/sota/image-retrieval-on-cirr?p=improving-composed-image-retrieval-via)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/improving-composed-image-retrieval-via/zero-shot-composed-image-retrieval-zs-cir-on-2)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on-2?p=improving-composed-image-retrieval-via)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/improving-composed-image-retrieval-via/zero-shot-composed-image-retrieval-zs-cir-on-1)](https://paperswithcode.com/sota/zero-shot-composed-image-retrieval-zs-cir-on-1?p=improving-composed-image-retrieval-via)

## Table of Contents

- [Overview](#overview)
- [Checkpoints](#checkpoints)
- [Requirements](#requirements)
- [Pre-Process](#pre-process)
- [CLIP4CIR](#clip4cir)
- [TGCIR](#tgcir)
- [BLIP4CIR](#blip4cir)
- [BLIP24CIR](#blip24cir)
- [ZSCIR](#zscir)
- [Citation](#citation)
- [Acknowledgement](#acknowledgement)

## Overview

> The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modified text. Advanced methods often utilize contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, constructing triplets for CIR incurs high manual annotation costs, resulting in limited positive examples. Furthermore, existing methods commonly use in-batch negative sampling, which reduces the number of negatives available to the model. To address the lack of positives, we propose a data generation method that leverages a multi-modal large language model to construct triplets for CIR. To introduce more negatives during fine-tuning, we design a two-stage fine-tuning framework for CIR, whose second stage introduces plenty of static representations of negatives to optimize the representation space rapidly. The two improvements can be effectively stacked and are designed to be plug-and-play, easily applied to existing CIR models without changing their original architectures. Extensive experiments and ablation analysis demonstrate that our method effectively scales positives and negatives and achieves state-of-the-art results on both the FashionIQ and CIRR datasets. In addition, our method also performs well in zero-shot composed image retrieval, providing a new CIR solution for low-resource scenarios.

<div align="center">
    <img src="pics/overview.png" width="95%" height="auto" />
    <img src="pics/result.png" width="95%" height="auto" />
</div>
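
To make the second stage more concrete, below is a minimal PyTorch-style sketch of contrastive learning against a large bank of pre-computed (static) negative features. It only illustrates the idea described in the abstract; the function and tensor names (`stage2_contrastive_loss`, `query_feats`, `bank_feats`) are illustrative, not the repository's actual API.

```python
# Minimal sketch (not the repository's training code): contrast each composed-query
# feature against its target feature plus a large bank of static negative features.
import torch
import torch.nn.functional as F


def stage2_contrastive_loss(query_feats, target_feats, bank_feats, tau=0.02):
    """query_feats:  (B, D) fused reference-image + modified-text features
    target_feats: (B, D) features of the ground-truth target images
    bank_feats:   (N, D) pre-computed features of negative images, with N >> B
    tau:          temperature, analogous to the --tau flag in the commands below
    """
    query_feats = F.normalize(query_feats, dim=-1)
    target_feats = F.normalize(target_feats, dim=-1)
    bank_feats = F.normalize(bank_feats, dim=-1)

    pos_logits = (query_feats * target_feats).sum(dim=-1, keepdim=True)  # (B, 1)
    neg_logits = query_feats @ bank_feats.t()                            # (B, N)
    logits = torch.cat([pos_logits, neg_logits], dim=1) / tau

    # For every row, the positive sits at column 0.
    labels = torch.zeros(query_feats.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

Because the bank features are static, they can be computed once and reused, which is what allows the second stage to see far more negatives than in-batch sampling provides.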

## Checkpoints

You can find all **checkpoints and data** in this Hugging Face repo: [https://huggingface.co/BUAADreamer/SPN4CIR](https://huggingface.co/BUAADreamer/SPN4CIR)

The first-stage model checkpoints are taken from the repositories of the original papers.
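
If you prefer to script the download, here is a small sketch using the `huggingface_hub` client; the `local_dir` value is an illustrative assumption, and you can merge its contents into `project_base_path` following the structure below.

```python
# Minimal download sketch (assumes `pip install huggingface_hub`).
# local_dir is illustrative; place the downloaded files according to the layout below.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BUAADreamer/SPN4CIR",
    local_dir="SPN4CIR_assets",
)
```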

## Requirements

- Prepare a Python 3.8.13 + CUDA 12.2 environment
- Install the Python dependencies

```shell
pip3 install -r requirements.txt
```

- Download the FashionIQ and CIRR datasets from their corresponding websites and merge them with the data we provide, following the structure below:

```shell
project_base_path
└─── tgcir
      | train.py
      | ...

└─── clip4cir
      | train.py
      | ...

└─── blip4cir
      | train.py
      | ...

└─── blip24cir
      | train.py
      | ...

└─── zscir
      | ...

└─── data            # ckpts of the first stage
      └─── tgcir
      └─── clip4cir
      └─── blip4cir
      └─── blip24cir

└─── mm_data         # generated caption data
      └─── fiq
      └─── cirr
      └─── zs

└─── checkpoints     # ckpts of the second stage
      └─── fiq_clip
      └─── cirr_clip
      └─── fiq_blip
      └─── cirr_blip
      └─── fiq_blip2
      └─── cirr_blip2
      └─── fiq_tgcir
      └─── cirr_tgcir

└─── fashionIQ_dataset
      └─── captions
            | cap.dress.test.json
            | cap.dress.train.json
            | cap.dress.val.json
            | cap.extend_*.train.json
            | ...

      └─── images
            | B00006M009.jpg
            | ...

      └─── image_splits
            | split.dress.test.json
            | split.dress.train.json
            | split.dress.val.json
            | ...

      | optimized_images.json

└─── cirr_dataset
      └─── train
            └─── 0
                  | train-10108-0-img0.png
                  | ...
            ...

      └─── dev
            | dev-0-0-img0.png
            | ...

      └─── test1
            | test1-0-0-img0.png
            | ...

      └─── cirr
            └─── captions
                  | cap.rc2.test1.json
                  | cap.rc2.train.json
                  | cap.rc2.val.json
                  | cap.rc2.train.extend_*.json

            └─── image_splits
                  | split.rc2.test1.json
                  | split.rc2.train.json
                  | split.rc2.val.json

      | optimized_images.json
```
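
As an optional convenience (this script is not part of the repository), you can check that the expected top-level directories are in place before preprocessing or training; the paths simply mirror the tree above.

```python
# Optional layout check: run from project_base_path after merging the datasets and our data.
from pathlib import Path

expected = [
    "data", "mm_data", "checkpoints",
    "fashionIQ_dataset/captions", "fashionIQ_dataset/images", "fashionIQ_dataset/image_splits",
    "cirr_dataset/train", "cirr_dataset/dev", "cirr_dataset/test1", "cirr_dataset/cirr/captions",
]
missing = [p for p in expected if not Path(p).is_dir()]
print("All expected directories found." if not missing else f"Missing directories: {missing}")
```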

## Pre-Process

You can use the data we provide, or reproduce the pre-processed data with the code below.

### 0. Image De-Duplication

For FashionIQ and CIRR, images should be de-duplicated first.

```shell
#FashionIQ stage 2
python3 zscir/deduplicate_images.py --dataset fiq --data_path fashionIQ_dataset

#CIRR stage 2
python3 zscir/deduplicate_images.py --dataset cirr --data_path cirr_dataset
```

### 1. Caption Generation

```shell
#FashionIQ
python3 zscir/captioner_llava.py --cir_data fiq --k 5

#CIRR
python3 zscir/captioner_llava.py --cir_data cirr --k 10

# out-of-domain
python3 zscir/captioner_llava.py --cir_data cc --cc_id 0
python3 zscir/captioner_llava.py --cir_data cc --cc_id 32
python3 zscir/captioner_llava.py --cir_data cc --cc_id 64
python3 zscir/captioner_llava.py --cir_data cc --cc_id 96
python3 zscir/captioner_llava.py --cir_data cc --cc_id 128
python3 zscir/captioner_llava.py --cir_data cc --cc_id 160
python3 zscir/captioner_llava.py --cir_data cc --cc_id 192
```
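
`captioner_llava.py` uses LLaVA to caption the images (see [Acknowledgement](#acknowledgement)). As a rough illustration only, the sketch below captions a single image with the Hugging Face `transformers` LLaVA integration; the model id, prompt, and image path are assumptions and not necessarily what the script uses.

```python
# Illustrative single-image captioning with a LLaVA checkpoint via transformers.
# Model id, prompt, and image path below are assumptions, not the script's actual settings.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("fashionIQ_dataset/images/B00006M009.jpg")
prompt = "USER: <image>\nDescribe this image in one short sentence. ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```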

### 2. Image Pair Match

```shell
#FashionIQ
python3 zscir/srm_utils.py --dataset fiq --data_path fashionIQ_dataset

#CIRR
python3 zscir/srm_utils.py --dataset cirr --data_path cirr_dataset
```

### 3. Modified Text Generation

```shell
# tgcir
python3 zscir/get_cir_data.py --model tgcir --data fiq --refer --i2i_rank 10000 --i2i_rank_max 20000 --p_list 2
python3 zscir/get_cir_data.py --model tgcir --data cirr --i2i_rank 10000 --i2i_rank_max 15000

# clip4cir
python3 zscir/get_cir_data.py --model clip --data fiq --refer --i2i_rank 10000 --i2i_rank_max 20000 --p_list 2 --word_num 4
python3 zscir/get_cir_data.py --model clip --data cirr --i2i_rank 10000 --i2i_rank_max 15000 --word_num 8

# blip4cir
python3 zscir/get_cir_data.py --model blip --data fiq --refer --K 3000 --p_list 2
python3 zscir/get_cir_data.py --model blip --data cirr

# blip24cir
python3 zscir/get_cir_data.py --model blip2 --data fiq --K 6000 --refer --p_list 2
python3 zscir/get_cir_data.py --model blip2 --data cirr --refer

# zs
# In-Domain
python3 zscir/get_cir_data.py --model zs --data fiq --p_list 2 --word_num 5
python3 zscir/get_cir_data.py --model zs --data cirr
# Out-of-Domain
python3 zscir/get_cir_data.py --model zs --data ccfiq --p_list 2 --word_num 10
python3 zscir/get_cir_data.py --model zs --data cccirr
```

## CLIP4CIR

We train our model on one Tesla V100 (32G) GPU with the following commands.

You can use the pre-computed data we provide, or reproduce it with the code below.

### Train

Train the second stage from the first-stage model.

```shell
#FashionIQ
python3 clip4cir/train.py --dataset fiq --batch-size 256 --num-epochs 3 \
    --output_path checkpoints/fiq_clip \
    --bank_path checkpoints/fiq_clip/fiq_bank.pth \
    --learning-rate 2e-5 --tau 0.02 \
    --model_path data/clip4cir/fiq_stage1.pt --plus

#CIRR
python3 clip4cir/train.py --dataset cirr --batch-size 256 --num-epochs 3 \
    --output_path checkpoints/cirr_clip \
    --bank_path checkpoints/cirr_clip/cirr_bank.pth \
    --learning-rate 2e-5 --tau 0.02 \
    --model_path data/clip4cir/cirr_stage1.pt --plus
```

### Validation on FashionIQ and CIRR

```shell
#FashionIQ
python3 clip4cir/validate.py --dataset fiq --data_path fashionIQ_dataset \
    --model_path checkpoints/fiq_clip/best.pt

#CIRR
python3 clip4cir/validate.py --dataset cirr --data_path cirr_dataset \
    --model_path checkpoints/cirr_clip/best.pt
```

### Test on CIRR

```shell
# Generate 2 json files at submission/clip4cir/
# Then submit them to the test website: https://cirr.cecs.anu.edu.au/test_process
python3 clip4cir/cirr_test_submission.py --model_path checkpoints/cirr_clip/best.pt \
    --submission-name clip4cir --data_path cirr_dataset
```

## TGCIR

We train our model on one Tesla V100 (32G) GPU with the following commands.

You can use the pre-computed data we provide, or reproduce it with the code below.

### Train

Train the second stage from the first-stage model.

```shell
#FashionIQ
python3 tgcir/train.py --dataset fiq --batch-size 256 --num-epochs 5 \
    --output_path checkpoints/fiq_tg \
    --bank_path checkpoints/fiq_tg/fiq_bank.pth \
    --learning-rate 2e-5 --tau 0.02 \
    --model_path data/tgcir/fiq_stage1.pt --plus

#CIRR
python3 tgcir/train.py --dataset cirr --batch-size 256 --num-epochs 5 \
    --output_path checkpoints/cirr_tg \
    --bank_path checkpoints/cirr_tg/cirr_bank.pth \
    --learning-rate 2e-5 --tau 0.01 \
    --model_path data/tgcir/cirr_stage1.pt --plus
```

### Validation on FashionIQ and CIRR

```shell
#FashionIQ
python3 tgcir/validate.py --dataset fiq --data_path fashionIQ_dataset \
    --model_path checkpoints/fiq_tg/best.pt

#CIRR
python3 tgcir/validate.py --dataset cirr --data_path cirr_dataset \
    --model_path checkpoints/cirr_tg/best.pt
```

### Test on CIRR

```shell
# Generate 2 json files at submission/tgcir/
# Then submit them to the test website: https://cirr.cecs.anu.edu.au/test_process
python3 tgcir/cirr_test_submission.py --model_path checkpoints/cirr_tg/best.pt \
    --submission-name tgcir --data_path cirr_dataset
```

## BLIP4CIR

We train our model on one Tesla V100 (32G) GPU with the following commands.

You can use the pre-computed data we provide, or reproduce it with the code below.

### Train

Train the second stage from the first-stage model.

```shell
#FashionIQ
python3 blip4cir/train.py --dataset fiq --batch-size 128 --num-epochs 10 \
    --output_path checkpoints/fiq_blip \
    --bank_path checkpoints/fiq_blip/fiq_bank.pth \
    --learning-rate 5e-6 --tau 0.03 \
    --model_path data/blip4cir/fiq_stage1.pt --plus

#CIRR
python3 blip4cir/train.py --dataset cirr --batch-size 128 --num-epochs 3 \
    --output_path checkpoints/cirr_blip \
    --bank_path checkpoints/cirr_blip/cirr_bank.pth \
    --learning-rate 6e-6 --tau 0.02 \
    --model_path data/blip4cir/cirr_stage1.pt --plus
```

### Validation on FashionIQ and CIRR

```shell
#FashionIQ
python3 blip4cir/validate.py --dataset fiq --data_path fashionIQ_dataset \
    --model_path checkpoints/fiq_blip/best.pt

#CIRR
python3 blip4cir/validate.py --dataset cirr --data_path cirr_dataset \
    --model_path checkpoints/cirr_blip/best.pt
```

### Test on CIRR

```shell
# Generate 2 json files at submission/blip4cir/
# Then submit them to the test website: https://cirr.cecs.anu.edu.au/test_process
python3 blip4cir/cirr_test_submission.py --model_path checkpoints/cirr_blip/best.pt \
    --submission-name blip4cir --data_path cirr_dataset
```

## BLIP24CIR

We train our model on one Tesla V100 (32G) GPU with the following commands.

You can use the pre-computed data we provide, or reproduce it with the code below.

### Train

Train the second stage from the first-stage model.

```shell
#FashionIQ
python3 blip24cir/train.py --dataset fiq --batch-size 32 --num-epochs 3 \
    --output_path checkpoints/fiq_blip2 \
    --bank_path checkpoints/fiq_blip2/fiq_bank.pth \
    --learning-rate 1e-5 --tau 0.05 \
    --model_path data/blip24cir/fiq_stage1.pt --plus

#CIRR
python3 blip24cir/train.py --dataset cirr --batch-size 32 --num-epochs 3 \
    --output_path checkpoints/cirr_blip2 \
    --bank_path checkpoints/cirr_blip2/cirr_bank.pth \
    --learning-rate 1e-5 --tau 0.05 \
    --model_path data/blip24cir/cirr_stage1.pt --plus
```

### Validation on FashionIQ and CIRR

```shell
#FashionIQ
python3 blip24cir/validate.py --dataset fiq --data_path fashionIQ_dataset \
    --model_path checkpoints/fiq_blip2/best.pt

#CIRR
python3 blip24cir/validate.py --dataset cirr --data_path cirr_dataset \
    --model_path checkpoints/cirr_blip2/best.pt
```

### Test on CIRR

```shell
# Generate 2 json files at submission/blip24cir/
# Then submit them to the test website: https://cirr.cecs.anu.edu.au/test_process
python3 blip24cir/cirr_test_submission.py --model_path checkpoints/cirr_blip2/best.pt \
    --submission-name blip24cir --data_path cirr_dataset
```

## ZSCIR

We train our model on one Tesla V100 (32G) GPU with the following commands.

You can use the pre-computed data we provide, or reproduce it with the code below.

### Train

Train the model using the generated data.

```shell
# Out-Of-Domain

#FashionIQ
#base
python3 zscir/train.py --dataset fiq --batch-size 48 --num-epochs 10 \
    --output_path checkpoints/fiq_zs_cc_base \
    --learning-rate 2e-6 --tau 0.01 \
    --use_cc

#bank
python3 zscir/train_bank.py --dataset fiq --batch-size 128 --num-epochs 5 \
    --output_path checkpoints/fiq_zs_cc \
    --learning-rate 2e-6 --tau 0.02 \
    --bank_path checkpoints/fiq_zs_cc/fiq_bank.pth \
    --use_cc --model_path checkpoints/fiq_zs_cc_base/best.pt

#CIRR
#base
python3 zscir/train.py --dataset cirr --batch-size 48 --num-epochs 10 \
    --output_path checkpoints/cirr_zs_cc_base \
    --learning-rate 2e-6 --tau 0.01 \
    --use_cc

#bank
python3 zscir/train_bank.py --dataset cirr --batch-size 128 --num-epochs 5 \
    --output_path checkpoints/cirr_zs_cc \
    --learning-rate 2e-6 --tau 0.02 \
    --bank_path checkpoints/cirr_zs_cc/cirr_bank.pth \
    --use_cc --model_path checkpoints/cirr_zs_cc_base/best.pt

# In-Domain
#FashionIQ
#base
python3 zscir/train.py --dataset fiq --batch-size 48 --num-epochs 10 \
    --output_path checkpoints/fiq_zs_base \
    --learning-rate 2e-6 --tau 0.01

#bank
python3 zscir/train_bank.py --dataset fiq --batch-size 128 --num-epochs 5 \
    --output_path checkpoints/fiq_zs \
    --learning-rate 2e-6 --tau 0.02 \
    --bank_path checkpoints/fiq_zs/fiq_bank.pth \
    --model_path checkpoints/fiq_zs_base/best.pt

#CIRR
#base
python3 zscir/train.py --dataset cirr --batch-size 48 --num-epochs 10 \
    --output_path checkpoints/cirr_zs_base \
    --learning-rate 2e-6 --tau 0.01

#bank
python3 zscir/train_bank.py --dataset cirr --batch-size 128 --num-epochs 5 \
    --output_path checkpoints/cirr_zs \
    --learning-rate 2e-6 --tau 0.02 \
    --bank_path checkpoints/cirr_zs/cirr_bank_2.pth \
    --model_path checkpoints/cirr_zs_base/best.pt
```

### Validation on FashionIQ and CIRR

```shell
#FashionIQ
python3 zscir/validate.py --dataset fiq --data_path fashionIQ_dataset \
    --model_path checkpoints/fiq_zs/best.pt

python3 zscir/validate.py --dataset fiq --data_path fashionIQ_dataset \
    --model_path checkpoints/fiq_zs_cc/best.pt

#CIRR
python3 zscir/validate.py --dataset cirr --data_path cirr_dataset \
    --model_path checkpoints/cirr_zs/best.pt

python3 zscir/validate.py --dataset cirr --data_path cirr_dataset \
    --model_path checkpoints/cirr_zs_cc/best.pt
```

### Test on CIRR

```shell
# Generate 2 json files at submission/zscir/
# Then submit them to the test website: https://cirr.cecs.anu.edu.au/test_process
python3 zscir/cirr_test_submission.py --model_path checkpoints/cirr_zs/best.pt \
    --submission-name zscir --data_path cirr_dataset

python3 zscir/cirr_test_submission.py --model_path checkpoints/cirr_zs_cc/best.pt \
    --submission-name zscir_cc --data_path cirr_dataset
```

## Citation

```bibtex
@article{feng2024improving,
    title={Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives},
    author={Feng, Zhangchi and Zhang, Richong and Nie, Zhijie},
    journal={arXiv preprint arXiv:2404.11317},
    year={2024}
}
```

## Acknowledgement

Our code is based on [CLIP4Cir](https://github.com/ABaldrati/CLIP4Cir). Parts of our code are adapted from [TG-CIR](https://anosite.wixsite.com/tg-cir), [SPRC](https://github.com/chunmeifeng/SPRC), and [Candidate-Reranking-CIR](https://github.com/Cuberick-Orion/Candidate-Reranking-CIR).

For data, we train and evaluate on two CIR datasets, [FashionIQ](https://github.com/XiaoxiaoGuo/fashion-iq/) and [CIRR](https://github.com/Cuberick-Orion/CIRR). We use [LLaVA](https://github.com/haotian-liu/LLaVA) for caption generation and [Unicom](https://github.com/deepglint/unicom) for image pair matching.

Thanks for their great work! If you use a particular part of our code, please cite the relevant papers.