## Prompt Tuning for Generative Multimodal Pretrained Models
### Overview
This is the code for **"Prompt Tuning for Generative Multimodal Pretrained Models"**. [Check our paper on ArXiv](https://arxiv.org/abs/2208.02532). The paper explores prompt tuning for generative multimodal pretrained models, rather than for contrastive learning models. We specifically focus on the unified sequence-to-sequence learning framework and implement our approach on the OFA models.
### Requirements
* python 3.7.4
* pytorch 1.8.1
* torchvision 0.9.1
* Java 1.8 (for COCO evaluation)
### Installation
```bash
pip install -r requirements.txt
```
### Datasets and Checkpoints
See [datasets.md](datasets.md) and [checkpoints.md](checkpoints.md).
### Training
We provide a demo script (`run_scripts/refcoco/train_refcoco_prefix.sh`) that contains everything required for training.
```sh
sh ./run_scripts/refcoco/train_refcoco_prefix.sh
```
A few options of note:
* `--encoder-prompt` :: whether to insert prompts into the encoder
* `--decoder-prompt` :: whether to insert prompts into the decoder
* `--encoder-prompt-length` :: length of the encoder prompt
* `--decoder-prompt-length` :: length of the decoder prompt
* `--bitfit` :: whether to use BitFit
* `--adapter` :: whether to use adapters
* `--adapter-dim` :: adapter projection dimension
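
To make the shape of these options concrete, here is a minimal, standalone `argparse` sketch that mirrors the flags listed above. It is not the parser used by this repository: the function name `build_parser`, the argument types, and the default values are assumptions for illustration only, and the actual definitions in the training code may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the tuning flags listed above.

    Types and defaults are assumptions; check the training code for the
    real definitions.
    """
    parser = argparse.ArgumentParser(description="Tuning options (sketch)")
    # Assumed to be on/off switches.
    parser.add_argument("--encoder-prompt", action="store_true",
                        help="insert prompts into the encoder")
    parser.add_argument("--decoder-prompt", action="store_true",
                        help="insert prompts into the decoder")
    # Assumed to be integers; 0 is a placeholder default, not the real one.
    parser.add_argument("--encoder-prompt-length", type=int, default=0,
                        help="length of the encoder prompt")
    parser.add_argument("--decoder-prompt-length", type=int, default=0,
                        help="length of the decoder prompt")
    parser.add_argument("--bitfit", action="store_true",
                        help="use BitFit")
    parser.add_argument("--adapter", action="store_true",
                        help="use adapters")
    parser.add_argument("--adapter-dim", type=int, default=0,
                        help="adapter projection dimension")
    return parser


# Example: enable encoder prompts only, with an illustrative length of 64.
example_args = build_parser().parse_args(
    ["--encoder-prompt", "--encoder-prompt-length", "64"]
)
print(example_args)
```

In this sketch, the length options only matter when the corresponding prompt switch is set; whether the real training code enforces the same relationship is not confirmed here.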
We recommend organizing your workspace directory as follows:
```
OFA/
├── checkpoints/
│ ├── ofa_base.pt
│ ├── ofa_large.pt
│ └── ...
├── criterions/
├── data/
├── dataset/
│ ├── caption_data/
│ ├── refcoco_data/
│ └── ...
├── fairseq/
├── models/
├── run_scripts/
├── tasks/
├── train.py
├── trainer.py
└── utils/
```
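
Before launching a training run, it can help to confirm that the workspace matches the layout above. The following is a small sketch using only the Python standard library; the helper name `missing_entries` and the two constant lists are hypothetical and simply restate the top-level entries of the tree shown above. It does not check the contents of `checkpoints/` or `dataset/`.

```python
from pathlib import Path

# Top-level entries taken from the recommended layout above.
EXPECTED_DIRS = [
    "checkpoints", "criterions", "data", "dataset", "fairseq",
    "models", "run_scripts", "tasks", "utils",
]
EXPECTED_FILES = ["train.py", "trainer.py"]


def missing_entries(root):
    """Return the expected top-level entries that are absent under `root`.

    Directories are reported with a trailing slash, files without.
    """
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Check the current directory, assumed here to be the OFA/ workspace root.
    for entry in missing_entries("."):
        print("missing:", entry)
```

An empty result means every top-level directory and file from the recommended layout is present.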