Prompt Tuning for Generative Multimodal Pretrained Models

Overview

This is the code for "Prompt Tuning for Generative Multimodal Pretrained Models"; see our paper on arXiv. The paper explores prompt tuning for generative multimodal pretrained models rather than contrastive learning models. We focus specifically on the unified sequence-to-sequence learning framework and implement prompt tuning on our OFA models.

Requirements

  • python 3.7.4
  • pytorch 1.8.1
  • torchvision 0.9.1
  • Java 1.8 (for COCO evaluation)

Installation

pip install -r requirements.txt

Datasets and Checkpoints

See datasets.md and checkpoints.md.

Training

We provide a demo script (run_scripts/refcoco/train_refcoco_prefix.sh) that has all the required parts for training.

sh ./run_scripts/refcoco/train_refcoco_prefix.sh

A few options of note:

  • --encoder-prompt :: whether to insert prompts into the encoder
  • --decoder-prompt :: whether to insert prompts into the decoder
  • --encoder-prompt-length :: length of the encoder prompt
  • --decoder-prompt-length :: length of the decoder prompt
  • --bitfit :: whether to use BitFit (tuning only the bias terms)
  • --adapter :: whether to use adapters
  • --adapter-dim :: projection dimension of the adapter
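
To make the shape of these options concrete, here is a minimal sketch that mirrors them with Python's standard argparse module. The flag names come from the list above; everything else is an assumption for illustration: treating the "whether to" options as on/off switches, parsing the lengths and the adapter dimension as integers, leaving defaults unset, and the helper names build_parser and enabled_methods. It is not the repository's own argument definition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the tuning options listed above."""
    parser = argparse.ArgumentParser(description="Sketch of the tuning options")
    # Assumption: the "whether to ..." options behave as boolean switches.
    parser.add_argument("--encoder-prompt", action="store_true",
                        help="insert prompts into the encoder")
    parser.add_argument("--decoder-prompt", action="store_true",
                        help="insert prompts into the decoder")
    # Assumption: lengths and dimensions are integers; defaults are left unset
    # because the real defaults are not stated in this document.
    parser.add_argument("--encoder-prompt-length", type=int, default=None)
    parser.add_argument("--decoder-prompt-length", type=int, default=None)
    parser.add_argument("--bitfit", action="store_true",
                        help="use BitFit")
    parser.add_argument("--adapter", action="store_true",
                        help="use adapters")
    parser.add_argument("--adapter-dim", type=int, default=None)
    return parser


def enabled_methods(args: argparse.Namespace) -> list:
    """Return the names of the switches that were turned on, in a fixed order."""
    methods = []
    if args.encoder_prompt:
        methods.append("encoder-prompt")
    if args.decoder_prompt:
        methods.append("decoder-prompt")
    if args.bitfit:
        methods.append("bitfit")
    if args.adapter:
        methods.append("adapter")
    return methods


# Example values only; 64 is a placeholder, not a recommended prompt length.
args = build_parser().parse_args(
    ["--encoder-prompt", "--encoder-prompt-length", "64", "--bitfit"]
)
print(enabled_methods(args))        # ['encoder-prompt', 'bitfit']
print(args.encoder_prompt_length)   # 64
```

In practice these options are passed to the training command inside the demo script; the sketch only shows how they relate to one another.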

We recommend organizing your workspace directory like this:

OFA/
├── checkpoints/
│   ├── ofa_base.pt
│   ├── ofa_large.pt
│   └── ...
├── criterions/
├── data/
├── dataset/
│   ├── caption_data/
│   ├── refcoco_data/
│   └── ...
├── fairseq/
├── models/
├── run_scripts/
├── tasks/
├── train.py
├── trainer.py
└── utils/
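
Before launching training, it can help to confirm that the workspace matches this layout. The following is a minimal sketch, assuming only the top-level entries shown in the tree; the helper name missing_entries and the two constant lists are hypothetical and not part of the repository. It does not check checkpoint files or dataset contents, since the tree elides them with "...".

```python
import tempfile
from pathlib import Path

# Top-level entries taken from the recommended layout above.
EXPECTED_DIRS = [
    "checkpoints", "criterions", "data", "dataset", "fairseq",
    "models", "run_scripts", "tasks", "utils",
]
EXPECTED_FILES = ["train.py", "trainer.py"]


def missing_entries(root) -> list:
    """Return the expected top-level directories and files absent under root."""
    root = Path(root)
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    # Demo on a throwaway directory that contains only checkpoints/.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "OFA"
        (root / "checkpoints").mkdir(parents=True)
        print(missing_entries(root))
```

To check a real workspace, call missing_entries("OFA") from the parent directory; an empty list means every top-level entry in the layout is present.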