Checkpoints (OFA-CN)

OFA-CN is the Chinese version of OFA. We release Base-size and Large-size checkpoints, both pretrained and finetuned on image captioning and referring expression comprehension. For referring expression comprehension, we translated the texts of the RefCOCO, RefCOCO+, and RefCOCOg datasets into Chinese and finetuned OFA-CN on the translations. We plan to release these translated datasets in the near future.

Checkpoints

Download links for the OFA-CN checkpoints are listed below.

Pretraining

Pretrained checkpoint (OFA-CN-Large) (~443M parameters)
Pretrained checkpoint (OFA-CN-Base) (~160M parameters)

Finetuning (OFA-Large)

Finetuning (OFA-Base)

Model Card

The table below lists the basic configuration of the Base-size and Large-size OFA-CN models.

Model	#Params	Backbone	Hidden Size	Intermediate Size	#Heads	#Enc. Layers	#Dec. Layers
OFA_Base	160M	ResNet101	768	3072	12	6	6
OFA_Large	443M	ResNet152	1024	4096	16	12	12
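
As a rough sanity check on the table, the sketch below estimates how many of the reported parameters sit in the Transformer layers. It assumes a standard encoder-decoder layout (four attention projections per attention block, two linear layers per feed-forward block, one extra cross-attention block per decoder layer) and ignores biases, layer norms, embeddings, and the ResNet backbone. The names `OFAConfig` and `estimate_transformer_params` are hypothetical and are not part of the OFA codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OFAConfig:
    """Transformer hyperparameters copied from the model card table."""
    name: str
    hidden_size: int
    intermediate_size: int
    num_heads: int
    encoder_layers: int
    decoder_layers: int


def estimate_transformer_params(cfg: OFAConfig) -> int:
    """Rough count of attention and feed-forward weight matrices only.

    Assumption: standard Transformer layers; biases, layer norms,
    embeddings, and the ResNet backbone are not counted.
    """
    h, ffn = cfg.hidden_size, cfg.intermediate_size
    attention = 4 * h * h        # Q, K, V, and output projections
    feed_forward = 2 * h * ffn   # two linear layers
    encoder_layer = attention + feed_forward
    decoder_layer = 2 * attention + feed_forward  # self- plus cross-attention
    return (cfg.encoder_layers * encoder_layer
            + cfg.decoder_layers * decoder_layer)


OFA_BASE = OFAConfig("OFA_Base", 768, 3072, 12, 6, 6)
OFA_LARGE = OFAConfig("OFA_Large", 1024, 4096, 16, 12, 12)

for cfg in (OFA_BASE, OFA_LARGE):
    millions = estimate_transformer_params(cfg) / 1e6
    print(f"{cfg.name}: ~{millions:.0f}M transformer weights, "
          f"head dim {cfg.hidden_size // cfg.num_heads}")
```

Under these assumptions the estimate comes to about 99M for OFA_Base and about 352M for OFA_Large, with a per-head dimension of 64 in both cases. The gap to the reported 160M and 443M would then be covered by the components the sketch leaves out.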

Results

The tables below compare OFA-CN with baseline models.

MUGE Caption

Model	BLEU@4	ROUGE-L	CIDEr-D
Trm	7.33	51.51	11.00
M6	16.19	55.06	30.75
OFA_Base	26.23	58.95	50.70
OFA_Large	27.32	59.20	53.51

RefCOCO-CN Series

Model	RefCOCO(val/testA/testB)	RefCOCO+(val/testA/testB)	RefCOCOg(val/test-u)
OFA_Base(random-init)	30.13/35.07/25.03	17.89/20.90/15.83	20.30/20.45
OFA_Base	82.18/86.07/76.68	69.38/77.26/60.14	73.57/72.53
OFA_Large	82.84/86.54/76.50	71.30/78.56/61.85	71.96/71.30
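
Because each cell in the table above packs several splits into one slash-separated string, per-split comparisons are easier after unpacking. The following is a minimal sketch that copies the rows from the table and computes the difference between two models on every split. The helper names `parse_row` and `delta` are illustrative only.

```python
# Rows copied from the RefCOCO-CN table; splits are slash-separated.
ROWS = {
    "OFA_Base(random-init)": ["30.13/35.07/25.03", "17.89/20.90/15.83", "20.30/20.45"],
    "OFA_Base": ["82.18/86.07/76.68", "69.38/77.26/60.14", "73.57/72.53"],
    "OFA_Large": ["82.84/86.54/76.50", "71.30/78.56/61.85", "71.96/71.30"],
}

# Split names in the same order as the table columns.
SPLITS = [
    "RefCOCO/val", "RefCOCO/testA", "RefCOCO/testB",
    "RefCOCO+/val", "RefCOCO+/testA", "RefCOCO+/testB",
    "RefCOCOg/val", "RefCOCOg/test-u",
]


def parse_row(cells):
    """Flatten slash-separated cells into a {split: score} mapping."""
    scores = [float(s) for cell in cells for s in cell.split("/")]
    return dict(zip(SPLITS, scores))


def delta(model_a, model_b):
    """Per-split score of model_a minus model_b, rounded to two decimals."""
    a, b = parse_row(ROWS[model_a]), parse_row(ROWS[model_b])
    return {split: round(a[split] - b[split], 2) for split in SPLITS}


large_vs_base = delta("OFA_Large", "OFA_Base")
for split, diff in large_vs_base.items():
    print(f"{split}: {diff:+.2f}")
```

On the reported numbers, OFA_Large is ahead of OFA_Base on five of the eight splits and behind on RefCOCO testB and both RefCOCOg splits. Calling `delta("OFA_Base", "OFA_Base(random-init)")` gives the corresponding gain over the randomly initialized model.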