OFA-Sys/ofa-huge-vqa

JustinLin610 committed on Nov 9, 2022

Commit 34dc2df • 1 Parent(s): 22ed627

Update README.md

README.md +7 -3

@@ -3,10 +3,14 @@ license: apache-2.0
 ---
 # OFA-huge-vqa
 This is the **huge** version of the OFA model, fine-tuned for **VQA**. OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) within a simple sequence-to-sequence learning framework.
 The directory includes 4 files: `config.json`, which contains the model configuration; `vocab.json` and `merge.txt` for our OFA tokenizer; and `pytorch_model.bin`, which contains the model weights. There is no need to worry about the mismatch between Fairseq and transformers, since we have already addressed the issue.
 To use it in transformers, please refer to https://github.com/OFA-Sys/OFA/tree/feature/add_transformers. Install transformers and download the model as shown below.
 ```
 git clone --single-branch --branch feature/add_transformers https://github.com/OFA-Sys/OFA.git
@@ -15,7 +19,7 @@ git clone https://huggingface.co/OFA-Sys/OFA-huge-vqa
 ```
 After that, set `ckpt_dir` to the path of the downloaded OFA-huge-vqa directory, and prepare an image for the testing example below. Also, ensure that you have pillow and torchvision in your environment.
-```
 >>> from PIL import Image
 >>> from torchvision import transforms
 >>> from transformers import OFATokenizer, OFAModel
@@ -39,7 +43,7 @@ After, refer the path to OFA-large to `ckpt_dir`, and prepare an image for the t
 >>> patch_img = patch_resize_transform(img).unsqueeze(0)
->>> # using the generator of fairseq version
 >>> model = OFAModel.from_pretrained(ckpt_dir, use_cache=True)
 >>> generator = sequence_generator.SequenceGenerator(
                     tokenizer=tokenizer,
@@ -53,7 +57,7 @@ After, refer the path to OFA-large to `ckpt_dir`, and prepare an image for the t
 >>> gen_output = generator.generate([model], data)
 >>> gen = [gen_output[i][0]["tokens"] for i in range(len(gen_output))]
->>> # using the generator of huggingface version
 >>> model = OFAModel.from_pretrained(ckpt_dir, use_cache=False)
 >>> gen = model.generate(inputs, patch_images=patch_img, num_beams=5, no_repeat_ngram_size=3)

 ---
 # OFA-huge-vqa
+## Introduction
 This is the **huge** version of the OFA model, fine-tuned for **VQA**. OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) within a simple sequence-to-sequence learning framework.
 The directory includes 4 files: `config.json`, which contains the model configuration; `vocab.json` and `merge.txt` for our OFA tokenizer; and `pytorch_model.bin`, which contains the model weights. There is no need to worry about the mismatch between Fairseq and transformers, since we have already addressed the issue.
+## How to use
 To use it in transformers, please refer to https://github.com/OFA-Sys/OFA/tree/feature/add_transformers. Install transformers and download the model as shown below.
 ```
 git clone --single-branch --branch feature/add_transformers https://github.com/OFA-Sys/OFA.git
 ```
 After that, set `ckpt_dir` to the path of the downloaded OFA-huge-vqa directory, and prepare an image for the testing example below. Also, ensure that you have pillow and torchvision in your environment.
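Before running the example, it can help to confirm that both packages are importable. The snippet below is a minimal sketch and not part of the OFA repository: the helper name `missing_packages` is hypothetical, and it only checks that the import names `PIL` (Pillow) and `torchvision` can be found, not that their versions are compatible.

```python
import importlib.util

# Import names of the packages the example relies on.
# Pillow is installed as "pillow" but imported as "PIL".
REQUIRED = ("PIL", "torchvision")


def missing_packages(names=REQUIRED):
    """Return the import names from `names` that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

If anything is reported as missing, install it with your usual package manager before continuing.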
+```python
 >>> from PIL import Image
 >>> from torchvision import transforms
 >>> from transformers import OFATokenizer, OFAModel
 >>> patch_img = patch_resize_transform(img).unsqueeze(0)
+# using the generator of fairseq version
 >>> model = OFAModel.from_pretrained(ckpt_dir, use_cache=True)
 >>> generator = sequence_generator.SequenceGenerator(
                     tokenizer=tokenizer,
 >>> gen_output = generator.generate([model], data)
 >>> gen = [gen_output[i][0]["tokens"] for i in range(len(gen_output))]
+# using the generator of huggingface version
 >>> model = OFAModel.from_pretrained(ckpt_dir, use_cache=False)
 >>> gen = model.generate(inputs, patch_images=patch_img, num_beams=5, no_repeat_ngram_size=3)