---
language:
  - fa
library_name: hezar
tags:
  - image-to-text
  - hezar
metrics:
  - wer
pipeline_tag: image-to-text
datasets:
  - hezarai/flickr30k-fa
---

A Persian image captioning model built on a ViT + GPT2 encoder-decoder architecture and trained on flickr30k-fa (created by Sajjad Ayoubi). The encoder (ViT) was initialized from https://huggingface.co/google/vit-base-patch16-224 and the decoder (GPT2) from https://huggingface.co/HooshvareLab/gpt2-fa.

## Usage

```bash
pip install hezar
```

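```python
from hezar.models import Model

# Download and load the pretrained captioning model from the Hugging Face Hub
model = Model.load("hezarai/vit-gpt2-fa-image-captioning-flickr30k")
# Generate a Persian caption for a local image file
captions = model.predict("example_image.jpg")
print(captions)
```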
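The metadata lists WER (word error rate) as the evaluation metric. As a rough illustration, here is a minimal pure-Python sketch of the standard definition: the word-level edit distance (substitutions, deletions and insertions) between a reference caption and a predicted caption, divided by the number of reference words. This is not the evaluation code used for this model. The function name `word_error_rate` and the example captions are hypothetical, and plain whitespace tokenization is an assumption (Persian text that uses zero-width non-joiners may need extra normalization first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level Levenshtein distance / reference length.

    Assumes whitespace tokenization; not Hezar's metric implementation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete reference word r
                curr[j - 1] + 1,     # insert hypothesis word h
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical captions: one substituted word out of four reference words
print(word_error_rate("یک سگ روی چمن", "یک گربه روی چمن"))  # 0.25
```

Lower is better; a WER of 0.0 means the predicted caption matches the reference word for word.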