---
language:
- fa
metrics:
- wer
pipeline_tag: image-to-text
---
A Persian image captioning model built from a ViT encoder and a GPT-2 decoder, trained on flickr30k-fa.
The encoder (ViT) was initialized from https://huggingface.co/google/vit-base-patch16-224 and the decoder (GPT-2) was initialized from https://huggingface.co/HooshvareLab/gpt2-fa.
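The encoder-decoder pairing described above can be sketched with the Hugging Face `transformers` library. This is a minimal illustration under stated assumptions, not this model's actual implementation. It uses `VisionEncoderDecoderModel`, and it is not confirmed here that hezar uses that class internally. It also uses tiny, randomly initialized configs so it runs without downloading weights, and every size below is a hypothetical placeholder. The sketch shows the GPT-2 decoder gaining cross-attention over the ViT patch embeddings and producing one vocabulary distribution per caption position.

```python
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

torch.manual_seed(0)

# Hypothetical tiny ViT encoder: 32x32 images split into 16x16 patches -> 4 patches + [CLS]
encoder_config = ViTConfig(
    image_size=32,
    patch_size=16,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
# Hypothetical tiny GPT-2 decoder; n_embd matches the encoder hidden size
decoder_config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32, n_layer=2, n_head=2)

# Marks the GPT-2 config as a decoder and enables cross-attention to the encoder outputs
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = VisionEncoderDecoderModel(config=config)
model.eval()

pixel_values = torch.randn(2, 3, 32, 32)           # batch of 2 random RGB "images"
decoder_input_ids = torch.randint(0, 100, (2, 6))  # 6 caption token ids per image

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)

print(outputs.encoder_last_hidden_state.shape)  # (batch, patches + 1, hidden size)
print(outputs.logits.shape)                     # (batch, caption length, vocab size)
```

To pair the actual checkpoints named above instead of random tiny configs, `VisionEncoderDecoderModel.from_encoder_decoder_pretrained("google/vit-base-patch16-224", "HooshvareLab/gpt2-fa")` builds the same kind of combination. This downloads both models, and the cross-attention layers start untrained, so the result still needs fine-tuning on captions before it produces useful output.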
## Usage
```shell
pip install hezar
```
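```python
from hezar import Model

# Load the pretrained captioning model from the Hugging Face Hub
model = Model.load("hezarai/vit-gpt2-fa-image-captioning-flickr30k")
# Generate a Persian caption for a local image file ("example_image.jpg" is a placeholder path)
captions = model.predict("example_image.jpg")
print(captions)
```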