hezarai/vit-roberta-fa-image-captioning-flickr30k


arxyzan committed on Sep 29, 2023

Commit a51bbf9 (1 parent: 0e764a4)

Create README.md


README.md ADDED +23 -0

	@@ -0,0 +1,23 @@

+---
+language:
+- fa
+metrics:
+- wer
+pipeline_tag: image-to-text
+---
+A Persian image captioning model that pairs a ViT encoder with a RoBERTa decoder, trained on flickr30k-fa.
+The encoder (ViT) was initialized from https://huggingface.co/google/vit-base-patch16-224 and the decoder (RoBERTa)
+from https://huggingface.co/HooshvareLab/roberta-fa-zwnj-base.
+## Usage
+```
+pip install hezar
+```
+```python
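+from hezar import Model
+# Load this repository's model (ViT encoder + RoBERTa decoder) from the Hub
+model = Model.load("hezarai/vit-roberta-fa-image-captioning-flickr30k")
+# Generate a Persian caption for a local image file
+captions = model.predict("example_image.jpg")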
+print(captions)
+```
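
The metadata in this README lists `wer` (word error rate) as the model's metric. As a rough illustration of what that metric measures, here is a minimal pure-Python sketch: the word-level edit distance between a reference caption and a predicted caption, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; this is not the evaluation code used for the model, and Hezar's own metric implementation may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration only; tokenization is a plain
    whitespace split, which is an assumption.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # Made-up caption pair, not taken from flickr30k-fa.
    print(word_error_rate("a dog runs on the beach", "a dog runs on beach"))
```

A lower value is better: 0.0 means the predicted caption matches the reference word for word, and the value can exceed 1.0 when the prediction contains many extra words.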