jaimin/image_caption

Tags: vision-encoder-decoder, image-text-to-text, image-captioning


jaimin committed on Feb 19, 2023

Commit ffeedea · 1 parent: 20e51b2

Update README.md

Files changed (1)

README.md +5 -35

@@ -3,27 +3,8 @@ tags:
 - image-to-text
 - image-captioning
 license: apache-2.0
-widget:
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
-  example_title: Savanna
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
-  example_title: Football Match
-- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
-  example_title: Airport
 ---
-# nlpconnect/vit-gpt2-image-captioning
-This is an image captioning model trained by @ydshieh in [flax ](https://github.com/huggingface/transformers/tree/main/examples/flax/image-captioning) this is pytorch version of [this](https://huggingface.co/ydshieh/vit-gpt2-coco-en-ckpts).
-# The Illustrated Image Captioning using transformers
-![](https://ankur3107.github.io/assets/images/vision-encoder-decoder.png)
-* https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/
 # Sample running code
 ```python
@@ -32,9 +13,9 @@ from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTok
 import torch
 from PIL import Image
-model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
-feature_extractor = ViTFeatureExtractor.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
-tokenizer = AutoTokenizer.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
+model = VisionEncoderDecoderModel.from_pretrained("jaimin/image_caption")
+feature_extractor = ViTFeatureExtractor.from_pretrained("jaimin/image_caption")
+tokenizer = AutoTokenizer.from_pretrained("jaimin/image_caption")
 device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 model.to(device)
@@ -63,8 +44,6 @@ def predict_step(image_paths):
   return preds
-predict_step(['doctor.e16ba4e4.jpg']) # ['a woman in a hospital bed with a woman in a hospital bed']
 ```
 # Sample running code using transformers pipeline
@@ -73,18 +52,9 @@ predict_step(['doctor.e16ba4e4.jpg']) # ['a woman in a hospital bed with a woman
 from transformers import pipeline
-image_to_text = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
-image_to_text("https://ankur3107.github.io/assets/images/image-captioning-example.png")
-# [{'generated_text': 'a soccer game with a player jumping to catch the ball '}]
-```
-# Contact for any help
-* https://huggingface.co/ankur310794
-* https://twitter.com/ankur310794
-* http://github.com/ankur3107
-* https://www.linkedin.com/in/ankur310794
+image_to_text = pipeline("image-to-text", model="jaimin/image_caption")
+```
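After this commit, the README's first sample never calls `predict_step`, and the pipeline sample only builds `image_to_text` without calling it. Here is a minimal sketch of the updated first sample as one self-contained script. It assumes that the lines the diff elides between `model.to(device)` and `return preds` match the upstream nlpconnect/vit-gpt2-image-captioning README, which uses beam search with `num_beams=4` and `max_length=16`. The sketch also replaces the deprecated `ViTFeatureExtractor` with `ViTImageProcessor`. The helper names `load_captioner` and `clean_captions` are hypothetical and do not come from the model card. Heavy imports are deferred into the functions so the module can be imported without downloading the model.

```python
# Hypothetical sketch of the README sample pointed at jaimin/image_caption.
# Assumption: generation settings mirror the upstream nlpconnect README;
# they are not shown in this diff.

MODEL_ID = "jaimin/image_caption"
GEN_KWARGS = {"max_length": 16, "num_beams": 4}


def load_captioner(model_id=MODEL_ID):
    """Load model, image processor and tokenizer; use the GPU if available."""
    import torch
    from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = VisionEncoderDecoderModel.from_pretrained(model_id).to(device)
    # ViTImageProcessor supersedes the ViTFeatureExtractor used in the README.
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, processor, tokenizer, device


def clean_captions(captions):
    """Strip leading/trailing whitespace left over after decoding."""
    return [caption.strip() for caption in captions]


def predict_step(image_paths, model, processor, tokenizer, device):
    """Return one caption string per image file in image_paths."""
    from PIL import Image

    images = []
    for path in image_paths:
        image = Image.open(path)
        # The ViT encoder expects 3-channel input, so convert grayscale, RGBA, etc.
        if image.mode != "RGB":
            image = image.convert(mode="RGB")
        images.append(image)

    pixel_values = processor(images=images, return_tensors="pt").pixel_values.to(device)
    output_ids = model.generate(pixel_values, **GEN_KWARGS)
    return clean_captions(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```

Usage with a hypothetical local file: `predict_step(["example.jpg"], *load_captioner())` returns a list containing one caption string. For the pipeline route, calling `image_to_text("example.jpg")` returns a list of dicts keyed by `generated_text`, matching the shape of the removed nlpconnect example output.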