# Model Card for vit-gpt2-image-captioning

## Model Details

This model is a VisionEncoderDecoderModel that pairs a ViT image encoder with a GPT-2 text decoder to generate captions for images. It was fine-tuned with additional context information supplied alongside each image to help it produce more meaningful captions.

- **Base Model**: nlpconnect/vit-gpt2-image-captioning
- **Processor**: ViTImageProcessor
- **Tokenizer**: GPT-2 Tokenizer
- **Generated Caption Example**: "{generated_text}"

## Intended Use

This model is intended for generating captions for stock-related images, with an initial context string provided at inference time to produce more accurate descriptions.

## Limitations

- The model may generate incorrect or biased descriptions depending on the input image or context.
- It requires a suitable context input for best performance.

## How to Use

Load the fine-tuned model, image processor, and tokenizer from the Hub (an end-to-end captioning sketch follows at the end of this card):

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model = VisionEncoderDecoderModel.from_pretrained("your_username/your_model_name")
processor = ViTImageProcessor.from_pretrained("your_username/your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_username/your_model_name")
```

## License

This model is licensed under the same terms as the original nlpconnect/vit-gpt2-image-captioning.
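
## Example: Generating a Caption

A minimal end-to-end sketch of caption generation. The repository id (`your_username/your_model_name`) and image path (`example.jpg`) are placeholders, and generation settings such as `max_length` and `num_beams` are illustrative defaults rather than values specified by this card.

```python
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model = VisionEncoderDecoderModel.from_pretrained("your_username/your_model_name")
processor = ViTImageProcessor.from_pretrained("your_username/your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_username/your_model_name")

# Preprocess the image into pixel values for the ViT encoder
image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token ids with the GPT-2 decoder and decode them to text
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```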
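
The card states the model was fine-tuned with added context information but does not specify how that context is supplied at inference. One plausible mechanism, shown here purely as an assumption, is to prepend the context string as a decoder prompt via `decoder_input_ids`; the context text is a hypothetical example, and the snippet continues from the variables defined above.

```python
# Assumption: context is passed as a decoder prompt. The actual
# fine-tuning setup may expect a different input format.
context = "Stock photo of a city skyline:"
decoder_input_ids = tokenizer(context, return_tensors="pt").input_ids

output_ids = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=48,
)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)  # the context prefix followed by the generated caption
```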