# Model Card for vit-gpt2-image-captioning

## Model Details

This model is a VisionEncoderDecoderModel that pairs a ViT image encoder with a GPT-2 text decoder to generate captions for images. It was fine-tuned with additional context information supplied alongside each image to help it produce more meaningful captions.

- **Base Model**: nlpconnect/vit-gpt2-image-captioning
- **Processor**: ViTImageProcessor
- **Tokenizer**: GPT-2 Tokenizer
- **Generated Caption Example**: "{generated_text}"

## Intended Use

This model is intended for generating captions for stock-related images, with an initial context string provided at inference time to produce more accurate descriptions.

## Limitations

- The model may generate incorrect or biased descriptions depending on the input image or context.
- It requires a suitable context input for best performance.

## How to Use

Load the fine-tuned model, image processor, and tokenizer from the Hub (an end-to-end captioning sketch follows at the end of this card):

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model = VisionEncoderDecoderModel.from_pretrained("your_username/your_model_name")
processor = ViTImageProcessor.from_pretrained("your_username/your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_username/your_model_name")
```

## License

This model is licensed under the same terms as the original nlpconnect/vit-gpt2-image-captioning.
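
## Example: Generating a Caption

A minimal end-to-end sketch of caption generation. The repository id (`your_username/your_model_name`) and image path (`example.jpg`) are placeholders, and generation settings such as `max_length` and `num_beams` are illustrative defaults rather than values specified by this card.

```python
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

model = VisionEncoderDecoderModel.from_pretrained("your_username/your_model_name")
processor = ViTImageProcessor.from_pretrained("your_username/your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_username/your_model_name")

# Preprocess the image into pixel values for the ViT encoder
image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token ids with the GPT-2 decoder and decode them to text
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```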
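
The card states the model was fine-tuned with added context information but does not specify how that context is supplied at inference. One plausible mechanism, shown here purely as an assumption, is to prepend the context string as a decoder prompt via `decoder_input_ids`; the context text is a hypothetical example, and the snippet continues from the variables defined above.

```python
# Assumption: context is passed as a decoder prompt. The actual
# fine-tuning setup may expect a different input format.
context = "Stock photo of a city skyline:"
decoder_input_ids = tokenizer(context, return_tensors="pt").input_ids

output_ids = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=48,
)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)  # the context prefix followed by the generated caption
```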