# Model Card for vit-gpt2-image-captioning
## Model Details
This model is a VisionEncoderDecoderModel that pairs a ViT image encoder with a GPT-2 text decoder to generate captions for images. It was fine-tuned with additional context information to help it produce more meaningful captions.
- **Base Model**: nlpconnect/vit-gpt2-image-captioning
- **Processor**: ViTImageProcessor
- **Tokenizer**: GPT-2 Tokenizer
- **Generated Caption Example**: "{generated_text}"
## Intended Use
This model is intended for generating captions for stock-related images, with an initial context provided for more accurate descriptions.
## Limitations
- The model might generate incorrect or biased descriptions depending on the input image or context.
- It requires specific context inputs for the best performance.
## How to Use
```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Load the fine-tuned model together with its image processor and tokenizer.
model = VisionEncoderDecoderModel.from_pretrained("your_username/your_model_name")
processor = ViTImageProcessor.from_pretrained("your_username/your_model_name")
tokenizer = AutoTokenizer.from_pretrained("your_username/your_model_name")
```
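The snippet above only loads the checkpoint. A minimal end-to-end captioning sketch is given below. The repo id `your_username/your_model_name`, the image path, and the context string are placeholders, and seeding the GPT-2 decoder with the context via `decoder_input_ids` is one assumed way to supply it, not necessarily how this model was fine-tuned:

```python
from PIL import Image
import torch
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

# Placeholder repo id -- replace with the actual model repository.
repo_id = "your_username/your_model_name"
model = VisionEncoderDecoderModel.from_pretrained(repo_id)
processor = ViTImageProcessor.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Preprocess an input image (path is a placeholder).
image = Image.open("stock_photo.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Optionally seed the decoder with a context prefix (assumed convention).
context_ids = tokenizer("A stock photo of", return_tensors="pt").input_ids

with torch.no_grad():
    output_ids = model.generate(
        pixel_values,
        decoder_input_ids=context_ids,
        max_length=32,
        num_beams=4,
    )

caption = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
print(caption)
```

Beam search (`num_beams=4`) is a common default for this family of captioning models; greedy decoding also works if you drop that argument.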
## License
This model is licensed under the same terms as the original nlpconnect/vit-gpt2-image-captioning.