Image captioning model built on the VisionEncoderDecoderModel architecture, with "microsoft/swinv2-base-patch4-window12to16-192to256-22kto1k-ft" as the image encoder and "openai/gpt2" as the text decoder. It was trained on a variant of the WikiArt dataset, available at "AterMors/wikiart_recaption".
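The encoder/decoder pairing described above could be assembled roughly as follows. This is a minimal sketch, not the author's training script: the helper name `build_captioner` and the token-id settings are assumptions, and the decoder identifier is copied verbatim from this card, so it may need adjusting if the GPT-2 checkpoint is hosted under a different Hub id (for example "gpt2").

```python
# Checkpoint identifiers as written in this card.
ENCODER_ID = "microsoft/swinv2-base-patch4-window12to16-192to256-22kto1k-ft"
DECODER_ID = "openai/gpt2"  # assumption: the Hub id may differ, e.g. "gpt2"


def build_captioner(encoder_id=ENCODER_ID, decoder_id=DECODER_ID):
    """Hypothetical helper: pair a vision encoder with a text decoder.

    Returns (model, image_processor, tokenizer). Calling it downloads the
    pretrained weights, so transformers and torch must be installed.
    """
    # Imported lazily so that defining the helper needs no heavy dependencies.
    from transformers import (
        AutoImageProcessor,
        AutoTokenizer,
        VisionEncoderDecoderModel,
    )

    # Loads both checkpoints and adds cross-attention layers to the decoder;
    # those layers are randomly initialized and need fine-tuning.
    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        encoder_id, decoder_id
    )
    image_processor = AutoImageProcessor.from_pretrained(encoder_id)
    tokenizer = AutoTokenizer.from_pretrained(decoder_id)

    # GPT-2 ships without a padding token; reusing EOS is a common workaround.
    tokenizer.pad_token = tokenizer.eos_token
    model.config.decoder_start_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    return model, image_processor, tokenizer
```

A model assembled this way only produces useful captions after fine-tuning on image-caption pairs such as those in "AterMors/wikiart_recaption"; the sketch shows the architecture, not the trained weights published with this card.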