Image captioning model built on the VisionEncoderDecoderModel architecture, with "microsoft/swinv2-base-patch4-window12to16-192to256-22kto1k-ft" as the image encoder and "openai/gpt2" as the text decoder. It was trained on a variant of the WikiArt dataset, available at "AterMors/wikiart_recaption".
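
The encoder-decoder setup described above can be exercised roughly as follows. This is a minimal sketch, not a documented recipe from the model author: it assumes the repository ships image processor and tokenizer files alongside the weights (this card does not confirm that), and the helper names load_captioner and caption_image, as well as the max_new_tokens default, are hypothetical choices made for illustration.

```python
# Sketch only: helper names and defaults are illustrative, not part of the model card.
MODEL_ID = "AterMors/Swin2-GTP2_art-caption"


def load_captioner(model_id=MODEL_ID):
    """Load the model, image processor, and tokenizer from the Hub.

    Assumes the repo contains preprocessor and tokenizer configs; if it does
    not, they could instead be loaded from the encoder and decoder base
    checkpoints named in the card.
    """
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import (
        AutoImageProcessor,
        AutoTokenizer,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = AutoImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, processor, tokenizer


def caption_image(image, model, processor, tokenizer, max_new_tokens=50):
    """Generate one caption for a single PIL image."""
    # The processor resizes and normalizes the image into a pixel tensor.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    # The decoder generates token ids conditioned on the encoded image.
    output_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence and trim surrounding whitespace.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


# Example usage (requires network access, transformers, torch, and Pillow):
#   from PIL import Image
#   model, processor, tokenizer = load_captioner()
#   image = Image.open("painting.jpg").convert("RGB")  # hypothetical local file
#   print(caption_image(image, model, processor, tokenizer))
```

Keeping the loading and captioning steps separate means the weights are loaded once and can be reused across many images.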

Safetensors

Model size: 240M params
Tensor type: F32

Model tree for AterMors/Swin2-GTP2_art-caption

Base model: microsoft/swinv2-base-patch4-window12to16-192to256-22kto1k-ft
Finetuned (11): this model

Dataset used to train AterMors/Swin2-GTP2_art-caption: AterMors/wikiart_recaption