Introduction

This model follows the architecture proposed in the book "Mastering PyTorch". It combines a CNN encoder with an LSTM decoder.

The CNN encoder is based on a pretrained ResNet-152. The last layer of the ResNet is replaced by an embedding layer that outputs a 256-element vector. The LSTM decoder takes inputs of size 256, uses a hidden size of 512, and has an output layer sized to the vocabulary.

The model was trained purely as a learning exercise, so its performance remains modest.

Training procedure

For the sake of the exercise, the model has been trained for only 5 epochs.

It has been trained on the COCO dataset.
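A 5-epoch training run could look roughly like the sketch below. Only the epoch count comes from this card; the function name `train_captioner`, the Adam optimizer, the learning rate, the cross-entropy loss, and the data format are assumptions for illustration. The sketch also assumes a decoder that is called as `decoder(features, tokens)` and returns one score vector per input step plus one for the image embedding, and a loader that yields `(images, captions)` pairs with captions as integer token ids of shape (batch, T).

```python
import torch
import torch.nn as nn


def train_captioner(encoder, decoder, loader, epochs=5, lr=1e-3, device="cpu"):
    """Train encoder and decoder jointly; return the mean loss of each epoch."""
    encoder.to(device).train()
    decoder.to(device).train()
    criterion = nn.CrossEntropyLoss()  # assumed loss
    params = [p for p in list(encoder.parameters()) + list(decoder.parameters())
              if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)  # assumed optimizer

    history = []
    for _ in range(epochs):
        total, batches = 0.0, 0
        for images, captions in loader:
            images, captions = images.to(device), captions.to(device)
            features = encoder(images)
            # The decoder sees the image embedding plus every token except the
            # last, and is asked to predict every token of the caption.
            logits = decoder(features, captions[:, :-1])      # (batch, T, vocab)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             captions.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
            batches += 1
        history.append(total / max(batches, 1))
    return history
```

In practice the loader would wrap the COCO images and tokenized captions, with padding handled by whatever convention the dataset code uses; that part is left out here because the card does not describe it.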
