Update README.md
This model has been pushed to the Hub using the PytorchModelHubMixin integration.

- Docs: [More Information Needed]

## About the project

This repository contains the decoder of an image captioning model.
The input image is first preprocessed and resized to (224, 224), then passed through ViT_b_32 with its classification layer removed, which outputs a feature tensor of shape (N, 768). This feature is repeated 32 times (the max_length) and fed as K and V to the CrossMultiHeadAttention blocks in the decoder. The model was trained on the Microsoft COCO 2017 dataset and achieved a masked_accuracy of 0.54 on the validation set.

## Sample Code

To use this model, first download ViT_b_32, which is used as the encoder, and then download the decoder from this repo.
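A minimal sketch of how the downloaded decoder might be driven at inference time, assuming it exposes a `decoder(tokens, memory) -> logits` interface over the repeated ViT feature. The `TinyDecoder` stand-in, vocabulary size, and BOS/EOS token ids below are hypothetical placeholders, not the real weights or API from this repo.

```python
import torch
import torch.nn as nn

VOCAB, D, MAX_LEN, BOS, EOS = 1000, 768, 32, 1, 2  # hypothetical values

# Stand-in for the real decoder downloaded from this repo: it mimics the
# described interface, cross-attending over a (N, 32, 768) memory tensor.
class TinyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D)
        layer = nn.TransformerDecoderLayer(D, nhead=8, batch_first=True)
        self.dec = nn.TransformerDecoder(layer, num_layers=1)
        self.out = nn.Linear(D, VOCAB)

    def forward(self, tokens, memory):
        x = self.embed(tokens)  # (N, T, 768)
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.out(self.dec(x, memory, tgt_mask=mask))  # (N, T, VOCAB)

@torch.no_grad()
def greedy_caption(decoder, memory):
    """Greedy decoding: start from BOS, append the argmax token each step."""
    tokens = torch.full((memory.size(0), 1), BOS, dtype=torch.long)
    for _ in range(MAX_LEN - 1):
        logits = decoder(tokens, memory)
        next_tok = logits[:, -1].argmax(-1, keepdim=True)
        tokens = torch.cat([tokens, next_tok], dim=1)
        if (next_tok == EOS).all():
            break
    return tokens

decoder = TinyDecoder().eval()
memory = torch.randn(1, MAX_LEN, D)  # stands in for the repeated ViT feature
caption = greedy_caption(decoder, memory)
print(caption.shape)
```

With the real decoder, `memory` would come from the ViT_b_32 encoder and the generated token ids would be detokenized back into a caption string.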