This demo loads the `FlaxCLIPVisionBertForSequenceClassificationModel` present in the `model` directory of this repository. The checkpoint is loaded from `ckpt/ckpt-60k-5999`, a checkpoint pre-trained for 60k steps and then fine-tuned for 5999 steps. 100 random examples are present in `dummy_vqa_multilingual.tsv`, with their respective images in the `images/val2014` directory.

You can also upload your own image using the `Upload your image` file uploader and type in a question of your choosing. We provide an `English Translation` of the question for users who are not acquainted with the other languages. This is done using `mtranslate` to keep things flexible, and it needs an internet connection because it uses the Google Translate API.

The model predicts the answer from a list of 3129 answers, whose labels are stored in `answer_reverse_mapping.json`. Lastly, one can choose the `Answer Language`; the translated answers come from a saved dictionary, also created with the `mtranslate` library, that covers the 3129 answer options. The top-5 predictions are displayed below, and their respective confidence scores are shown as a bar plot.
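
The step from model output to the top-5 answers and their confidence scores can be sketched roughly as follows. This is only an illustration, not the code the demo runs: the function name `top_k_answers` is hypothetical, the toy mapping stands in for `answer_reverse_mapping.json`, and it is an assumption that the file maps label indices (as JSON string keys) to answer strings and that confidences are softmax probabilities over the logits.

```python
import math


def top_k_answers(logits, reverse_mapping, k=5):
    """Return the k most likely (answer, confidence) pairs, best first."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Label indices sorted by probability, highest first, truncated to k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: JSON object keys are strings, so indices are looked up as str.
    return [(reverse_mapping[str(i)], probs[i]) for i in ranked]


# Toy stand-in for answer_reverse_mapping.json (the real list has 3129 answers).
toy_mapping = {str(i): a for i, a in enumerate(["yes", "no", "2", "red", "dog", "cat"])}
# Made-up logits for illustration only; real ones would come from the model.
toy_logits = [2.0, 1.0, 0.5, 0.1, -1.0, -2.0]

predictions = top_k_answers(toy_logits, toy_mapping)
for answer, confidence in predictions:
    print(f"{answer}: {confidence:.3f}")
```

The resulting pairs are what a bar plot of confidence scores would be drawn from. If an answer language other than English were selected, each answer string could additionally be looked up in the saved translation dictionary before display.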