This demo loads the FlaxCLIPVisionBertForSequenceClassification model present in the model directory of this repository. The checkpoint is loaded from flax-community/clip-vision-bert-vqa-ft-6k, which was pre-trained for 60k steps and fine-tuned for 6k steps. 100 random validation set examples are present in dummy_vqa_multilingual.tsv, with their respective images in the images/val2014 directory.

We provide an English Translation of the question for users who are not well acquainted with the other languages. This is done using mtranslate to keep things flexible, and it needs an internet connection because it uses the Google Translate API.

The model predicts the answer from a list of 3129 answers, whose labels are present in answer_reverse_mapping.json.

Lastly, one can choose the Answer Language, which also uses a saved dictionary created with the mtranslate library for the 3129 answer options.

The top-5 predictions are displayed below, and their respective confidence scores are shown as a bar plot.
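
As a rough illustration of how top-5 answers and confidence scores could be derived from the classifier output, here is a minimal sketch in plain Python. It is not the demo's actual code. It assumes the model returns one logit per answer, that confidence is a softmax over those logits, and that answer_reverse_mapping.json maps a stringified class index to its answer text. The helper name top_k_answers and the toy six-entry mapping are hypothetical.

```python
import json
import math


def top_k_answers(logits, reverse_mapping, k=5):
    """Return the k most likely (answer, probability) pairs, best first."""
    # Numerically stable softmax over the whole answer vocabulary.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # JSON object keys are strings, so labels are looked up by str(index).
    return [(reverse_mapping[str(i)], probs[i]) for i in top]


# Toy stand-in for answer_reverse_mapping.json; the real file covers
# 3129 answers, and its exact key format is an assumption here.
reverse_mapping = json.loads(
    '{"0": "yes", "1": "no", "2": "2", "3": "white", "4": "red", "5": "dog"}'
)
logits = [2.0, 1.0, 0.5, 3.0, -1.0, 0.0]

for answer, prob in top_k_answers(logits, reverse_mapping):
    print(f"{answer}: {prob:.3f}")
```

The resulting (answer, probability) pairs are the kind of data a bar plot of confidence scores would consume. Translating the answers into the selected Answer Language would then be a second dictionary lookup on the answer text.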