
This demo loads the FlaxCLIPVisionBertForSequenceClassificationModel present in the model directory of this repository. The checkpoint is loaded from ckpt/ckpt-60k-5999, which was pre-trained for 60k steps and fine-tuned for 5999 steps. 100 random examples are present in dummy_vqa_multilingual.tsv, with their respective images in the images/val2014 directory.
You can also upload your own image using the Upload your image file uploader and type in a question of your choosing.
We provide an English translation of the question for users who are not familiar with the other languages. The translation is done with mtranslate to keep things flexible, and it requires an internet connection because mtranslate uses the Google Translate API.
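As a rough illustration of the translation step, the sketch below wraps mtranslate's `translate` function. The helper name `to_english`, the injectable `translator` argument, and the fallback to the original text on failure are assumptions made for this example and are not taken from the demo's code.

```python
def to_english(question, translator=None):
    """Translate a question to English, returning it unchanged on failure."""
    if translator is None:
        # mtranslate calls the Google Translate API, so this path needs
        # an internet connection. Imported lazily so the helper can be
        # exercised offline with a stand-in translator.
        from mtranslate import translate as translator
    try:
        return translator(question, "en")
    except Exception:
        # Assumed fallback: show the original question if translation fails.
        return question
```

Calling `to_english("¿Qué hay en la imagen?")` with a working connection would return the English text from Google Translate; passing a custom `translator` callable makes the helper usable without network access.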
The model predicts an answer from a fixed list of 3129 candidate answers, whose labels are stored in answer_reverse_mapping.json.
Lastly, one can choose the Answer Language; the translated answers come from a saved dictionary created with the mtranslate library for the 3129 answer options.
The top-5 predictions are displayed below, with their respective confidence scores shown as a bar plot.
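The following is a minimal sketch of how top-5 answers and confidence scores could be derived from the classifier's output. It assumes the model returns one logit per answer class and that answer_reverse_mapping.json maps label indices (as string keys) to answer strings; the function name `top_k_answers` and the toy data are hypothetical, and the demo's actual code may differ.

```python
import json

import numpy as np


def top_k_answers(logits, reverse_mapping, k=5):
    """Return the k most probable (answer, confidence) pairs, best first."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the answer classes.
    shifted = np.exp(logits - logits.max())
    probs = shifted / shifted.sum()
    # Indices of the k largest probabilities, highest first.
    top_idx = np.argsort(probs)[::-1][:k]
    # JSON object keys are strings, so look labels up by str(index).
    return [(reverse_mapping[str(i)], float(probs[i])) for i in top_idx]


# Toy stand-in for answer_reverse_mapping.json (the real file has 3129 entries).
toy_mapping = json.loads(
    '{"0": "yes", "1": "no", "2": "red", "3": "two", "4": "dog", "5": "cat"}'
)
toy_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0]

for answer, confidence in top_k_answers(toy_logits, toy_mapping):
    print(f"{answer}: {confidence:.3f}")
```

The returned pairs can be passed directly to a plotting library to draw the bar plot of confidence scores.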