
	- This demo loads the `FlaxCLIPVisionBertForSequenceClassificationModel` defined in the `model` directory of this repository. The checkpoint is loaded from `ckpt/ckpt-60k-5999`, a checkpoint pre-trained for 60k steps and fine-tuned for 5999 steps. 100 random examples are listed in `dummy_vqa_multilingual.tsv`, with their respective images in the `images/val2014` directory.

	- You can also upload your own image using the `Upload your image` file uploader and type in a question of your choosing.

	- We provide an `English Translation` of the question for users who are not acquainted with the other languages. The translation is done with `mtranslate` to keep things flexible, and it needs an internet connection because it uses the Google Translate API.

	- The model predicts the answer from a list of 3129 answers, whose labels are stored in `answer_reverse_mapping.json`.

	- Lastly, one can choose the `Answer Language`; the translated answers come from a saved dictionary created using the `mtranslate` library for the 3129 answer options.

	- The top-5 predictions are displayed below, and their respective confidence scores are shown as a bar plot.
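
The prediction step described in the last bullets can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: it assumes the model yields one logit per answer label and that `answer_reverse_mapping.json` maps stringified label indices to answer strings. The helper `top_k_answers`, the toy mapping, and the toy logits are hypothetical stand-ins.

```python
import json
import math


def top_k_answers(logits, reverse_mapping, k=5):
    """Return the k most probable (answer, confidence) pairs."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort label indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Assumption: the mapping's keys are stringified label indices.
    return [(reverse_mapping[str(i)], probs[i]) for i in ranked]


# Toy stand-in for answer_reverse_mapping.json (the real file covers 3129 answers).
toy_mapping = json.loads(
    '{"0": "yes", "1": "no", "2": "2", "3": "red", "4": "dog", "5": "cat"}'
)
# Toy logits, one per label; in the demo these would come from the model.
toy_logits = [2.0, 1.0, 0.5, -1.0, 0.0, -2.0]

predictions = top_k_answers(toy_logits, toy_mapping)
for answer, confidence in predictions:
    print(f"{answer}: {confidence:.3f}")
```

The resulting `(answer, confidence)` pairs are the kind of data a bar plot of confidence scores would be drawn from. Translating the answers into the chosen `Answer Language` would be an additional dictionary lookup on each answer string.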