Commit 627e34d
Parent(s): 8b842e0

Update application

Changed files:
- app.py +5 -6
- sections/{about.md → abstract.md} +1 -1
- sections/checkpoints.md +1 -1
- sections/social_impact.md +1 -1
- sections/usage.md +2 -0
app.py
CHANGED
@@ -57,11 +57,6 @@ st.write("[Gunjan Chhablani](https://huggingface.co/gchhablani), [Bhavitvya Mali
 with st.beta_expander("Usage"):
     st.markdown(read_markdown("usage.md"))
 
-with st.beta_expander("Method"):
-    st.image("./misc/Multilingual-VQA.png")
-    st.markdown(read_markdown("pretraining.md"))
-    st.markdown(read_markdown("finetuning.md"))
-
 first_index = 20
 # Init Session State
 if state.image_file is None:
@@ -122,8 +117,12 @@ fig = plotly_express_horizontal_bar_plot(values, translated_labels)
 st.plotly_chart(fig, use_container_width = True)
 
 
-st.write(read_markdown("
+st.write(read_markdown("abstract.md"))
 st.write(read_markdown("caveats.md"))
+st.write("# Methodology")
+st.image("./misc/Multilingual-VQA.png", caption="Masked LM model for Image-text Pretraining.")
+st.markdown(read_markdown("pretraining.md"))
+st.markdown(read_markdown("finetuning.md"))
 st.write(read_markdown("challenges.md"))
 st.write(read_markdown("social_impact.md"))
 st.write(read_markdown("references.md"))
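These hunks rely on a `read_markdown` helper defined elsewhere in app.py and not shown in this commit; `st.beta_expander` is the pre-1.0 Streamlit API that later releases renamed to `st.expander`. A minimal sketch of what such a helper could look like, assuming the markdown files live in the repository's `sections/` directory (the actual implementation in this Space may differ):

```python
from pathlib import Path

# Hypothetical stand-in for the read_markdown helper used in app.py:
# read a markdown file from the sections/ directory and return it as a string.
def read_markdown(filename: str, base_dir: str = "sections") -> str:
    return Path(base_dir, filename).read_text(encoding="utf-8")
```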
sections/{about.md → abstract.md}
RENAMED
@@ -1,2 +1,2 @@
-# 
+# Abstract
 This project is focused on Multilingual Visual Question Answering. Most of the existing datasets and models on this task work with English-only image-text pairs. Our intention here is to provide a Proof-of-Concept with our simple ViT+BERT model, which can be trained on multilingual text checkpoints with pre-trained image encoders and made to perform well enough. Due to the lack of good-quality multilingual data, we translate subsets of the Conceptual 12M dataset into English (already in English), French, German and Spanish using the Marian models. We achieved 0.49 accuracy on the multilingual validation set we created. With better captions and hyperparameter tuning, we expect to see higher performance.
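The abstract describes translating subsets of Conceptual 12M with Marian models. A hedged sketch of that translation step using Hugging Face Transformers; the `Helsinki-NLP/opus-mt-en-fr` checkpoint is an assumption for illustration and is not named anywhere in this commit:

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed Marian checkpoint for English -> French; the project may have used
# different opus-mt models for French, German and Spanish.
model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

captions = ["A dog playing fetch in the park."]
batch = tokenizer(captions, return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```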
sections/checkpoints.md
CHANGED
@@ -1,4 +1,4 @@
-
+# Checkpoints
 - Pre-trained checkpoint: [multilingual-vqa](https://huggingface.co/flax-community/multilingual-vqa)
 - Fine-tuned on 45k pretrained checkpoint: [multilingual-vqa-pt-45k-ft](https://huggingface.co/flax-community/multilingual-vqa-pt-45k-ft)
 - Fine-tuned on 45k pretrained checkpoint with AdaFactor (others use AdamW): [multilingual-vqa-pt-45k-ft-adf](https://huggingface.co/flax-community/multilingual-vqa-pt-45k-ft-adf)
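A hedged sketch of how the listed checkpoints could be fetched from the Hub with `huggingface_hub`. Loading the weights requires the project's custom ViT+BERT Flax model class, which is not part of this commit, so only the download step is shown:

```python
from huggingface_hub import snapshot_download

# Download the fine-tuned checkpoint repository listed above.
local_dir = snapshot_download("flax-community/multilingual-vqa-pt-45k-ft")
print(local_dir)  # local path containing the checkpoint files
```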
sections/social_impact.md
CHANGED
@@ -1,2 +1,2 @@
 # Social Impact
-Multilingual Visual Question Answering has not received a lot of attention. There are very few multilingual VQA datasets, and that is what we wanted to address here. Our initial plan was to include 4 high-resource and 4 low-resource languages in our training data. However, the existing translations do not perform as well and we would have received poor labels,
+Multilingual Visual Question Answering has not received a lot of attention. There are very few multilingual VQA datasets, and that is what we wanted to address here. Our initial plan was to include 4 high-resource and 4 low-resource languages in our training data. However, the existing translations do not perform as well and we would have received poor labels, not to mention a longer training time. We hope to improve this in the future by using better translators (e.g. the Google Translate API) to get more multilingual data, especially in low-resource languages. Regardless, our aim with this project was to provide a pipeline approach to deal with multilingual visuo-linguistic pretraining and perform Multilingual Visual Question Answering.
sections/usage.md
CHANGED
@@ -10,4 +10,6 @@
 
 - The top-5 predictions are displayed below and their respective confidence scores are shown in the form of a bar plot.
 
+For more info, scroll to the end of this app.
+
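The usage notes refer to a bar plot of the top-5 confidence scores, produced in app.py by a `plotly_express_horizontal_bar_plot` helper that this commit does not show. A minimal sketch of one way such a helper could be written with Plotly Express:

```python
import plotly.express as px

# Hypothetical helper: horizontal bar plot of the top-5 answer confidences.
def plotly_express_horizontal_bar_plot(values, translated_labels):
    # values: confidence scores; translated_labels: the answer labels shown to the user
    return px.bar(x=values, y=translated_labels, orientation="h",
                  labels={"x": "Confidence", "y": "Answer"})
```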