gchhablani committed
Commit 0808df5 · 1 Parent(s): 74cb830
Update app
Files changed:
- app.py +28 -23
- hf_logo.png +0 -0
- misc/mvqa-logo-2.png +0 -0
- misc/mvqa-logo-white.png +0 -0
- misc/mvqa-logo.png +0 -0
- sections/acknowledgements.md +3 -1
- sections/intro.md +5 -0
- sections/usage.md +1 -5
app.py
CHANGED
@@ -66,7 +66,7 @@ st.set_page_config(
     page_title="Multilingual VQA",
     layout="wide",
     initial_sidebar_state="collapsed",
-    page_icon="./misc/mvqa-logo.png",
+    page_icon="./misc/mvqa-logo-white.png",
 )
 
 st.title("Multilingual Visual Question Answering")
@@ -74,8 +74,26 @@ st.write(
     "[Gunjan Chhablani](https://huggingface.co/gchhablani), [Bhavitvya Malik](https://huggingface.co/bhavitvyamalik)"
 )
 
+image_col, intro_col = st.beta_columns([2,8])
+image_col.image("./misc/mvqa-logo-white.png", use_column_width='always')
+intro_col.write(read_markdown('intro.md'))
 with st.beta_expander("Usage"):
-    st.
+    st.write(read_markdown("usage.md"))
+
+with st.beta_expander("Article"):
+    st.write(read_markdown("abstract.md"))
+    st.write(read_markdown("caveats.md"))
+    st.write("# Methodology")
+    st.image(
+        "./misc/Multilingual-VQA.png", caption="Masked LM model for Image-text Pretraining."
+    )
+    st.markdown(read_markdown("pretraining.md"))
+    st.markdown(read_markdown("finetuning.md"))
+    st.write(read_markdown("challenges.md"))
+    st.write(read_markdown("social_impact.md"))
+    st.write(read_markdown("references.md"))
+    st.write(read_markdown("checkpoints.md"))
+    st.write(read_markdown("acknowledgements.md"))
 
 first_index = 20
 # Init Session State
@@ -92,7 +110,7 @@ if state.image_file is None:
 
 col1, col2 = st.beta_columns([6, 4])
 
-if col2.button("Get a random example"):
+if col2.button("Get a random example", help="Get a random example from the 100 "):
     sample = dummy_data.sample(1).reset_index()
     state.image_file = sample.loc[0, "image_file"]
     state.question = sample.loc[0, "question"].strip("- ")
@@ -116,24 +134,26 @@ transformed_image = get_transformed_image(state.image)
 # Display Image
 col1.image(state.image, use_column_width="auto")
 
+new_col1, new_col2 = st.beta_columns([5,5])
 # Display Question
-question = col2.text_input(label="Question", value=state.question)
-col2.markdown(
+question = new_col1.text_input(label="Question", value=state.question)
+new_col1.markdown(
     f"""**English Translation**: {question if state.question_lang_id == "en" else translate(question, 'en')}"""
 )
 
-col2.markdown("**Actual Answer in English**: " + answer_reverse_mapping[str(state.answer_label)])
-
 question_inputs = get_text_attributes(question)
 
 # Select Language
 options = ["en", "de", "es", "fr"]
-state.answer_lang_id = col2.selectbox(
+state.answer_lang_id = new_col2.selectbox(
     "Answer Language",
     index=options.index(state.answer_lang_id),
     options=options,
     format_func=lambda x: code_to_name[x],
 )
+
+new_col2.markdown("**Actual Answer in English**: " + answer_reverse_mapping[str(state.answer_label)])
+
 # Display Top-5 Predictions
 with st.spinner("Loading model..."):
     model = load_model(checkpoints[0])
@@ -144,18 +164,3 @@ labels, values = get_top_5_predictions(logits, answer_reverse_mapping)
 translated_labels = translate_labels(labels, state.answer_lang_id)
 fig = plotly_express_horizontal_bar_plot(values, translated_labels)
 st.plotly_chart(fig, use_container_width=True)
-
-
-st.write(read_markdown("abstract.md"))
-st.write(read_markdown("caveats.md"))
-st.write("# Methodology")
-st.image(
-    "./misc/Multilingual-VQA.png", caption="Masked LM model for Image-text Pretraining."
-)
-st.markdown(read_markdown("pretraining.md"))
-st.markdown(read_markdown("finetuning.md"))
-st.write(read_markdown("challenges.md"))
-st.write(read_markdown("social_impact.md"))
-st.write(read_markdown("references.md"))
-st.write(read_markdown("checkpoints.md"))
-st.write(read_markdown("acknowledgements.md"))
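For readers unfamiliar with the Streamlit calls this commit leans on, here is a minimal, self-contained sketch of the new layout pattern. The `read_markdown` helper below is an assumption (a plain file reader; the app defines its own elsewhere), and note that `st.beta_columns` / `st.beta_expander` were the pre-1.0 names for what later became `st.columns` / `st.expander`.

```python
# Minimal sketch of the layout this commit introduces. Hedged: read_markdown
# here is a hypothetical stand-in; the app's own helper may differ.
import streamlit as st

def read_markdown(name: str) -> str:
    # Read one markdown section from the sections/ directory.
    with open(f"sections/{name}", encoding="utf-8") as f:
        return f.read()

# Logo on the left (20% width), intro text on the right (80% width).
image_col, intro_col = st.beta_columns([2, 8])
image_col.image("./misc/mvqa-logo-white.png", use_column_width="always")
intro_col.write(read_markdown("intro.md"))

# Collapsible sections keep the long article out of the way by default.
with st.beta_expander("Usage"):
    st.write(read_markdown("usage.md"))
with st.beta_expander("Article"):
    st.write(read_markdown("abstract.md"))
```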
hf_logo.png
DELETED
Binary file (5.64 kB)
misc/mvqa-logo-2.png
ADDED
misc/mvqa-logo-white.png
ADDED
misc/mvqa-logo.png
CHANGED
sections/acknowledgements.md
CHANGED
@@ -1,4 +1,6 @@
 # Acknowledgements
 We thank [Nilakshan Kunananthaseelan](https://huggingface.co/knilakshan20) for helping us whenever he could get a chance. We also thank [Abheesht Sharma](https://huggingface.co/abheesht) for helping in the discussions in the initial phases. [Luke Melas](https://github.com/lukemelas) helped us get the CC-12M data on our TPU-VMs and we are very grateful to him.
 
-This project would not be possible without the help of [Patrick](https://huggingface.co/patrickvonplaten) and [Suraj](https://huggingface.co/valhalla) who met with us and helped review our approach and guided us throughout the project.
+This project would not be possible without the help of [Patrick](https://huggingface.co/patrickvonplaten) and [Suraj](https://huggingface.co/valhalla) who met with us and helped review our approach and guided us throughout the project.
+
+Lastly, we thank the Google Team for helping answer our queries on the Slack channel, and for providing us TPU-VMs.
sections/intro.md
ADDED
@@ -0,0 +1,5 @@
+This demo uses a [ViTBert model checkpoint](https://huggingface.co/flax-community/multilingual-vqa-pt-60k-ft/tree/main/ckpt-5999) fine-tuned on a [MarianMT](https://huggingface.co/transformers/model_doc/marian.html)-translated version of the [VQA v2 dataset](https://visualqa.org/challenge.html). The fine-tuning is performed after pre-training with a text-only Masked LM objective on approximately 10 million image-text pairs taken from the [Conceptual 12M dataset](https://github.com/google-research-datasets/conceptual-12m) translated using [MBart](https://huggingface.co/transformers/model_doc/mbart.html). The translations cover four languages: English, French, German and Spanish.
+
+The model predicts one of 3129 English answer classes, listed [here](https://huggingface.co/spaces/flax-community/Multilingual-VQA/blob/main/answer_reverse_mapping.json); translated versions are then provided based on the language chosen as `Answer Language`. The question can be asked in any of the following languages: English, French, German and Spanish.
+
+For more details, click on `Usage` or `Article` 🤗 below.
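As a concrete illustration of the classify-then-translate flow intro.md describes, here is a hedged sketch. The string-indexed lookup mirrors `answer_reverse_mapping[str(state.answer_label)]` in app.py; the random logits are a stand-in for real ViTBert outputs, and translating a single answer with `mtranslate` at runtime is an assumption (the app uses a pre-saved translation dictionary).

```python
# Sketch of the classify-then-translate flow. Assumptions: random logits
# stand in for model outputs; JSON keys are string indices, matching
# answer_reverse_mapping[str(...)] in app.py.
import json

import numpy as np
from mtranslate import translate  # the translation library the app relies on

with open("answer_reverse_mapping.json", encoding="utf-8") as f:
    answer_reverse_mapping = json.load(f)  # e.g. {"0": "yes", "1": "no", ...}

logits = np.random.rand(3129)              # one score per answer class
label = int(np.argmax(logits))             # best-scoring class index
english_answer = answer_reverse_mapping[str(label)]
# Translate on demand when Answer Language is not English, e.g. French:
french_answer = translate(english_answer, "fr")
print(english_answer, "->", french_answer)
```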
sections/usage.md
CHANGED
@@ -8,8 +8,4 @@
 
 - Lastly, one can choose the `Answer Language`, which also uses a saved dictionary created with the `mtranslate` library for the 3129 answer options.
 
-- The top-5 predictions are displayed below and their respective confidence scores are shown in the form of a bar plot.
-
-For more info, scroll to the end of this app.
-
-
+- The top-5 predictions are displayed below and their respective confidence scores are shown in the form of a bar plot.
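Since usage.md's surviving line describes the top-5 bar plot, here is a hedged sketch of how such a plot can be produced. The app's `plotly_express_horizontal_bar_plot` is not shown in this diff, so the function below is a hypothetical stand-in, not the app's actual helper.

```python
# Hypothetical stand-in for plotly_express_horizontal_bar_plot: turn logits
# into confidences, keep the five best, and draw a horizontal bar chart.
import numpy as np
import plotly.express as px

def top_5_bar_plot(logits, labels):
    exp = np.exp(logits - np.max(logits))   # numerically stable softmax
    probs = exp / exp.sum()
    top = np.argsort(probs)[-5:]            # indices of the top-5, ascending
    return px.bar(
        x=probs[top],
        y=[labels[i] for i in top],
        orientation="h",                    # horizontal bars, as in the app
        labels={"x": "Confidence", "y": "Answer"},
    )

# Example usage (all_answer_labels is hypothetical):
#   fig = top_5_bar_plot(np.random.rand(3129), all_answer_labels)
#   st.plotly_chart(fig, use_container_width=True)  # renders in Streamlit
```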