Commit 099abfc by bhavitvyamalik
Parent(s): 47dc27b
add contributions

Files changed:
- app.py +1 -1
- sections/intro/contributions.md +5 -0
app.py
CHANGED
@@ -177,7 +177,7 @@ def main():
     st.set_page_config(
         page_title="Multilingual Image Captioning",
         layout="wide",
-        initial_sidebar_state="
+        initial_sidebar_state="auto",
         page_icon="./misc/mic-logo.png",
     )
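For context, a minimal sketch of how the patched call reads after this change. It assumes Streamlit's documented options for `initial_sidebar_state`, which are "auto", "expanded", and "collapsed":

```python
import streamlit as st

# "auto" lets Streamlit decide whether to show the sidebar based on viewport
# size; the other accepted values are "expanded" and "collapsed".
st.set_page_config(
    page_title="Multilingual Image Captioning",
    layout="wide",
    initial_sidebar_state="auto",
    page_icon="./misc/mic-logo.png",
)
```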
sections/intro/contributions.md
CHANGED
@@ -0,0 +1,5 @@
+Our novel contributions include:
+- A [multilingual variant of the Conceptual-12M dataset (mBART50)](https://huggingface.co/datasets/flax-community/conceptual-12m-mbart-50-multilingual) containing 2.5M image-text pairs in each of four languages - English, French, German, and Spanish - translated using the mBART-50 model.
+- A [multilingual variant of the Conceptual-12M dataset (MarianMT)](https://huggingface.co/datasets/flax-community/conceptual-12m-multilingual-marian) containing 2.5M image-text pairs in each of four languages - English, French, German, and Spanish - translated using the MarianMT model.
+- [A fusion of the CLIP Vision Transformer and the mBART50 model](https://github.com/gchhablani/multilingual-vqa/tree/main/models/flax_clip_vision_bert). It takes visual embeddings from the CLIP-Vision transformer and feeds them into the `encoder_hidden_states` of an mBART50 decoder, enabling deep cross-modal interaction via cross-attention between the two models.
+- A [pre-trained checkpoint](https://huggingface.co/flax-community/clip-vit-base-patch32_mbart-large-50) on our multilingual Conceptual-12M variant.
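The third bullet above describes the CLIP-Vision/mBART50 fusion only at a high level. Below is a minimal sketch of that idea using the PyTorch `transformers` API; it is an illustration, not the project's actual implementation (which is written in Flax). The base checkpoints `openai/clip-vit-base-patch32` and `facebook/mbart-large-50` and the explicit linear projection are assumptions made here for the example:

```python
import torch
from transformers import CLIPVisionModel, MBartForConditionalGeneration, MBart50TokenizerFast

# Assumed base checkpoints; the project's fused model is published at
# flax-community/clip-vit-base-patch32_mbart-large-50.
clip = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
mbart = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="fr_XX")

# Encode the image with CLIP-Vision: one embedding per patch plus a CLS token.
pixel_values = torch.randn(1, 3, 224, 224)                            # dummy image batch
visual_embeds = clip(pixel_values=pixel_values).last_hidden_state     # (1, 50, 768)

# Project CLIP's 768-d outputs to mBART's 1024-d hidden size (a hypothetical
# stand-in for the mapping the real model learns during pre-training).
proj = torch.nn.Linear(visual_embeds.size(-1), mbart.config.d_model)
encoder_hidden_states = proj(visual_embeds)

# Feed the visual sequence to the mBART decoder in place of text-encoder
# outputs, so the decoder cross-attends over image patches while producing
# caption tokens.
caption_ids = tokenizer("Une légende en français", return_tensors="pt").input_ids
decoder_out = mbart.model.decoder(
    input_ids=caption_ids,
    encoder_hidden_states=encoder_hidden_states,
)
logits = mbart.lm_head(decoder_out.last_hidden_state)                 # next-token scores
```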