The model is shown in the image above. We create a custom model in Flax which integrates the CLIP Vision model as an encoder inside the mBART model. We also use custom configs and modules to accommodate these changes and to allow loading from mBART and CLIP Vision checkpoints. The image is fed to the CLIP Vision encoder, and the shifted token ids are fed to the mBART decoder. We use the `facebook/mbart-large-50` and `openai/clip-vit-base-patch32` checkpoints for the mBART and CLIP Vision models, respectively. All our code is available on GitHub.
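To make "shifted token ids" concrete, here is a minimal pure-Python sketch of the right shift used for teacher forcing. It follows the mBART convention of wrapping the last non-pad token of each caption to the front of the decoder input. The function name, the token ids, and the pad id below are illustrative assumptions, not values taken from our code.

```python
PAD_ID = 1  # assumed pad token id, for illustration only


def shift_tokens_right(labels, pad_token_id):
    """Shift each row one position to the right.

    The last non-pad token of the row is wrapped around to position 0,
    so the decoder sees token t-1 when it has to predict token t.
    """
    shifted = []
    for row in labels:
        # Positions of all real (non-pad) tokens in this row.
        non_pad = [i for i, tok in enumerate(row) if tok != pad_token_id]
        if not non_pad:
            raise ValueError("row contains only padding")
        start_token = row[non_pad[-1]]
        # Drop the final position and prepend the wrapped token.
        shifted.append([start_token] + row[:-1])
    return shifted


# Hypothetical caption ids: a language-code token, a few word pieces,
# an end-of-sequence token (2), then padding.
labels = [
    [250004, 10, 11, 12, 2],
    [250004, 10, 11, 2, PAD_ID],
]
decoder_input_ids = shift_tokens_right(labels, PAD_ID)
```

The decoder is then trained to predict `labels` from `decoder_input_ids` together with the image features produced by the encoder.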
Our model reached an eval loss of ~2.6 at around 70K steps. Here are the BLEU scores (out of 1) for different languages:
Language | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 |
---|---|---|---|---|
English | 0.13083 | 0.08887 | 0.06681 | 0.04899 |
Spanish | 0.15981 | 0.09858 | 0.06918 | 0.04776 |
German | 0.14234 | 0.09817 | 0.07405 | 0.0515 |
French | 0.13021 | 0.08862 | 0.06598 | 0.04647 |
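
For readers unfamiliar with the metric, the following is a rough sketch of how a cumulative BLEU-n score on a 0 to 1 scale can be computed for a single caption against a single reference. It is a simplified, unsmoothed, sentence-level version written for illustration; it is not necessarily the scoring script used to produce the table above, and the helper names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n):
    """Cumulative BLEU-max_n for one candidate and one reference, in [0, 1]."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "a dog runs on the grass".split()
candidate = "a dog runs on grass".split()
bleu_1 = bleu(candidate, reference, 1)
bleu_2 = bleu(candidate, reference, 2)
```

Because higher-order n-grams are harder to match, BLEU-n generally decreases as n grows, which is the pattern visible in each row of the table.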