---
license: mit
base_model: facebook/mbart-large-50
tags:
- generated_from_trainer
model-index:
- name: mbart-large-50-finetuned-es-en-2
  results: []
datasets:
- Helsinki-NLP/opus-100
metrics:
- bleu
language:
- en
- es
---

# mbart-large-50-finetuned-es-en-2

This model is a fine-tuned version of [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) on the [Helsinki-NLP/opus-100](https://huggingface.co/datasets/Helsinki-NLP/opus-100) dataset.

## Model description

mBART-50 is a multilingual sequence-to-sequence model pre-trained using the "Multilingual Denoising Pretraining" objective, introduced in the paper *Multilingual Translation with Extensible Multilingual Pretraining and Finetuning*. It was created to show that multilingual translation models can be built through multilingual fine-tuning: instead of fine-tuning on a single direction, a pre-trained model is fine-tuned on many directions simultaneously. mBART-50 extends the original mBART model with 25 additional languages to support multilingual machine translation across 50 languages. The pre-training objective is explained below.

Multilingual Denoising Pretraining: the model incorporates N languages by concatenating data D = {D1, ..., DN}, where each Di is a collection of monolingual documents in language i. The source documents are noised using two schemes: first, randomly shuffling the order of the original sentences, and second, a novel in-filling scheme in which spans of text are replaced with a single mask token. The model is then tasked with reconstructing the original text. 35% of each instance's words are masked by randomly sampling span lengths from a Poisson distribution (λ = 3.5). The decoder input is the original text offset by one position, and a language-id symbol (LID) is used as the initial token to predict the sentence.

## Intended uses & limitations

More information needed

## Training and evaluation data

10,000 random samples from the Helsinki-NLP/opus-100 dataset.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 3
- eval_batch_size: 3
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0

### Training results

- BLEU score: 25.6431

### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
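
As a minimal usage sketch (not taken from the original training or evaluation code), the checkpoint can be loaded with the standard mBART-50 classes for Spanish-to-English translation. The model id, example sentence, and generation settings below are illustrative assumptions.

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Model id assumed to match this repository's name.
model_id = "mbart-large-50-finetuned-es-en-2"
model = MBartForConditionalGeneration.from_pretrained(model_id)
tokenizer = MBart50TokenizerFast.from_pretrained(model_id, src_lang="es_XX", tgt_lang="en_XX")

# Translate a Spanish sentence into English.
text = "El clima es agradable hoy."
inputs = tokenizer(text, return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],  # force English as the target language
    max_length=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```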
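
For reference, the hyperparameters listed above map roughly onto the following `Seq2SeqTrainingArguments`. This is a sketch only; the output directory and `predict_with_generate` are assumptions, not values recovered from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="mbart-large-50-finetuned-es-en-2",
    learning_rate=5e-05,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    predict_with_generate=True,  # assumption: generate during evaluation so BLEU can be computed
)
```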
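
The span in-filling scheme from the model description can also be illustrated with a toy sketch. This is a deliberate simplification (it works on whitespace-split tokens, masks at most roughly 35% of them, and ignores sentence shuffling), not the actual mBART pretraining code; the function name and parameters are made up for illustration.

```python
import numpy as np

def infill_spans(tokens, mask_ratio=0.35, poisson_lambda=3.5, mask_token="<mask>", rng=None):
    """Toy illustration of mBART-style span in-filling: spans with lengths sampled
    from Poisson(3.5) are each replaced by a single mask token, capped so that at
    most ~35% of the tokens end up masked."""
    rng = rng or np.random.default_rng(0)
    tokens = list(tokens)
    budget = int(round(mask_ratio * len(tokens)))  # max number of tokens to mask
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            span = max(1, int(rng.poisson(poisson_lambda)))
            span = min(span, budget, len(tokens) - i)
            out.append(mask_token)  # the whole span collapses into one mask token
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out

print(infill_spans("the quick brown fox jumps over the lazy dog".split()))
```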