# eng-mal-translator
This model is a fine-tuned version of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) on the [Govardhan-06/flores_eng_mal](https://huggingface.co/datasets/Govardhan-06/flores_eng_mal) dataset. It achieves the following results on the evaluation set:
- Loss: 1.1660
- Bleu: 16.9895
## Overview

This project fine-tunes a translation model from English to Malayalam. It starts from the facebook/nllb-200-distilled-600M checkpoint in Hugging Face's Transformers library and fine-tunes it on a parallel English-Malayalam dataset (see below). The goal is accurate translation of English text inputs into Malayalam.
## Dataset Used

The training data is a curated corpus of parallel English-Malayalam text pairs, used for both training and evaluation of the translation model: [Govardhan-06/flores_eng_mal](https://huggingface.co/datasets/Govardhan-06/flores_eng_mal).
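A minimal sketch of pulling the dataset with the `datasets` library; the split and column names are not documented here, so inspect the loaded object to confirm them:

```python
from datasets import load_dataset

# Load the parallel English-Malayalam corpus referenced above.
dataset = load_dataset("Govardhan-06/flores_eng_mal")

# Inspect the available splits and columns; exact names should be
# verified against the dataset card rather than assumed.
print(dataset)
```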
## Model Used

The translation model is based on the [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) checkpoint, chosen for its efficiency and performance in sequence-to-sequence translation tasks.
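A minimal sketch of loading the base checkpoint and tokenizer for fine-tuning; the NLLB language codes `eng_Latn` and `mal_Mlym` select English (Latin script) and Malayalam (Malayalam script):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"

# src_lang/tgt_lang configure the NLLB tokenizer for English -> Malayalam.
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint, src_lang="eng_Latn", tgt_lang="mal_Mlym"
)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
```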
## Functionality

Users can input English text, and the model generates the corresponding Malayalam translation, facilitating cross-language communication (see the inference sketch below).
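A minimal inference sketch using the `translation` pipeline. The model id `Govardhan-06/eng-mal-translator` is an assumption based on this card's name; replace it with the actual repository id if it differs:

```python
from transformers import pipeline

# Assumed model id; swap in the real repository id if different.
translator = pipeline(
    "translation",
    model="Govardhan-06/eng-mal-translator",
    src_lang="eng_Latn",   # source: English (Latin script)
    tgt_lang="mal_Mlym",   # target: Malayalam (Malayalam script)
)

result = translator("How are you today?", max_length=128)
print(result[0]["translation_text"])
```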
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 3e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- mixed_precision_training: Native AMP
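A sketch of how these hyperparameters map onto `Seq2SeqTrainingArguments`. The output directory and per-epoch evaluation strategy are assumptions, and the listed Adam betas/epsilon are simply the optimizer defaults:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="eng-mal-translator",    # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    num_train_epochs=10,
    lr_scheduler_type="linear",
    fp16=True,                          # Native AMP mixed precision
    eval_strategy="epoch",              # assumed: evaluate once per epoch, matching the table below
    predict_with_generate=True,         # generate translations during evaluation to score BLEU
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults.
)
```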
### Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu    |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| No log        | 1.0   | 226  | 1.1084          | 15.0719 |
| No log        | 2.0   | 452  | 1.0917          | 16.3698 |
| 1.1672        | 3.0   | 678  | 1.0952          | 16.2931 |
| 1.1672        | 4.0   | 904  | 1.0994          | 16.7858 |
| 0.8967        | 5.0   | 1130 | 1.1154          | 16.5906 |
| 0.8967        | 6.0   | 1356 | 1.1300          | 17.7039 |
| 0.7415        | 7.0   | 1582 | 1.1414          | 16.8886 |
| 0.7415        | 8.0   | 1808 | 1.1523          | 17.1442 |
| 0.6532        | 9.0   | 2034 | 1.1628          | 16.9454 |
| 0.6532        | 10.0  | 2260 | 1.1660          | 16.9895 |
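The BLEU scores above are computed from generated translations at each evaluation step. A sketch of a `compute_metrics` function using sacrebleu via the `evaluate` library (an assumption about the exact metric implementation used here):

```python
import numpy as np
import evaluate
from transformers import AutoTokenizer

# Assumption: BLEU is computed with sacrebleu, the usual choice when
# fine-tuning translation models with Seq2SeqTrainer.
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
bleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Replace -100 (ignored label positions) with the pad token id before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = bleu.compute(
        predictions=[p.strip() for p in decoded_preds],
        references=[[l.strip()] for l in decoded_labels],
    )
    return {"bleu": result["score"]}
```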
### Framework versions
- Transformers 4.42.3
- Pytorch 2.3.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1