---
library_name: transformers
license: apache-2.0
base_model: Helsinki-NLP/opus-mt-mul-en
tags:
- generated_from_trainer
- code switching
- hinglish
- code mixing
metrics:
- bleu
model-index:
- name: marianMT_hin_eng_cs
  results: []
language:
- hi
- en
datasets:
- ar5entum/hindi-english-code-mixed
---
# marianMT_hin_eng_cs
This model is a fine-tuned version of Helsinki-NLP/opus-mt-mul-en on the ar5entum/hindi-english-code-mixed dataset. It achieves the following results on the evaluation set:
- Loss: 0.1450
- Bleu: 77.8649
- Gen Len: 74.8945
## Model description
The model translates Hindi text written in Devanagari script into a code-mixed format in which Hindi words are retained in Devanagari while English loanwords are rendered in Roman script. It handles this code-switching so that the output reflects the intended language mixing.
Example:
Hindi | Hindi + English CS |
---|---|
तो वो टोटली मेरे घर के प्लान पे डिपेंड करता है | to वो totally मेरे घर के plan पे depend करता है |
मांग लो भाई बहुत नेसेसरी है | मांग लो भाई बहुत necessary है |
टेलीविज़न में क्या प्रोग्राम चल रहा है? | television में क्या program चल रहा है? |
```python
from transformers import MarianMTModel, MarianTokenizer

class HinEngCS:
    def __init__(self, model_name='ar5entum/marianMT_hin_eng_cs'):
        # Load the fine-tuned checkpoint and its tokenizer from the Hub.
        self.model_name = model_name
        self.tokenizer = MarianTokenizer.from_pretrained(model_name)
        self.model = MarianMTModel.from_pretrained(model_name)

    def predict(self, input_text):
        # Tokenize, generate, and decode the code-mixed translation.
        tokenized_text = self.tokenizer(input_text, return_tensors='pt')
        translated = self.model.generate(**tokenized_text)
        translated_text = self.tokenizer.decode(translated[0], skip_special_tokens=True)
        return translated_text

model = HinEngCS()

input_text = "आज मैं नानयांग टेक्नोलॉजिकल यूनिवर्सिटी में अनेक समझौते होते हुए देखूंगा जो कि उच्च शिक्षा साइंस टेक्नोलॉजी और इनोवेशन में हमारे सहयोग को और बढ़ाएंगे।"
print(model.predict(input_text))
# आज मैं नानयांग technological university में अनेक समझौते होते हुए देखूंगा जो कि उच्च शिक्षा science technology और innovation में हमारे सहयोग को और बढ़ाएंगे।
```
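
For translating several sentences at once, the same tokenizer and model can be used in batches. The sketch below is illustrative only: the padding call, beam count, and `max_length` are assumptions for demonstration, not settings taken from this card.

```python
import torch
from transformers import MarianMTModel, MarianTokenizer

model_name = 'ar5entum/marianMT_hin_eng_cs'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)
model.eval()

sentences = [
    "मांग लो भाई बहुत नेसेसरी है",
    "टेलीविज़न में क्या प्रोग्राम चल रहा है?",
]

# Pad the batch so every sequence has the same length.
inputs = tokenizer(sentences, return_tensors='pt', padding=True)

with torch.no_grad():
    # num_beams and max_length are illustrative choices, not values from this card.
    generated = model.generate(**inputs, num_beams=4, max_length=128)

# Decode all sequences, dropping padding and other special tokens.
outputs = tokenizer.batch_decode(generated, skip_special_tokens=True)
for src, tgt in zip(sentences, outputs):
    print(src, "->", tgt)
```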
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 50
- eval_batch_size: 50
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 100
- total_eval_batch_size: 100
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 30.0
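
The list above maps roughly onto `Seq2SeqTrainingArguments` used with the `Seq2SeqTrainer` API. The sketch below shows one way to express these settings; `output_dir`, the evaluation/save strategies, and `predict_with_generate` are assumptions rather than details taken from this card, and the per-device batch size of 50 across 2 GPUs is what yields the total batch size of 100.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: output_dir and the evaluation/save strategies are assumptions.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="marianMT_hin_eng_cs",
    learning_rate=5e-5,
    per_device_train_batch_size=50,   # 2 GPUs -> total train batch size 100
    per_device_eval_batch_size=50,    # 2 GPUs -> total eval batch size 100
    seed=42,
    num_train_epochs=30.0,
    lr_scheduler_type="linear",
    eval_strategy="epoch",
    save_strategy="epoch",
    predict_with_generate=True,       # needed to report BLEU / Gen Len per epoch
)
```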
### Training results
Training Loss | Epoch | Step | Bleu | Gen Len | Validation Loss |
---|---|---|---|---|---|
1.5823 | 1.0 | 1118 | 11.6257 | 77.1622 | 1.1778 |
0.921 | 2.0 | 2236 | 33.2917 | 76.1459 | 0.6357 |
0.6472 | 3.0 | 3354 | 47.3533 | 75.9194 | 0.4504 |
0.5246 | 4.0 | 4472 | 55.2169 | 75.6871 | 0.3579 |
0.4228 | 5.0 | 5590 | 60.8262 | 75.5777 | 0.3041 |
0.3745 | 6.0 | 6708 | 64.8987 | 75.4424 | 0.2693 |
0.3552 | 7.0 | 7826 | 67.7607 | 75.2438 | 0.2455 |
0.3324 | 8.0 | 8944 | 69.635 | 75.1036 | 0.2274 |
0.2912 | 9.0 | 10062 | 71.3086 | 75.0326 | 0.2117 |
0.2591 | 10.0 | 11180 | 72.392 | 74.9607 | 0.2001 |
0.2471 | 11.0 | 12298 | 73.4758 | 74.9251 | 0.1899 |
0.236 | 12.0 | 13416 | 74.4219 | 74.833 | 0.1822 |
0.2265 | 13.0 | 14534 | 75.1435 | 74.9069 | 0.1745 |
0.2152 | 14.0 | 15652 | 75.7614 | 74.7409 | 0.1695 |
0.2078 | 15.0 | 16770 | 76.2353 | 74.7092 | 0.1641 |
0.2048 | 16.0 | 17888 | 76.7381 | 74.7274 | 0.1593 |
0.1975 | 17.0 | 19006 | 76.9954 | 74.7217 | 0.1559 |
0.1943 | 18.0 | 20124 | 77.421 | 74.6641 | 0.1524 |
0.1987 | 19.0 | 21242 | 77.8231 | 74.6833 | 0.1495 |
0.1855 | 20.0 | 22360 | 78.0784 | 74.6804 | 0.1472 |
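
The Bleu and Gen Len columns track generation quality on the evaluation set at the end of each epoch. Which BLEU implementation produced these numbers is not stated here, but decoded predictions can be scored against references with the `evaluate` library's sacrebleu metric, as in the sketch below; the prediction/reference pair is a placeholder, not an item from the dataset.

```python
import evaluate

# Placeholder pair; in practice, predictions come from running the model over
# the evaluation split and references are the target translations.
predictions = ["television में क्या program चल रहा है?"]
references = [["television में क्या program चल रहा है?"]]

bleu = evaluate.load("sacrebleu")
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # corpus-level BLEU; 100.0 for an exact match
```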
### Framework versions
- Transformers 4.45.0.dev0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1