ar5entum/marianMT_hin_eng_cs

@@ -4,6 +4,9 @@ license: apache-2.0
 base_model: Helsinki-NLP/opus-mt-mul-en
 tags:
 - generated_from_trainer
 metrics:
 - bleu
 model-index:
@@ -29,16 +32,32 @@ It achieves the following results on the evaluation set:
 The model translates Hindi text written entirely in Devanagari script into a code-switched format: Hindi words are retained in Devanagari, while English words that appear transliterated in Devanagari are converted back to Roman script. The output is intended to reflect how Hindi and English are mixed in the source sentence.
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters

 base_model: Helsinki-NLP/opus-mt-mul-en
 tags:
 - generated_from_trainer
+- code switching
+- hinglish
+- code mixing
 metrics:
 - bleu
 model-index:
 The model translates Hindi text written entirely in Devanagari script into a code-switched format: Hindi words are retained in Devanagari, while English words that appear transliterated in Devanagari are converted back to Roman script. The output is intended to reflect how Hindi and English are mixed in the source sentence.
+Example:
+| Hindi                                     | Hindi + English CS                        |
+|:-----------------------------------------:|:-----------------------------------------:|
+|तो वो टोटली मेरे घर के प्लान पे डिपेंड करता है           |to वो totally मेरे घर के plan पे depend करता है  |
+|मांग लो भाई बहुत नेसेसरी है                        |मांग लो भाई बहुत necessary है                  |
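+```python
+import torch
+from transformers import MarianMTModel, MarianTokenizer
+
+class HinEngCS:
+    """Thin wrapper around the fine-tuned MarianMT checkpoint."""
+    def __init__(self, model_name='ar5entum/marianMT_hin_eng_cs', device=None):
+        self.model_name = model_name
+        # Use the GPU when one is available, otherwise fall back to the CPU.
+        self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
+        self.tokenizer = MarianTokenizer.from_pretrained(model_name)
+        self.model = MarianMTModel.from_pretrained(model_name).to(self.device)
+
+    def predict(self, input_text):
+        # Tokenize one sentence, generate the output ids, and decode them to text.
+        tokenized_text = self.tokenizer(input_text, return_tensors='pt').to(self.device)
+        translated = self.model.generate(**tokenized_text)
+        translated_text = self.tokenizer.decode(translated[0], skip_special_tokens=True)
+        return translated_text
+
+model = HinEngCS()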
+input_text = "आज मैं नानयांग टेक्नोलॉजिकल निवर्सिटी में अनेक समझौते होते हुए देखूंगा जो कि उच्च शिक्षा साइंस टेक्नोलॉजी और इनोवेशन में हमारे सहयोग को ओर बढ़ाएंगे।"
+print(model.predict(input_text))
+# आज मैं नानयांग technological innovation में अनेक समझौते होते हुए देखूंगा जो कि उच्च शिक्षा science technology और innovation में हमारे सहयोग को ओर बढ़ाएंगे
+```
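
As a rough way to inspect how much of a sentence ends up in each script, the sketch below counts Devanagari and Roman-script tokens in a string. It is an illustration only and is not part of the model or of `transformers`; the helper names `token_script` and `script_mix` are hypothetical, and the check assumes that whitespace splitting is an adequate tokenization and that the Unicode block U+0900 to U+097F covers the Devanagari characters of interest.

```python
def token_script(token):
    """Classify a token as 'devanagari', 'latin', or 'other' by majority of letters."""
    devanagari = 0
    latin = 0
    for ch in token:
        if "\u0900" <= ch <= "\u097F":        # Devanagari Unicode block (assumed sufficient)
            devanagari += 1
        elif ch.isascii() and ch.isalpha():   # basic Roman letters a-z / A-Z
            latin += 1
    if devanagari == 0 and latin == 0:
        return "other"                        # digits, punctuation, other scripts
    return "devanagari" if devanagari >= latin else "latin"


def script_mix(sentence):
    """Count whitespace-separated tokens per script (hypothetical helper)."""
    tally = {"devanagari": 0, "latin": 0, "other": 0}
    for token in sentence.split():
        tally[token_script(token)] += 1
    return tally


# Second row of the example table: source sentence, then code-switched output.
print(script_mix("मांग लो भाई बहुत नेसेसरी है"))
# {'devanagari': 6, 'latin': 0, 'other': 0}
print(script_mix("मांग लो भाई बहुत necessary है"))
# {'devanagari': 5, 'latin': 1, 'other': 0}
```

Comparing the tallies for an input and its output gives a quick sanity check that English words were actually moved to Roman script; it says nothing about whether the right words were converted.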
 ### Training hyperparameters