Update README.md
README.md
CHANGED
@@ -4,6 +4,9 @@ license: apache-2.0
|
|
4 |
base_model: Helsinki-NLP/opus-mt-mul-en
|
5 |
tags:
|
6 |
- generated_from_trainer
|
7 |
metrics:
|
8 |
- bleu
|
9 |
model-index:
|
@@ -29,16 +32,32 @@ It achieves the following results on the evaluation set:
|
|
29 |
|
30 |
The model is specifically designed to translate Hindi text written in Devanagari script into a mixed format where Hindi words are retained in Devanagari while English words are converted to Roman script. This model effectively handles the complexities of code-switching, producing output that accurately reflects the intended language mixing.
|
31 |
|
32 |
-
|
33 |
-
|
34 |
-
|
35 |
-
|
36 |
-
|
37 |
-
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
|
42 |
|
43 |
### Training hyperparameters
|
44 |
|
|
|
4 |
base_model: Helsinki-NLP/opus-mt-mul-en
|
5 |
tags:
|
6 |
- generated_from_trainer
|
7 |
+
- code switching
|
8 |
+
- hinglish
|
9 |
+
- code mixing
|
10 |
metrics:
|
11 |
- bleu
|
12 |
model-index:
|
|
|
32 |
|
33 |
The model is specifically designed to translate Hindi text written in Devanagari script into a mixed format where Hindi words are retained in Devanagari while English words are converted to Roman script. This model effectively handles the complexities of code-switching, producing output that accurately reflects the intended language mixing.
|
34 |
|
Example:

| Hindi (input) | Hindi + English code-switched (output) |
|:-----------------------------------------:|:-----------------------------------------:|
| तो वो टोटली मेरे घर के प्लान पे डिपेंड करता है | to वो totally मेरे घर के plan पे depend करता है |
| मांग लो भाई बहुत नेसेसरी है | मांग लो भाई बहुत necessary है |
|
40 |
+
|
```
import torch
from transformers import MarianMTModel, MarianTokenizer


class HinEngCS:
    def __init__(self, model_name='ar5entum/marianMT_hin_eng_cs'):
        self.model_name = model_name
        # Use the GPU when one is available; otherwise fall back to the CPU.
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.tokenizer = MarianTokenizer.from_pretrained(model_name)
        self.model = MarianMTModel.from_pretrained(model_name).to(self.device)

    def predict(self, input_text):
        # Tokenize the input, generate, and decode the single output sequence.
        tokenized_text = self.tokenizer(input_text, return_tensors='pt').to(self.device)
        translated = self.model.generate(**tokenized_text)
        translated_text = self.tokenizer.decode(translated[0], skip_special_tokens=True)
        return translated_text


model = HinEngCS()

input_text = "आज मैं नानयांग टेक्नोलॉजिकल निवर्सिटी में अनेक समझौते होते हुए देखूंगा जो कि उच्च शिक्षा साइंस टेक्नोलॉजी और इनोवेशन में हमारे सहयोग को ओर बढ़ाएंगे।"
print(model.predict(input_text))
# आज मैं नानयांग technological innovation में अनेक समझौते होते हुए देखूंगा जो कि उच्च शिक्षा science technology और innovation में हमारे सहयोग को ओर बढ़ाएंगे
```
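
To check which words in an output string ended up in Roman script, a small post-processing helper can tag each whitespace-separated token by script. The sketch below is an assumption for illustration, not part of the model or of the `transformers` API: the names `token_script`, `tag_scripts`, and `roman_tokens` are hypothetical, and the classification relies only on the Unicode Devanagari block (U+0900 to U+097F) and ASCII letters.

```python
from typing import List, Tuple


def token_script(token: str) -> str:
    """Classify a token as 'devanagari', 'latin', 'mixed', or 'other'."""
    # Devanagari block: U+0900..U+097F (hypothetical helper, Unicode ranges only).
    has_devanagari = any('\u0900' <= ch <= '\u097F' for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_devanagari and has_latin:
        return 'mixed'
    if has_devanagari:
        return 'devanagari'
    if has_latin:
        return 'latin'
    return 'other'


def tag_scripts(text: str) -> List[Tuple[str, str]]:
    """Pair every whitespace-separated token with its script label."""
    return [(tok, token_script(tok)) for tok in text.split()]


def roman_tokens(text: str) -> List[str]:
    """Return only the tokens written in Roman (Latin) script."""
    return [tok for tok, script in tag_scripts(text) if script == 'latin']


if __name__ == '__main__':
    # Second example row from the table above.
    print(roman_tokens("मांग लो भाई बहुत necessary है"))
```

Applied to the output of `predict`, `roman_tokens` lists the words the model converted to Roman script, which makes it easier to compare a prediction against the expected code-switched sentence.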
|
61 |
|
62 |
### Training hyperparameters
|
63 |
|