ar5entum committed on
Commit
fe1a82d
·
verified ·
1 Parent(s): e022fa8

Update README.md

  1. README.md +29 -10
README.md CHANGED
@@ -4,6 +4,9 @@ license: apache-2.0
  base_model: Helsinki-NLP/opus-mt-mul-en
  tags:
  - generated_from_trainer
  metrics:
  - bleu
  model-index:
@@ -29,16 +32,32 @@ It achieves the following results on the evaluation set:
  The model is specifically designed to translate Hindi text written in Devanagari script into a mixed format where Hindi words are retained in Devanagari while English words are converted to Roman script. This model effectively handles the complexities of code-switching, producing output that accurately reflects the intended language mixing.
 
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
 
  ### Training hyperparameters
 
  base_model: Helsinki-NLP/opus-mt-mul-en
  tags:
  - generated_from_trainer
+ - code switching
+ - hinglish
+ - code mixing
  metrics:
  - bleu
  model-index:
 
  The model is specifically designed to translate Hindi text written in Devanagari script into a mixed format where Hindi words are retained in Devanagari while English words are converted to Roman script. This model effectively handles the complexities of code-switching, producing output that accurately reflects the intended language mixing.
 
+ Example:
+ | Hindi | Hindi + English CS |
+ |:-----------------------------------------:|:-----------------------------------------:|
+ |तो वो टोटली मेरे घर के प्लान पे डिपेंड करता है |to वो totally मेरे घर के plan पे depend करता है |
+ |मांग लो भाई बहुत नेसेसरी है |मांग लो भाई बहुत necessary है |
+
+ ```
+ from transformers import MarianMTModel, MarianTokenizer
+
+ class HinEngCS:
+     def __init__(self, model_name='ar5entum/marianMT_hin_eng_cs'):
+         self.model_name = model_name
+         self.tokenizer = MarianTokenizer.from_pretrained(model_name)
+         self.model = MarianMTModel.from_pretrained(model_name).to('cuda')
+
+     def predict(self, input_text):
+         tokenized_text = self.tokenizer(input_text, return_tensors='pt').to('cuda')
+         translated = self.model.generate(**tokenized_text)
+         translated_text = self.tokenizer.decode(translated[0], skip_special_tokens=True)
+         return translated_text
+ model = HinEngCS()
+
+ input_text = "आज मैं नानयांग टेक्नोलॉजिकल निवर्सिटी में अनेक समझौते होते हुए देखूंगा जो कि उच्च शिक्षा साइंस टेक्नोलॉजी और इनोवेशन में हमारे सहयोग को ओर बढ़ाएंगे।"
+ model.predict(input_text)
+ # आज मैं नानयांग technological innovation में अनेक समझौते होते हुए देखूंगा जो कि उच्च शिक्षा science technology और innovation में हमारे सहयोग को ओर बढ़ाएंगे
+ ```
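The README snippet added in this commit moves both the model and the tokenized inputs to `'cuda'` unconditionally, so it will likely fail on a machine without a GPU. A minimal device-agnostic sketch follows, assuming PyTorch is the backend. The `pick_device` helper and the `device` argument are hypothetical additions for illustration and are not part of the committed README.

```python
def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Hypothetical fallback so the helper can be imported without PyTorch.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


class HinEngCS:
    def __init__(self, model_name="ar5entum/marianMT_hin_eng_cs", device=None):
        # Imported lazily so that defining the class does not require transformers.
        from transformers import MarianMTModel, MarianTokenizer

        # Assumption: the caller may override the device; otherwise auto-detect.
        self.device = device or pick_device()
        self.tokenizer = MarianTokenizer.from_pretrained(model_name)
        self.model = MarianMTModel.from_pretrained(model_name).to(self.device)

    def predict(self, input_text):
        # Keep the inputs on the same device as the model.
        batch = self.tokenizer(input_text, return_tensors="pt").to(self.device)
        generated = self.model.generate(**batch)
        return self.tokenizer.decode(generated[0], skip_special_tokens=True)
```

With this variant, `HinEngCS(device="cpu")` forces CPU inference, while `HinEngCS()` uses a GPU only when one is available.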
 
  ### Training hyperparameters
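The Example table in this commit describes the target format: Hindi words stay in Devanagari while English words appear in Roman script. One rough way to inspect that mix in an output string is to label each whitespace-separated token by script. The sketch below uses only the standard library. `token_script` and `script_counts` are hypothetical helper names, and the Devanagari check is a simple test against the Unicode block U+0900 to U+097F, not anything the model itself relies on.

```python
from collections import Counter


def token_script(token):
    """Label a token as 'devanagari', 'roman', 'mixed', or 'other'."""
    has_deva = any("\u0900" <= ch <= "\u097f" for ch in token)
    has_roman = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_deva and has_roman:
        return "mixed"
    if has_deva:
        return "devanagari"
    if has_roman:
        return "roman"
    return "other"


def script_counts(text):
    """Count whitespace-separated tokens per script label."""
    return Counter(token_script(tok) for tok in text.split())


# Second column of the first Example row in this commit.
output = "to वो totally मेरे घर के plan पे depend करता है"
counts = script_counts(output)
print(counts["roman"], counts["devanagari"])  # prints: 4 7
```

A count like this only describes the script mix of a string. It does not judge whether the right words were switched, so it complements the reported BLEU metric and does not replace it.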