milmor committed on
Commit
331abcf
1 Parent(s): b6a901e

Update README.md

Files changed (1)
  1. README.md +50 -50
README.md CHANGED
@@ -1,51 +1,51 @@
- ---
- license: apache-2.0
- language:
- - es
- - nah
- tags:
- - translation
- widget:
- - text: "translate Spanish to Nahuatl: mi hermano es un ajolote"
-
- ---
-
- # t5-small-spanish-nahuatl
- ## Model description
- This model is a T5 Transformer ([t5-small](https://huggingface.co/t5-small)) fine-tuned on 29,007 spanish and nahuatl sentences using 12,890 samples collected from the web and 16,117 samples from the Axolotl dataset.
-
- The dataset is normalized using 'sep' normalization from [py-elotl](https://github.com/ElotlMX/py-elotl).
-
-
- ## Usage
- ```python
- from transformers import AutoModelForSeq2SeqLM
- from transformers import AutoTokenizer
-
- model = AutoModelForSeq2SeqLM.from_pretrained('milmor/t5-small-spanish-nahuatl')
- tokenizer = AutoTokenizer.from_pretrained('milmor/t5-small-spanish-nahuatl')
-
- model.eval()
- sentence = 'muchas flores son blancas'
- input_ids = tokenizer('translate Spanish to Nahuatl: ' + sentence, return_tensors='pt').input_ids
- outputs = model.generate(input_ids)
- # outputs = miak xochitl istak
- outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
- ```
-
-
- ## Evaluation results
- The model is evaluated on 400 validation sentences.
- - Validation loss: 1.36
-
- _Note: Since the Axolotl corpus contains multiple misalignments, the real Validation loss is slightly better. These misalignments also introduce noise into the training._
-
-
- ## References
- - Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits
- of transfer learning with a unified Text-to-Text transformer.
-
- - Ximena Gutierrez-Vasques, Gerardo Sierra, and Hernandez Isaac. 2016. Axolotl: a web accessible parallel corpus for Spanish-Nahuatl. In International Conference on Language Resources and Evaluation (LREC).
-
-
+ ---
+ license: apache-2.0
+ language:
+ - es
+ - nah
+ tags:
+ - translation
+ widget:
+ - text: "translate Spanish to Nahuatl: muchas flores son blancas"
+
+ ---
+
+ # t5-small-spanish-nahuatl
+ ## Model description
+ This model is a T5 Transformer ([t5-small](https://huggingface.co/t5-small)) fine-tuned on 29,007 spanish and nahuatl sentences using 12,890 samples collected from the web and 16,117 samples from the Axolotl dataset.
+
+ The dataset is normalized using 'sep' normalization from [py-elotl](https://github.com/ElotlMX/py-elotl).
+
+
+ ## Usage
+ ```python
+ from transformers import AutoModelForSeq2SeqLM
+ from transformers import AutoTokenizer
+
+ model = AutoModelForSeq2SeqLM.from_pretrained('milmor/t5-small-spanish-nahuatl')
+ tokenizer = AutoTokenizer.from_pretrained('milmor/t5-small-spanish-nahuatl')
+
+ model.eval()
+ sentence = 'muchas flores son blancas'
+ input_ids = tokenizer('translate Spanish to Nahuatl: ' + sentence, return_tensors='pt').input_ids
+ outputs = model.generate(input_ids)
+ # outputs = miak xochitl istak
+ outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
+ ```
+
+
+ ## Evaluation results
+ The model is evaluated on 400 validation sentences.
+ - Validation loss: 1.36
+
+ _Note: Since the Axolotl corpus contains multiple misalignments, the real Validation loss is slightly better. These misalignments also introduce noise into the training._
+
+
+ ## References
+ - Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits
+ of transfer learning with a unified Text-to-Text transformer.
+
+ - Ximena Gutierrez-Vasques, Gerardo Sierra, and Hernandez Isaac. 2016. Axolotl: a web accessible parallel corpus for Spanish-Nahuatl. In International Conference on Language Resources and Evaluation (LREC).
+
+
  > Created by [Emilio Alejandro Morales](https://huggingface.co/milmor).
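
The Usage snippet in the README translates one sentence at a time. As a minimal sketch of how the same call might be batched, the example below reuses the checkpoint name and the `translate Spanish to Nahuatl: ` prefix from the README. The helper names `build_prompts` and `translate`, and the `max_new_tokens` default of 64, are assumptions made for illustration and are not part of the model card.

```python
from typing import List

# Task prefix taken from the README's usage example.
PREFIX = 'translate Spanish to Nahuatl: '


def build_prompts(sentences: List[str]) -> List[str]:
    """Strip each sentence, drop empty ones, and prepend the task prefix."""
    return [PREFIX + s.strip() for s in sentences if s.strip()]


def translate(sentences: List[str], max_new_tokens: int = 64) -> List[str]:
    """Translate a batch of Spanish sentences (hypothetical helper).

    The model is downloaded on the first call. The imports are local so
    that build_prompts stays usable without transformers installed.
    """
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = 'milmor/t5-small-spanish-nahuatl'
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model.eval()

    prompts = build_prompts(sentences)
    if not prompts:
        return []

    # Pad to the longest prompt so the batch forms a single tensor.
    batch = tokenizer(prompts, return_tensors='pt', padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `translate(['muchas flores son blancas'])` should return a one-element list with the model's translation. Passing the whole tokenizer output to `generate` also forwards the attention mask, which matters once prompts of different lengths are padded together.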