sandrarrey committed on
Commit: 543c409
Parent: 79e5305

Update README_English.md

Files changed (1): README_English.md (+3 −3)
README_English.md CHANGED
@@ -27,13 +27,13 @@ onmt_translate -src input_text -model NOS-MT-es-gl -output ./output_file.txt -r
 
 **Training**
 
-In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of spanish-portuguese translations, which we have converted into spanish-galician by means of portuguese-galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
+In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of Spanish-Portuguese translations, which we have converted into Spanish-Galician by means of Portuguese-Galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
 
 
 **Training process**
 
 + Tokenization of the datasets made with linguakit tokeniser https://github.com/citiususc/Linguakit
-+ Vocabulary for the models was created by the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) da open NMT
++ Vocabulary for the models was created by the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) of OpenNMT
 + Using the .yaml in this repository you can replicate the training process as follows
 
 ```bash
@@ -47,7 +47,7 @@ The parameters used for the development of the model can be directly viewed in t
 
 **Evaluation**
 
-The BLEU evaluation of the models is done by mixing internally developed tests (gold1, gold2, test-suite) with other datasets available in Galician (Flores).
+The BLEU evaluation of the models is done by mixing internally developed tests (gold1, gold2, test-suite) and other datasets available in Galician (Flores).
 
 | GOLD 1 | GOLD 2 | FLORES | TEST-SUITE|
 | ------------- |:-------------:| -------:|----------:|
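
The training steps touched by this diff (BPE vocabulary built with learn_bpe.py, then training from the repository's .yaml) map onto standard OpenNMT-py commands. The sketch below is illustrative only: the file names (train.es-gl.tok, bpe.codes, config.yaml) and the 32000 merge operations are assumptions, not values stated in the README.

```bash
# Learn BPE merge operations from the Linguakit-tokenised training text
# (tools/learn_bpe.py reads from stdin and writes the merge codes to stdout).
python tools/learn_bpe.py -s 32000 < train.es-gl.tok > bpe.codes

# Build the vocabulary and launch training from the repository's .yaml config.
onmt_build_vocab -config config.yaml -n_sample -1
onmt_train -config config.yaml
```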
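
The README does not say which tool produces the BLEU scores in the table; assuming sacreBLEU and hypothetical test-file names (flores.es, flores.gl), a scoring run on one of the listed test sets could look like this:

```bash
# Translate the Spanish side of the test set with the released model,
# then score the hypotheses against the Galician reference with sacreBLEU.
onmt_translate -src flores.es -model NOS-MT-es-gl -output flores.hyp.gl -r
sacrebleu flores.gl -i flores.hyp.gl -m bleu
```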