sandrarrey committed on
Commit: 543c409
Parent: 79e5305

Update README_English.md

Files changed (1): README_English.md (+3 −3)
README_English.md CHANGED
@@ -27,13 +27,13 @@ onmt_translate -src input_text -model NOS-MT-es-gl -output ./output_file.txt -r
 
 **Training**
 
-In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of spanish-portuguese translations, which we have converted into spanish-galician by means of portuguese-galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
+In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of Spanish-Portuguese translations, which we have converted into Spanish-Galician by means of Portuguese-Galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
 
 
 **Training process**
 
 + Tokenization of the datasets made with linguakit tokeniser https://github.com/citiususc/Linguakit
-+ Vocabulary for the models was created by the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) da open NMT
++ Vocabulary for the models was created by the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) of OpenNMT
 + Using the .yaml in this repository you can replicate the training process as follows
 
 ```bash
@@ -47,7 +47,7 @@ The parameters used for the development of the model can be directly viewed in t
 
 **Evaluation**
 
-The BLEU evaluation of the models is done by mixing internally developed tests (gold1, gold2, test-suite) with other datasets available in Galician (Flores).
+The BLEU evaluation of the models is done by mixing internally developed tests (gold1, gold2, test-suite) and other datasets available in Galician (Flores).
 
 | GOLD 1 | GOLD 2 | FLORES | TEST-SUITE|
 | ------------- |:-------------:| -------:|----------:|
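
The training steps touched by this diff (BPE vocabulary built with learn_bpe.py, then training from the repository's .yaml) map onto standard OpenNMT-py commands. The sketch below is illustrative only: the file names (train.es-gl.tok, bpe.codes, config.yaml) and the 32000 merge operations are assumptions, not values stated in the README.

```bash
# Learn BPE merge operations from the Linguakit-tokenised training text
# (tools/learn_bpe.py reads from stdin and writes the merge codes to stdout).
python tools/learn_bpe.py -s 32000 < train.es-gl.tok > bpe.codes

# Build the vocabulary and launch training from the repository's .yaml config.
onmt_build_vocab -config config.yaml -n_sample -1
onmt_train -config config.yaml
```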
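
The README does not say which tool produces the BLEU scores in the table; assuming sacreBLEU and hypothetical test-file names (flores.es, flores.gl), a scoring run on one of the listed test sets could look like this:

```bash
# Translate the Spanish side of the test set with the released model,
# then score the hypotheses against the Galician reference with sacreBLEU.
onmt_translate -src flores.es -model NOS-MT-es-gl -output flores.hyp.gl -r
sacrebleu flores.gl -i flores.hyp.gl -m bleu
```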