sandrarrey committed 543c409 (parent: 79e5305): Update README_English.md

README_English.md (+3 / -3):
**Training**

For training we used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations produced directly by human translators. The latter are corpora of Spanish-Portuguese translations, which we converted into Spanish-Galician by means of Portuguese-Galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
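The Portuguese-to-Galician conversion exploits the regular orthographic correspondences between the two languages. As a purely illustrative sketch (the rule list below is a hypothetical example, not the rules actually used by Opentrad/Apertium or by this project):

```python
# Illustrative only: a few regular Portuguese -> Galician orthographic
# correspondences of the kind a transliteration step can exploit.
# This rule list is a hypothetical example, not the project's actual rules.
RULES = [
    ("ção", "ción"),  # estação -> estación
    ("lh", "ll"),     # coelho -> coello
    ("nh", "ñ"),      # vinho -> viño
]

def transliterate(word: str) -> str:
    """Apply each replacement rule in order, left to right."""
    for pt, gl in RULES:
        word = word.replace(pt, gl)
    return word

print(transliterate("estação"))  # estación
```

In practice such rules only cover the regular part of the mapping; irregular forms still need a dictionary or a trained model.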
**Training process**

+ Tokenization of the datasets was carried out with the [Linguakit tokeniser](https://github.com/citiususc/Linguakit).
+ The vocabulary for the models was created with the [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) script of OpenNMT.
+ Using the .yaml in this repository you can replicate the training process as follows:
```bash
...
```
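The vocabulary step above learns byte-pair-encoding merge operations from the training text. A minimal, self-contained sketch of that merge-learning loop (simplified; the real learn_bpe.py script also handles word frequencies from a corpus, vocabulary thresholds and output formatting):

```python
from collections import Counter

def learn_bpe(words, num_merges):
    """Learn BPE merges from a word list (simplified illustration)."""
    # Represent each word as a tuple of symbols plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        merged = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += freq
        vocab = merged
    return merges

merges = learn_bpe(["low", "low", "lower", "newest", "newest", "newest"], 4)
print(merges)
```

Each learned pair becomes one entry in the merge table that the tokenizer later applies greedily to segment unseen words.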
**Evaluation**

The BLEU evaluation of the models is done by mixing internally developed tests (gold1, gold2, test-suite) and other datasets available in Galician (Flores).
| GOLD 1 | GOLD 2 | FLORES | TEST-SUITE |
| ------------- |:-------------:| -------:|----------:|
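BLEU, the metric reported in the table, combines modified n-gram precisions with a brevity penalty. A minimal sentence-level sketch (real evaluations would use a corpus-level, smoothed implementation such as sacreBLEU; this simplified version returns 0 whenever any n-gram order has no overlap):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights and brevity penalty (sketch)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0; a candidate sharing no words with the reference scores 0.0.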