imdbo committed on
Commit 12b6250
1 Parent(s): e91e5de

Update README_English.md

Files changed (1):
  1. README_English.md +13 -10
README_English.md CHANGED
@@ -11,40 +11,43 @@ metrics:
 License: MIT
 ---
 
- **Model Description**
+ **Model description**
 
- OpenNMT model for English-Galician using a transformer architecture.
+ Model developed with OpenNMT for the English-Galician pair using the transformer architecture.
 
 **How to translate**
 
 + Open a bash terminal
- + Install [Python 3.9](https://www.python.org/downloads/release/python-390/)
+ + Install [Python 3.9](https://www.python.org/downloads/release/python-390/)
 + Install [Open NMT toolkit v.2.2](https://github.com/OpenNMT/OpenNMT-py)
 + Translate an input_text using the NOS-MT-en-gl model with the following command:
 
 ```bash
 onmt_translate -src input_text -model NOS-MT-en-gl -output ./output_file.txt -replace_unk -gpu 0
 ```
- + The result of the translation will be in the PATH indicated by the -output flag.
+ + The resulting translation will be in the PATH indicated by the -output flag.
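+
+ As a usage sketch, an end-to-end run might look as follows (the pip version pin and the sample sentence are illustrative assumptions, not part of this repository):
+
+ ```bash
+ # Install the toolkit; v2.2 is assumed per the step above.
+ pip install "OpenNMT-py==2.2.*"
+
+ # Put one sentence per line in the source file.
+ echo "The lighthouse keeper waited for the storm." > input_text
+
+ # Translate with the released model; omit -gpu 0 to run on CPU instead.
+ onmt_translate -src input_text -model NOS-MT-en-gl -output ./output_file.txt -replace_unk -gpu 0
+
+ # The output file contains one Galician line per source line.
+ cat ./output_file.txt
+ ```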
 
 **Training**
 
- In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of English-Portuguese translations, which we have converted into English-Galician by means of Portuguese-Galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
+ To train this model, we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora).
+
+ Authentic corpora are corpora produced by human translators. Synthetic corpora are English-Portuguese translations, which have been converted to English-Galician by means of Portuguese-Galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
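+
+ For illustration only, the Portuguese-Galician step can be reproduced with the Apertium command-line tool (this assumes the apertium pt-gl language pair is installed; the sample sentence is made up):
+
+ ```bash
+ # Pipe Portuguese text through the installed pt-gl pair to obtain Galician.
+ echo "O farol fica na costa." | apertium pt-gl
+ ```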
 
 **Training process**
 
- + Tokenization of the datasets made with linguakit tokeniser https://github.com/citiususc/Linguakit
- + The vocabulary for the models was generated through the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) of OpenNMT
- + Using .yaml in this repository you can replicate the training process as follows
+ + Tokenisation was performed with a modified version of the [linguakit](https://github.com/citiususc/Linguakit) tokeniser (tokenizer.pl) that does not append a new line after each token.
+ + All BPE models were generated with the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py); see the sketch after the commands below.
+ + Using the .yaml in this repository, it is possible to replicate the original training process. Before training the model, please verify that the path to each target (tgt) and source (src) file is correct. Once this is done, proceed as follows:
 
 ```bash
 onmt_build_vocab -config bpe-en-gl_emb.yaml -n_sample 100000
 onmt_train -config bpe-en-gl_emb.yaml
 ```
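+
+ A minimal sketch of the preparation steps above; the corpus file names and the 32000 merge count are assumptions, not values taken from this repository:
+
+ ```bash
+ # Check which corpus paths the config expects before launching training.
+ grep -En 'path_(src|tgt)' bpe-en-gl_emb.yaml
+
+ # Learn BPE merge operations from the tokenised training data
+ # (learn_bpe.py exposes the subword-nmt interface: -i input, -o codes, -s merges).
+ python learn_bpe.py -i train.tok.en -o bpe.codes.en -s 32000
+ python learn_bpe.py -i train.tok.gl -o bpe.codes.gl -s 32000
+ ```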
 
- **Hyper-parameters**
+ **Hyperparameters**
 
- The parameters used for the development of the model can be directly consulted in the same .yaml file bpe-en-gl_emb.yaml
+ You may find the parameters used for this model inside the file bpe-en-gl_emb.yaml.
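+
+ For example, to skim the main settings from a bash terminal (the key names below are typical OpenNMT-py option names and may differ in this particular file):
+
+ ```bash
+ # Print common architecture and optimisation keys from the config.
+ grep -E '^(enc_layers|dec_layers|heads|transformer_ff|word_vec_size|learning_rate|batch_size):' bpe-en-gl_emb.yaml
+ ```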
 
 **Evaluation**