Update README_English.md
README_English.md +1 -1
README_English.md
CHANGED
@@ -36,7 +36,7 @@ Authentic corpora are corpora produced by human translators. Synthetic corpora a
 
 + Tokenisation was performed with a modified version of the [linguakit](https://github.com/citiususc/Linguakit) tokeniser (tokenizer.pl) that does not append a new line after each token.
 + All BPE models were generated with the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py)
-+ Using the .yaml in this repository it is possible to replicate the original training process. Before training the model, please verify that the path to each target (tgt) and (src) file is correct. Once this is done, proceed as follows:
++ Using the .yaml in this repository, it is possible to replicate the original training process. Before training the model, please verify that the path to each target (tgt) and (src) file is correct. Once this is done, proceed as follows:
 
 ```bash
 onmt_build_vocab -config bpe-es-gl_emb.yaml -n_sample 100000