jramompichel
committed on
Commit
•
bd4ec79
1
Parent(s):
a02febf
Update README.md
README.md
CHANGED
@@ -1,15 +1,21 @@
|
|
1 |
---
|
2 |
license: mit
|
3 |
-
4 |
---
|
5 |
license: mit
|
6 |
---
|
7 |
|
8 |
-
**Model Description**
|
9 |
|
10 |
-
Model built with OpenNMT for the pair
|
11 |
|
12 |
-
**Como
|
13 |
|
14 |
+ Open a bash terminal
|
15 |
+ Install [Python 3.9](https://www.python.org/downloads/release/python-390/)
|
@@ -17,17 +23,17 @@ Modelo feito con OpenNMT para o par español-galego utilizando unha arquitectura
|
|
17 |
+ Translate an input_text using the NOS-MT-en-gl model with the following command:
|
18 |
|
19 |
```bash
|
20 |
-
onmt_translate -src input_text -model NOS-MT-
|
21 |
```
|
22 |
+ The translation is written to the path given by the -output flag.
|
23 |
|
24 |
-
**Training**
|
25 |
|
26 |
Data used for training
|
27 |
|
28 |
Authentic and synthetic (transliteration) [Add Paper]
|
29 |
|
30 |
-
**Training process**
|
31 |
|
32 |
+ Tokenization of the datasets done with the Linguakit tokenizer https://github.com/citiususc/Linguakit
|
33 |
|
@@ -40,11 +46,11 @@ onmt_build_vocab -config bpe-en-gl_emb.yaml -n_sample 100000
|
|
40 |
onmt_train -config bpe-en-gl_emb.yaml
|
41 |
```
|
42 |
|
43 |
-
**Hyper-parameters**
|
44 |
|
45 |
The parameters used to develop the model can be consulted directly in the same .yaml file, bpe-en-gl_emb.yaml
|
46 |
|
47 |
-
**Evaluation**
|
48 |
The models are evaluated on a mix of internally developed test sets
|
49 |
(gold1, gold2, test-suite) and other datasets available for Galician (Flores).
|
50 |
|
@@ -54,11 +60,29 @@ A avalación dos modelos é feita cunha mistura de tests desenvolvidos intername
|
|
54 |
|
55 |
|
56 |
|
57 |
-
**
58 |
|
59 |
-
60 |
|
61 |
-
62 |
|
63 |
**Funding**
|
64 |
|
@@ -66,12 +90,4 @@ Esta investigación foi financiada polo proxecto "Nós: o galego na sociedade e
|
|
66 |
|
67 |
This research was funded by the project "Nós: Galician in the society and economy of artificial intelligence", agreement between Xunta de Galicia and University of Santiago de Compostela, and grant ED431G2019/04 by the Galician Ministry of Education, University and Professional Training, and the European Regional Development Fund (ERDF/FEDER program), and Groups of Reference: ED431C 2020/21.
|
68 |
|
69 |
-
|
70 |
-
**Citation Information**
|
71 |
-
|
72 |
-
@article{,
|
73 |
-
title={},
|
74 |
-
author={},
|
75 |
-
year={2022},
|
76 |
-
url={}
|
77 |
-
}
|
|
|
1 |
---
|
2 |
license: mit
|
3 |
+
language:
|
4 |
+
- gl
|
5 |
+
metrics:
|
6 |
+
- bleu (Gold1): 36.8
|
7 |
+
- bleu (Gold2): 47.1
|
8 |
+
- bleu (Flores): 32.3
|
9 |
+
- bleu (Test-suite): 42.7
|
10 |
---
|
11 |
license: mit
|
12 |
---
|
13 |
|
14 |
+
**Model Description**
|
15 |
|
16 |
+
Model built with OpenNMT for the English-Galician pair using a transformer architecture.
|
17 |
|
18 |
+
**How to translate**
|
19 |
|
20 |
+ Open a bash terminal
|
21 |
+ Install [Python 3.9](https://www.python.org/downloads/release/python-390/)
|
|
|
23 |
+ Translate an input_text using the NOS-MT-en-gl model with the following command:
|
24 |
|
25 |
```bash
|
26 |
+
onmt_translate -src input_text -model NOS-MT-en-gl -output ./output_file.txt -replace_unk -phrase_table phrase_table-en-gl.txt -gpu 0
|
27 |
```
|
28 |
+ The translation is written to the path given by the -output flag.
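For scripted use, the `onmt_translate` call above can be assembled programmatically. The following is a minimal Python sketch and not part of the original README: the helper name `build_translate_command` is hypothetical, the flags and default file names are copied from the command above, and it assumes that leaving out `-gpu` makes `onmt_translate` fall back to CPU.

```python
import shlex
from typing import List, Optional


def build_translate_command(
    src: str,
    output: str,
    model: str = "NOS-MT-en-gl",
    phrase_table: Optional[str] = "phrase_table-en-gl.txt",
    gpu: Optional[int] = 0,
) -> List[str]:
    """Assemble the README's onmt_translate call as an argument list.

    Hypothetical helper: it only mirrors the flags shown in the README
    and does not check that the model or phrase table files exist.
    """
    cmd = [
        "onmt_translate",
        "-src", src,
        "-model", model,
        "-output", output,
        "-replace_unk",
    ]
    if phrase_table is not None:
        cmd += ["-phrase_table", phrase_table]
    if gpu is not None:
        # Assumption: without -gpu, onmt_translate runs on CPU.
        cmd += ["-gpu", str(gpu)]
    return cmd


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(build_translate_command("input_text", "./output_file.txt")))
```

To actually run the translation, the returned list can be passed to `subprocess.run(cmd, check=True)` in an environment where OpenNMT-py is installed.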
|
29 |
|
30 |
+
**Training**
|
31 |
|
32 |
Data used for training
|
33 |
|
34 |
Authentic and synthetic (transliteration) [Add Paper]
|
35 |
|
36 |
+
**Training process**
|
37 |
|
38 |
+ Tokenization of the datasets done with the Linguakit tokenizer https://github.com/citiususc/Linguakit
|
39 |
|
|
|
46 |
onmt_train -config bpe-en-gl_emb.yaml
|
47 |
```
|
48 |
|
49 |
+
**Hyper-parameters**
|
50 |
|
51 |
The parameters used to develop the model can be consulted directly in the same .yaml file, bpe-en-gl_emb.yaml
|
52 |
|
53 |
+
**Evaluation**
|
54 |
The models are evaluated on a mix of internally developed test sets
|
55 |
(gold1, gold2, test-suite) and other datasets available for Galician (Flores).
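The model card metadata reports BLEU on each of these test sets, but the README does not say which BLEU implementation or tokenization produced those figures. As an illustration of the metric only, here is a minimal corpus-level BLEU sketch in Python. It assumes whitespace tokenization, a single reference per sentence, and no smoothing, so its output should not be expected to match the reported scores; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Unsmoothed corpus-level BLEU (0-100), one reference per hypothesis."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()  # assumption: whitespace tokenization
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

For reportable numbers, an established tool with documented tokenization is a better choice than a hand-rolled function like this one.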
|
56 |
|
|
|
60 |
|
61 |
|
62 |
|
63 |
+
**Licensing information**
|
64 |
+
|
65 |
+
MIT License
|
66 |
+
|
67 |
+
Copyright (c) 2023 Proxecto Nós
|
68 |
|
69 |
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
70 |
+
of this software and associated documentation files (the "Software"), to deal
|
71 |
+
in the Software without restriction, including without limitation the rights
|
72 |
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
73 |
+
copies of the Software, and to permit persons to whom the Software is
|
74 |
+
furnished to do so, subject to the following conditions:
|
75 |
|
76 |
+
The above copyright notice and this permission notice shall be included in all
|
77 |
+
copies or substantial portions of the Software.
|
78 |
+
|
79 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
80 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
81 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
82 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
83 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
84 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
85 |
+
SOFTWARE.
|
86 |
|
87 |
**Funding**
|
88 |
|
|
|
90 |
|
91 |
This research was funded by the project "Nós: Galician in the society and economy of artificial intelligence", agreement between Xunta de Galicia and University of Santiago de Compostela, and grant ED431G2019/04 by the Galician Ministry of Education, University and Professional Training, and the European Regional Development Fund (ERDF/FEDER program), and Groups of Reference: ED431C 2020/21.
|
92 |
|
93 |
+
**Citation Information**
|