jramompichel committed f998837 (1 parent: 40015b0)

Update README.md

README.md CHANGED
@@ -1,3 +1,63 @@
---
license: mit
---

**Model description**

Model built with OpenNMT for the Spanish-Galician pair using a transformer architecture.

**How to use**

+ Open a bash terminal
+ Install [Python 3.9](https://www.python.org/downloads/release/python-390/)
+ Install [Open NMT toolkit v.2.2](https://github.com/OpenNMT/OpenNMT-py)
+ Translate an input_text using the NOS-MT-es-gl model with the following command:

```bash
onmt_translate -src input_text -model NOS-MT-es-gl \
  -output ./output_file.txt \
  -replace_unk -phrase_table phrase_table-es-gl.txt \
  -gpu 0
```
+ The translation is written to the path given by the -output flag.
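
The steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the authors' exact workflow: the two example sentences and the `same_line_count` helper are hypothetical, the `pip` line assumes that the PyPI release `OpenNMT-py==2.2.0` corresponds to "v.2.2", and the model and phrase-table files are assumed to sit in the current directory under the names used in the command above. `onmt_translate` reads one sentence per line from `-src`, so the sketch writes the input that way and, if a translation was produced, checks that the output has the same number of lines.

```bash
# Assumption: this PyPI release matches "Open NMT toolkit v.2.2".
# pip install OpenNMT-py==2.2.0

src=input_text            # source file, one Spanish sentence per line
out=output_file.txt       # path passed to -output

# Write two example sentences (hypothetical input).
printf '%s\n' 'Hola, ¿cómo estás?' \
              'El modelo traduce del español al gallego.' > "$src"

# Hypothetical helper: succeeds when both files have the same number of lines.
same_line_count() {
  [ "$(wc -l < "$1" | tr -d ' ')" = "$(wc -l < "$2" | tr -d ' ')" ]
}

# Translate only if the toolkit, the model, and the phrase table are present.
# -gpu is left out here so that the run falls back to CPU.
if command -v onmt_translate >/dev/null 2>&1 \
   && [ -e NOS-MT-es-gl ] && [ -e phrase_table-es-gl.txt ]; then
  onmt_translate -src "$src" -model NOS-MT-es-gl -output "$out" \
    -replace_unk -phrase_table phrase_table-es-gl.txt
fi

# Each input line should yield exactly one output line.
if [ -f "$out" ]; then
  same_line_count "$src" "$out" && echo "line counts match"
fi
```

Adding `-gpu 0` back, as in the command above, runs the translation on the first GPU.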

**Training**

Data used for training

For fine-tuning we used the Softcatalà Catalan-German parallel corpus, with sentences deduplicated and filtered by the GEnCaTa quality filter.

Authentic and synthetic (transliteration) [Add paper]

**Training procedure**

Tokenization

The original m2m100_418M model's sentencepiece tokenizer was used.

BPE

**Hyperparameters**

The model was trained for 2 epochs with the default parameters and $LR = 2\mathrm{e}{-5}$.

Add the YAML for each of the pairs

**Evaluation**


**Additional information**

Licensing information

Apache License, Version 2.0

**Funding**

This work was funded by the Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya within the framework of Projecte AINA.

Citation Information

@article{garriga2022catalan,
title={A Catalan-German machine translation system based on the M2M-100 multilingual model},
author={Garriga Riba, Pol},
year={2022},
url={https://repositori.upf.edu/bitstream/handle/10230/54301/GarrigaRiba_2022.pdf?sequence=1&isAllowed=y}
}