---
license: cc-by-4.0
---
# Automatic Translation Alignment of Ancient Greek Texts
The GRC-ALIGNMENT model is an XLM-RoBERTa-based model fine-tuned for automatic multilingual text alignment at the word level.
It was first trained on 12 million monolingual Ancient Greek tokens with a masked language modeling (MLM) objective, then fine-tuned on 45k parallel sentences, mainly Ancient Greek-English, Greek-Latin, and Greek-Georgian.

### Multilingual Training Dataset
| Languages | Sentences | Source |
|:----------|:---------:|:-------|
| GRC-ENG | 32,500 | Perseus Digital Library (Iliad, Odyssey, Xenophon, New Testament) |
| GRC-LAT | 8,200 | [Digital Fragmenta Historicorum Graecorum project](https://www.dfhg-project.org/) |
| GRC-KAT <br>GRC-ENG <br>GRC-LAT<br>GRC-ITA<br>GRC-POR | 4,000 | [UGARIT Translation Alignment Editor](https://ugarit.ialigner.com/) |

### Model Performance
| Languages | Alignment Error Rate |
|:---------:|:--------------------:|
| GRC-ENG   |     19.73% (IterMax) |
| GRC-POR   |     23.91% (IterMax) |
| GRC-LAT   |      10.60% (ArgMax) |
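
The table names ArgMax and IterMax as the methods used to extract alignments. As a rough illustration only, ArgMax is commonly described (for example in SimAlign) as keeping a word pair when each word is the other's best match in a source-by-target similarity matrix. The sketch below assumes such a matrix has already been computed from the model's token embeddings; the function name and the toy values are hypothetical and are not taken from the authors' code. IterMax, an iterative extension of the same idea, is not shown.

```python
from typing import List, Set, Tuple


def argmax_alignment(sim: List[List[float]]) -> Set[Tuple[int, int]]:
    """Return (source_index, target_index) pairs that are mutual best matches.

    `sim[i][j]` is assumed to hold the similarity between source token i
    and target token j (e.g. cosine similarity of contextual embeddings).
    """
    if not sim or not sim[0]:
        return set()
    n_src, n_tgt = len(sim), len(sim[0])
    # Best target for each source token (row-wise argmax).
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    # Best source for each target token (column-wise argmax).
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    # Keep a pair only when both directions agree.
    return {(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i}


# Toy similarity matrix: 3 source tokens x 3 target tokens (made-up values).
toy_sim = [
    [0.9, 0.1, 0.2],
    [0.2, 0.3, 0.8],
    [0.1, 0.4, 0.7],
]
toy_links = argmax_alignment(toy_sim)
```

In this toy case source token 2 stays unaligned, because its best target (index 2) prefers source token 1. Requiring agreement in both directions tends to favour precision over recall.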

The gold standard datasets are available on [GitHub](https://github.com/UgaritAlignment/Alignment-Gold-Standards).
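
For readers who want to reproduce a score against a gold standard, Alignment Error Rate is conventionally defined (following Och and Ney) as `1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)`, where `A` is the predicted set of links, `S` the sure gold links, and `P` the possible gold links (with `S` a subset of `P`). The sketch below is a minimal implementation of that formula. Whether the gold standards above distinguish sure from possible links is an assumption here; if they do not, passing only `sure` treats every gold link as sure. The function name and example links are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Link = Tuple[int, int]  # (source_index, target_index)


def alignment_error_rate(
    predicted: Iterable[Link],
    sure: Iterable[Link],
    possible: Optional[Iterable[Link]] = None,
) -> float:
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    a = set(predicted)
    s = set(sure)
    # By convention the possible links include the sure links.
    p = s | set(possible or ())
    denom = len(a) + len(s)
    if denom == 0:
        # No predictions and no gold links: treat as zero error.
        return 0.0
    return 1.0 - (len(a & s) + len(a & p)) / denom


# Made-up example: three predicted links, two sure gold links, one extra possible link.
example_aer = alignment_error_rate(
    predicted={(0, 0), (1, 2), (2, 1)},
    sure={(0, 0), (1, 2)},
    possible={(2, 2)},
)
```

Here two of the three predicted links match the gold standard, so the error rate is 1 - 4/5 = 0.2. Lower values are better, and 0 means the prediction matches the sure links exactly without leaving the possible set.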

If you use this model, please cite our paper:
<pre>
@misc{yousef_palladino_wright_berti_2022,
 title={Automatic Translation Alignment for Ancient Greek and Latin},
 url={osf.io/8epsy},
 DOI={10.31219/osf.io/8epsy},
 publisher={OSF Preprints},
 author={Yousef, Tariq and Palladino, Chiara and Wright, David J and Berti, Monica},
 year={2022},
 month={Apr}
}</pre>