Itzune v1.0 EN -> EU machine translation model for Argos Translate
This model was trained with the argostrain training scripts on 11,542,706 English-Basque parallel sentences (counted before cleaning), extracted from datasets obtained directly from the OPUS project.
Model description
- Developed by: Basque community
- Model type: translation
- Model version: v1.0
- Source Language: English
- Target Language: Basque
- License: MIT
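
As a minimal sketch of how a model like this could be used, the snippet below installs a local package file with the Argos Translate Python library and translates English text to Basque. The helper name `translate_en_to_eu` and the `model_path` argument are hypothetical; the actual file name of the Itzune package is not stated here, so substitute the path of the file you downloaded. It assumes the `argostranslate` package is installed.

```python
from pathlib import Path


def translate_en_to_eu(text: str, model_path: Path) -> str:
    """Install a local .argosmodel package and translate English text to Basque.

    `model_path` is a hypothetical local path to the downloaded package file.
    """
    # Imported lazily so this sketch can be loaded without the dependency present.
    import argostranslate.package
    import argostranslate.translate

    # Register the package with Argos Translate, then translate en -> eu.
    argostranslate.package.install_from_path(model_path)
    return argostranslate.translate.translate(text, "en", "eu")
```

The language codes `"en"` and `"eu"` are the ISO 639-1 codes for English and Basque, matching the source and target languages listed above.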
Training Data
The English-Basque parallel sentences were collected from the following datasets:
Dataset | Sentences before cleaning |
---|---|
CCMatrix v1 | 7,788,871 |
OpenSubtitles v2018 | 805,780 |
XLEnt v1.2 | 800,631 |
GNOME v1 | 652,298 |
HPLT v1.1 | 610,694 |
EhuHac v1 | 585,210 |
WikiMatrix v1 | 119,480 |
KDE4 v2 | 100,160 |
wikimedia v20230407 | 60,990 |
bible-uedin v1 | 15,893 |
Tatoeba v2023-04-12 | 2,070 |
Wiktionary | 629 |
Total | 11,542,706 |
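
The per-dataset counts in the table can be cross-checked against the reported total with a few lines of Python. This is only an illustrative sanity check; the dictionary copies the figures from the table, and the variable names are not part of any training script.

```python
# Sentence counts before cleaning, copied from the table above.
counts = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}

total = sum(counts.values())
assert total == 11_542_706  # matches the "Total" row

# Share of the corpus contributed by the largest source.
ccmatrix_share = counts["CCMatrix v1"] / total
print(f"Total: {total:,}")
print(f"CCMatrix share: {ccmatrix_share:.1%}")
```

The check shows that CCMatrix alone contributes roughly two thirds of the sentences before cleaning.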