|
--- |
|
language: |
|
- dsb |
|
- cs |
|
- csb_Latn |
|
- hsb |
|
- pl |
|
- zlw |
|
- hu |
|
- vro |
|
- fi |
|
- liv_Latn |
|
- mdf |
|
- krl |
|
- fkv_Latn |
|
- mhr |
|
- et |
|
- sma |
|
- udm |
|
- vep |
|
- myv |
|
- kpv |
|
- se |
|
- izh |
|
- fiu |
|
|
|
tags: |
|
- translation |
|
|
|
license: apache-2.0 |
|
--- |
|
### zlw-fiu |
|
* source language name: West Slavic languages |
|
* target language name: Finno-Ugrian languages |
|
* OPUS readme: [README.md](https://object.pouta.csc.fi/Tatoeba-MT-models/zlw-fiu/README.md) |
|
* model: transformer |
|
* source language codes: dsb, cs, csb_Latn, hsb, pl, zlw |
|
* target language codes: hu, vro, fi, liv_Latn, mdf, krl, fkv_Latn, mhr, et, sma, udm, vep, myv, kpv, se, izh, fiu |
|
* dataset: opus |
|
* release date: 2021-02-18 |
|
* pre-processing: normalization + SentencePiece (spm32k,spm32k) |
|
* download original weights: [opus-2021-02-18.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/zlw-fiu/opus-2021-02-18.zip/zlw-fiu/opus-2021-02-18.zip) |
|
* a sentence-initial language token is required in the form of >>id<<(id = valid, usually three-letter target language ID) |
|
* Training data: |
|
* ces-fin: Tatoeba-train (1000000) |
|
* ces-hun: Tatoeba-train (1000000) |
|
* pol-est: Tatoeba-train (1000000) |
|
* pol-fin: Tatoeba-train (1000000) |
|
* pol-hun: Tatoeba-train (1000000) |
|
* Validation data: |
|
* ces-fin: Tatoeba-dev, 1000 |
|
* ces-hun: Tatoeba-dev, 1000 |
|
* est-pol: Tatoeba-dev, 1000 |
|
* fin-pol: Tatoeba-dev, 1000 |
|
* hun-pol: Tatoeba-dev, 1000 |
|
* mhr-pol: Tatoeba-dev, 461 |
|
* total-size-shuffled: 5426 |
|
* devset-selected: top 5000 lines of Tatoeba-dev.src.shuffled! |
|
* Test data: |
|
* newssyscomb2009.ces-hun: 502/9733 |
|
* newstest2009.ces-hun: 2525/54965 |
|
* Tatoeba-test.ces-fin: 88/408 |
|
* Tatoeba-test.ces-hun: 1911/10336 |
|
* Tatoeba-test.multi-multi: 4562/25497 |
|
* Tatoeba-test.pol-chm: 5/36 |
|
* Tatoeba-test.pol-est: 15/98 |
|
* Tatoeba-test.pol-fin: 609/3293 |
|
* Tatoeba-test.pol-hun: 1934/11285 |
|
* test set translations file: [test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/zlw-fiu/opus-2021-02-18.zip/zlw-fiu/opus-2021-02-18.test.txt) |
|
* test set scores file: [eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/zlw-fiu/opus-2021-02-18.zip/zlw-fiu/opus-2021-02-18.eval.txt) |
|
* BLEU-scores |
|
|Test set|score| |
|
|---|---| |
|
|Tatoeba-test.ces-fin|57.2| |
|
|Tatoeba-test.ces-hun|42.6| |
|
|Tatoeba-test.multi-multi|39.4| |
|
|Tatoeba-test.pol-hun|36.6| |
|
|Tatoeba-test.pol-fin|36.1| |
|
|Tatoeba-test.pol-est|20.9| |
|
|newssyscomb2009.ces-hun|13.9| |
|
|newstest2009.ces-hun|13.9| |
|
|Tatoeba-test.pol-chm|2.0| |
|
* chr-F-scores |
|
|Test set|score| |
|
|---|---| |
|
|Tatoeba-test.ces-fin|0.71| |
|
|Tatoeba-test.ces-hun|0.637| |
|
|Tatoeba-test.multi-multi|0.616| |
|
|Tatoeba-test.pol-hun|0.605| |
|
|Tatoeba-test.pol-fin|0.592| |
|
|newssyscomb2009.ces-hun|0.449| |
|
|newstest2009.ces-hun|0.443| |
|
|Tatoeba-test.pol-est|0.372| |
|
|Tatoeba-test.pol-chm|0.007| |
|
|
|
### System Info: |
|
* hf_name: zlw-fiu |
|
* source_languages: dsb,cs,csb_Latn,hsb,pl,zlw |
|
* target_languages: hu,vro,fi,liv_Latn,mdf,krl,fkv_Latn,mhr,et,sma,udm,vep,myv,kpv,se,izh,fiu |
|
* opus_readme_url: https://object.pouta.csc.fi/Tatoeba-MT-models/zlw-fiu/opus-2021-02-18.zip/README.md |
|
* original_repo: Tatoeba-Challenge |
|
* tags: ['translation'] |
|
* languages: ['dsb', 'cs', 'csb_Latn', 'hsb', 'pl', 'zlw', 'hu', 'vro', 'fi', 'liv_Latn', 'mdf', 'krl', 'fkv_Latn', 'mhr', 'et', 'sma', 'udm', 'vep', 'myv', 'kpv', 'se', 'izh', 'fiu'] |
|
* src_constituents: ['dsb', 'ces', 'csb_Latn', 'hsb', 'pol'] |
|
* tgt_constituents: ['hun', 'vro', 'fin', 'liv_Latn', 'mdf', 'krl', 'fkv_Latn', 'mhr', 'est', 'sma', 'udm', 'vep', 'myv', 'kpv', 'sme', 'izh'] |
|
* src_multilingual: True |
|
* tgt_multilingual: True |
|
* helsinki_git_sha: a0966db6db0ae616a28471ff0faf461b36fec07d |
|
* transformers_git_sha: 3857f2b4e34912c942694489c2b667d9476e55f5 |
|
* port_machine: bungle |
|
* port_time: 2021-06-29-15:24 |