metadata
language:
- fr
- en
tags:
- translation
license: apache-2.0
fra-eng
source language name: French
target language name: English
OPUS readme: README.md
model: transformer-align
source language code: fr
target language code: en
dataset: opus
release date: 2021-02-22
pre-processing: normalization + SentencePiece (spm32k,spm32k)
download original weights: opus-2021-02-22.zip
Training data:
- fra-eng: Tatoeba-train (180923857)
Validation data:
- eng-fra: Tatoeba-dev, 250098
- total-size-shuffled: 249757
- devset-selected: top 5000 lines of Tatoeba-dev.src.shuffled!
Test data:
- newsdiscussdev2015-enfr.fra-eng: 1500/27759
- newsdiscusstest2015-enfr.fra-eng: 1500/26995
- newssyscomb2009.fra-eng: 502/11821
- news-test2008.fra-eng: 2051/49380
- newstest2009.fra-eng: 2525/65402
- newstest2010.fra-eng: 2489/61724
- newstest2011.fra-eng: 3003/74681
- newstest2012.fra-eng: 3003/72812
- newstest2013.fra-eng: 3000/64505
- newstest2014-fren.fra-eng: 3003/70708
- Tatoeba-test.fra-eng: 10000/77174
test set translations file: test.txt
test set scores file: eval.txt
BLEU-scores
| Test set | score |
|---|---|
| Tatoeba-test.fra-eng | 57.8 |
| newsdiscusstest2015-enfr.fra-eng | 39.7 |
| newstest2014-fren.fra-eng | 38.4 |
| newsdiscussdev2015-enfr.fra-eng | 34.4 |
| newstest2013.fra-eng | 34.0 |
| newstest2012.fra-eng | 33.2 |
| newstest2011.fra-eng | 33.1 |
| newstest2010.fra-eng | 32.7 |
| newssyscomb2009.fra-eng | 31.1 |
| newstest2009.fra-eng | 30.5 |
| news-test2008.fra-eng | 26.5 |

chr-F-scores

| Test set | score |
|---|---|
| Tatoeba-test.fra-eng | 0.723 |
| newstest2014-fren.fra-eng | 0.636 |
| newsdiscusstest2015-enfr.fra-eng | 0.621 |
| newstest2011.fra-eng | 0.598 |
| newstest2010.fra-eng | 0.593 |
| newstest2012.fra-eng | 0.593 |
| newstest2013.fra-eng | 0.592 |
| newsdiscussdev2015-enfr.fra-eng | 0.587 |
| newssyscomb2009.fra-eng | 0.575 |
| newstest2009.fra-eng | 0.572 |
| news-test2008.fra-eng | 0.544 |
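To compare the Tatoeba result with the news test sets, the BLEU figures listed in this card can be summarized in a few lines of Python. This is only a minimal sketch: the dictionary copies the scores reported here, and the variable names (`bleu`, `news`, `news_mean`) are illustrative rather than part of any released tooling.

```python
from statistics import mean

# BLEU scores copied from this card (test set name -> BLEU).
bleu = {
    "Tatoeba-test.fra-eng": 57.8,
    "newsdiscusstest2015-enfr.fra-eng": 39.7,
    "newstest2014-fren.fra-eng": 38.4,
    "newsdiscussdev2015-enfr.fra-eng": 34.4,
    "newstest2013.fra-eng": 34.0,
    "newstest2012.fra-eng": 33.2,
    "newstest2011.fra-eng": 33.1,
    "newstest2010.fra-eng": 32.7,
    "newssyscomb2009.fra-eng": 31.1,
    "newstest2009.fra-eng": 30.5,
    "news-test2008.fra-eng": 26.5,
}

# Separate the Tatoeba test set from the news-domain test sets.
news = {name: score for name, score in bleu.items()
        if not name.startswith("Tatoeba")}

news_mean = mean(news.values())
best = max(news, key=news.get)
worst = min(news, key=news.get)
gap = bleu["Tatoeba-test.fra-eng"] - news_mean

print(f"news test sets: {len(news)}")
print(f"mean news BLEU: {news_mean:.2f}")
print(f"best news set: {best}")
print(f"worst news set: {worst}")
print(f"Tatoeba minus news mean: {gap:.2f}")
```

An unweighted mean treats every test set equally regardless of its size; weighting by the sentence counts listed under Test data would be a reasonable alternative.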
System Info:
- hf_name: fra-eng
- source_languages: fr
- target_languages: en
- opus_readme_url: https://object.pouta.csc.fi/Tatoeba-MT-models/fra-eng/opus-2021-02-22.zip/README.md
- original_repo: Tatoeba-Challenge
- tags: ['translation']
- languages: ['fr', 'en']
- src_constituents: ['fra']
- tgt_constituents: ['eng']
- src_multilingual: False
- tgt_multilingual: False
- helsinki_git_sha: 6faf2dab0b7b01a0e08a114dbacbb7deac54988d
- transformers_git_sha: e9a6c72b5edfb9561a981959b0e7c62d8ab9ef6c
- port_machine: 146-193-182-187.edr.inesc.pt
- port_time: 2023-11-06-16:20
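The fields above indicate that the checkpoint was ported to the `transformers` library. A minimal usage sketch follows, assuming the ported weights load with the Marian classes (`MarianMTModel`, `MarianTokenizer`), as OPUS-MT ports usually do, and that PyTorch is installed. This card does not state the Hub identifier or local path, so `model_name` is left for the caller to supply; the helper names `load_marian` and `translate` are hypothetical.

```python
def load_marian(model_name):
    """Load a tokenizer/model pair.

    `model_name` is a Hub ID or local directory supplied by the caller;
    it is not given in this card. Using the Marian classes is an assumption.
    """
    # Imported lazily so that `translate` can be reused with other objects.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return tokenizer, model


def translate(sentences, tokenizer, model):
    """Translate a batch of French sentences into English."""
    # Tokenize with padding so sentences of different lengths form one batch.
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    # Generate target token IDs with the model's default decoding settings.
    generated = model.generate(**batch)
    # Convert token IDs back to text, dropping padding and end-of-sentence markers.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Typical use would be `tokenizer, model = load_marian(path_or_hub_id)` followed by `translate(["Bonjour le monde."], tokenizer, model)`. The tokenizer applies the SentencePiece segmentation, so no manual pre-processing of the input should be needed.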