---
language:
- ru
- zh
- en
tags:
- translation
- text2text-generation
- t5
license: apache-2.0
datasets:
- ccmatrix
metrics:
- sacrebleu
widget:
- example_title: translate zh-ru
  text: >
    translate to ru: 开发的目的是为用户提供个人同步翻译。
- example_title: translate ru-en
  text: >
    translate to en: Цель разработки — предоставить пользователям личного синхронного переводчика.
- example_title: translate en-ru
  text: >
    translate to ru: The purpose of the development is to provide users with a personal synchronized interpreter.
- example_title: translate en-zh
  text: >
    translate to zh: The purpose of the development is to provide users with a personal synchronized interpreter.
- example_title: translate zh-en
  text: >
    translate to en: 开发的目的是为用户提供个人同步解释器。
- example_title: translate ru-zh
  text: >
    translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика.
model-index:
- name: utrobinmv/t5_translate_en_ru_zh_base_200
  results:
  - task:
      type: translation
      name: Translation en-ru
    dataset:
      name: ntrex_en-ru
      type: ntrex
      config: ntrex en-ru
      split: test
    metrics:
    - type: sacrebleu
      value: 28.575940911021487
      name: bleu
      verified: false
    - type: chrf
      value: 54.27996346886896
      name: chrf
      verified: false
    - type: ter
      value: 62.494863914873584
      name: ter
      verified: false
    - type: meteor
      value: 0.5174833677740809
      name: meteor
      verified: false
    - type: rouge
      value: 0.1908317951570274
      name: ROUGE-1
      verified: false
    - type: rouge
      value: 0.065555552204933
      name: ROUGE-2
      verified: false
    - type: rouge
      value: 0.1895542893295215
      name: ROUGE-L
      verified: false
    - type: rouge
      value: 0.1893813749889601
      name: ROUGE-LSUM
      verified: false
    - type: bertscore
      value: 0.8554933660030365
      name: bertscore_f1
      verified: false
    - type: bertscore
      value: 0.8578473615646363
      name: bertscore_precision
      verified: false
    - type: bertscore
      value: 0.8534188346862793
      name: bertscore_recall
      verified: false
    source:
      name: NTREX dataset Benchmark
      url: https://huggingface.co/spaces/utrobinmv/TREX_benchmark_en_ru_zh
- name: utrobinmv/t5_translate_en_ru_zh_base_200
  results:
  - task:
      type: translation
      name: Translation ru-en
    dataset:
      name: ntrex_ru-en
      type: ntrex
      config: ntrex ru-en
      split: test
    metrics:
    - type: sacrebleu
      value: 28.575940911021487
      name: bleu
      verified: false
    - type: chrf
      value: 54.27996346886896
      name: chrf
      verified: false
    - type: ter
      value: 62.494863914873584
      name: ter
      verified: false
    - type: meteor
      value: 0.5174833677740809
      name: meteor
      verified: false
    - type: rouge
      value: 0.1908317951570274
      name: ROUGE-1
      verified: false
    - type: rouge
      value: 0.065555552204933
      name: ROUGE-2
      verified: false
    - type: rouge
      value: 0.1895542893295215
      name: ROUGE-L
      verified: false
    - type: rouge
      value: 0.1893813749889601
      name: ROUGE-LSUM
      verified: false
    - type: bertscore
      value: 0.8554933660030365
      name: bertscore_f1
      verified: false
    - type: bertscore
      value: 0.8578473615646363
      name: bertscore_precision
      verified: false
    - type: bertscore
      value: 0.8534188346862793
      name: bertscore_recall
      verified: false
    source:
      name: NTREX dataset Benchmark
      url: https://huggingface.co/spaces/utrobinmv/TREX_benchmark_en_ru_zh
---

# T5 English, Russian and Chinese multilingual machine translation

This model is a conventional T5 transformer operating in multitask mode, where the task selects the target language. It is fine-tuned for machine translation in six directions: ru-zh, zh-ru, en-zh, zh-en, en-ru and ru-en.

The model translates directly between any pair of Russian, Chinese and English. To select the target language, prepend the prefix `translate to <lang>:` to the input text, where `<lang>` is `ru`, `zh` or `en`. The source language does not need to be specified, and the source text may mix several languages.
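
The prefix convention can be wrapped in a small helper. This is a minimal sketch, not part of the model's API: the names `SUPPORTED_TARGETS` and `build_prompt` are hypothetical, and the language codes are the ones used in the examples in this card.

```python
# Hypothetical helper (not part of transformers or of this model's repository).
SUPPORTED_TARGETS = ("ru", "zh", "en")


def build_prompt(text: str, target_lang: str) -> str:
    """Prepend the 'translate to <lang>: ' prefix that selects the target language."""
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(
            f"unsupported target language {target_lang!r}; "
            f"expected one of {SUPPORTED_TARGETS}"
        )
    # Only the target language is encoded in the prompt, never the source.
    return f"translate to {target_lang}: {text}"


print(build_prompt("Hello, world.", "zh"))
# translate to zh: Hello, world.
```

The returned string can be passed to the tokenizer in place of a hand-written `prefix + text` concatenation.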

Example: translating Russian to Chinese

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_small_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# The prefix selects the target language; the source language is not specified.
prefix = 'translate to zh: '
src_text = prefix + "Цель разработки — предоставить пользователям личного синхронного переводчика."

# Translate Russian to Chinese
inputs = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**inputs)

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# 开发的目的是为用户提供个人同步翻译。
```

Example: translating Chinese to Russian

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = 'utrobinmv/t5_translate_en_ru_zh_small_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# The prefix selects the target language; the source language is not specified.
prefix = 'translate to ru: '
src_text = prefix + "开发的目的是为用户提供个人同步翻译。"

# Translate Chinese to Russian
inputs = tokenizer(src_text, return_tensors="pt")

generated_tokens = model.generate(**inputs)

result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
# Цель разработки - предоставить пользователям персональный синхронный перевод.
```
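
When several sentences share the same target language, they can be translated in one call. The following is a sketch under assumptions rather than the author's recommended setup: `translate_batch` is a hypothetical helper, the `max_new_tokens` and `num_beams` defaults are illustrative and not tuned for this model, and `model` and `tokenizer` are expected to be the objects loaded in the examples above.

```python
from typing import List


def translate_batch(texts: List[str], target_lang: str, model, tokenizer,
                    max_new_tokens: int = 128, num_beams: int = 4) -> List[str]:
    """Translate a list of texts into one target language ('ru', 'zh' or 'en')."""
    # Same prefix convention as in the single-sentence examples.
    prompts = [f"translate to {target_lang}: {text}" for text in texts]
    # Pad to the longest prompt so the batch forms one rectangular tensor.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    # Illustrative generation settings (assumptions, not values from this card).
    generated = model.generate(**inputs, max_new_tokens=max_new_tokens,
                               num_beams=num_beams)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

With `model` and `tokenizer` already loaded, a call such as `translate_batch(sentences, "en", model, tokenizer)` returns one translated string per input, in input order.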

## Languages covered

Russian (ru_RU), Chinese (zh_CN), English (en_US)