---
language:
- en
- is
- multilingual
tags:
- translation
inference:
  parameters:
    src_lang: en_XX
    tgt_lang: is_IS
    decoder_start_token_id: 2
    max_length: 512
widget:
- text: I once owned a horse. It was black and white.
---

# mBART-based translation model

This English-to-Icelandic model was trained to translate multiple sentences at once rather than one sentence at a time.

The model will occasionally combine sentences or add an extra sentence to its output.
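One cheap way to spot such cases is to compare sentence counts between the source and the translation. The sketch below is illustrative only: `split_sentences` and `sentence_count_mismatch` are hypothetical helpers, not part of this model or of `transformers`. The splitter is a naive regex that breaks after `.`, `!` or `?` followed by whitespace, so abbreviations will confuse it, and the Icelandic string is a hand-written example, not actual model output.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_count_mismatch(source: str, translation: str) -> int:
    """Return (translated sentence count) - (source sentence count).

    A negative value hints that sentences were merged; a positive value
    hints that a sentence was added. Zero does not guarantee alignment.
    """
    return len(split_sentences(translation)) - len(split_sentences(source))


src = "I once owned a horse. It was black and white."
# Hand-written translation that merges the two sentences into one
merged = "Ég átti einu sinni hest sem var svartur og hvítur."
print(sentence_count_mismatch(src, merged))  # → -1
```

A non-zero result only flags a segment for review; it says nothing about translation quality.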

This is the same model as the one provided on CLARIN: https://repository.clarin.is/repository/xmlui/handle/20.500.12537/278

You can use the following example to get started:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch

# Use the current GPU if one is available; -1 makes the pipeline run on CPU
device = torch.cuda.current_device() if torch.cuda.is_available() else -1

# Source language is English (en_XX), target language is Icelandic (is_IS)
tokenizer = AutoTokenizer.from_pretrained(
    "mideind/nmt-doc-en-is-2022-10", src_lang="en_XX", tgt_lang="is_IS"
)
model = AutoModelForSeq2SeqLM.from_pretrained("mideind/nmt-doc-en-is-2022-10")

translate = pipeline(
    "translation_XX_to_YY",
    model=model,
    tokenizer=tokenizer,
    device=device,
    src_lang="en_XX",
    tgt_lang="is_IS",
)

target_seq = translate(
    "I am using a translation model to translate text from English to Icelandic.",
    src_lang="en_XX",
    tgt_lang="is_IS",
    max_length=128,
)
# strip("YY ") removes any leading/trailing 'Y' and space characters
print(target_seq[0]["translation_text"].strip("YY "))
```
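The example above translates a single sentence. Since the model handles several sentences at once, a longer document can be split into multi-sentence chunks that are each passed to `translate`. The sketch below assumes a greedy packing strategy; `split_sentences` and `chunk_sentences` are hypothetical helpers, not part of this model or of `transformers`, and the default budget of 200 whitespace-separated words is an arbitrary choice meant to stay comfortably below the `max_length: 512` set in the inference parameters.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(
    sentences: List[str],
    max_units: int = 200,  # assumed budget, not taken from the model card
    length_fn: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack consecutive sentences into chunks of at most max_units.

    length_fn defaults to a whitespace word count, a rough stand-in for
    subword tokens. A single sentence longer than max_units becomes its
    own chunk rather than being cut.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = length_fn(sentence)
        # Close the current chunk if this sentence would exceed the budget
        if current and current_len + n > max_units:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


doc = "I once owned a horse. It was black and white. It lived on a farm."
print(chunk_sentences(split_sentences(doc), max_units=10))
# → ['I once owned a horse. It was black and white.', 'It lived on a farm.']
```

For exact budgets, `length_fn` can count subword tokens instead, e.g. `lambda s: len(tokenizer(s)["input_ids"])` with the tokenizer loaded above; each resulting chunk is then translated with a separate `translate(...)` call.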
|