---
language:
- en
- is
- multilingual
tags:
- translation
inference:
  parameters:
    src_lang: en_XX
    tgt_lang: is_IS
    decoder_start_token_id: 2
    max_length: 512
widget:
- text: I once owned a horse. It was black and white.
---
# mBART based translation model
This model was trained to translate multiple sentences at a time, rather than one sentence at a time. As a side effect, it will occasionally merge adjacent sentences or add an extra sentence.

This is the same model as the one provided on CLARIN: https://repository.clarin.is/repository/xmlui/handle/20.500.12537/278
You can use the following example to get started (note that it is necessary to alter the `decoder_start_token_id` of the model):
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch

# Use the first GPU if available, otherwise fall back to CPU (-1).
device = torch.cuda.current_device() if torch.cuda.is_available() else -1

tokenizer = AutoTokenizer.from_pretrained(
    "mideind/nmt-doc-en-is-2022-10", src_lang="en_XX", tgt_lang="is_IS"
)
model = AutoModelForSeq2SeqLM.from_pretrained("mideind/nmt-doc-en-is-2022-10")
# This checkpoint expects token id 2 as the decoder start token.
model.config.decoder_start_token_id = 2

translate = pipeline(
    "translation_XX_to_YY",
    model=model,
    tokenizer=tokenizer,
    device=device,
    src_lang="en_XX",
    tgt_lang="is_IS",
)

target_seq = translate(
    "I am using a translation model to translate text from English to Icelandic.",
    src_lang="en_XX",
    tgt_lang="is_IS",
    max_length=128,
)
print(target_seq[0]["translation_text"].strip("YY "))
```