---
language:
  - en
  - is
  - multilingual
tags:
  - translation
inference:
  parameters:
    src_lang: en_XX
    tgt_lang: is_IS
    decoder_start_token_id: 2
    max_length: 512
widget:
  - text: I once owned a horse. It was black and white.
---

# mBART based translation model

This model was trained to translate multiple sentences at once, rather than one sentence at a time. As a result, it will occasionally merge sentences or add an extra one.

This is the same model as is provided on CLARIN: https://repository.clarin.is/repository/xmlui/handle/20.500.12537/278

You can use the following example to get started (note that the model's `decoder_start_token_id` must be overridden):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch

# Use the GPU if one is available, otherwise fall back to CPU (-1)
device = torch.cuda.current_device() if torch.cuda.is_available() else -1

tokenizer = AutoTokenizer.from_pretrained(
    "mideind/nmt-doc-en-is-2022-10", src_lang="en_XX", tgt_lang="is_IS"
)

model = AutoModelForSeq2SeqLM.from_pretrained("mideind/nmt-doc-en-is-2022-10")
# The default decoder_start_token_id must be changed for this model
model.config.decoder_start_token_id = 2

translate = pipeline(
    "translation_XX_to_YY",
    model=model,
    tokenizer=tokenizer,
    device=device,
    src_lang="en_XX",
    tgt_lang="is_IS",
)

target_seq = translate(
    "I am using a translation model to translate text from English to Icelandic.",
    src_lang="en_XX",
    tgt_lang="is_IS",
    max_length=128,
)
# Strip the leading language token from the output
print(target_seq[0]["translation_text"].strip("YY "))
```
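Since the model accepts multiple sentences per input, a longer document can be split into multi-sentence chunks that each fit within the model's length budget before being passed to the pipeline. A minimal sketch of such pre-chunking (the `chunk_sentences` helper and its character budget are illustrative assumptions, not part of this model or `transformers`):

```python
import re

def chunk_sentences(text, max_chars=500):
    """Greedily pack sentences into chunks of at most max_chars characters,
    so each chunk can be translated as one multi-sentence input."""
    # Naive sentence split on end-of-sentence punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + 1 + len(sent) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks

# Each chunk would then be passed to translate(...) in turn
print(chunk_sentences("I once owned a horse. It was black and white.", max_chars=30))
```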