
nllb-200-distilled-600M_mustc_en-to-8

This is a multilingually fine-tuned version of NLLB, based on nllb-200-distilled-600M, trained on the text data of MuST-C v1.0 for the eight En→X directions (De, Es, Fr, It, Nl, Pt, Ro, Ru).

It is part of the paper Pushing the Limits of Zero-shot End-to-end Speech Translation. Details on the fine-tuning process are available in Appendix D of the paper.

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint; the NLLB tokenizer defaults to English
# ("eng_Latn") as the source language.
tokenizer = AutoTokenizer.from_pretrained("johntsi/nllb-200-distilled-600M_mustc_en-to-8")
model = AutoModelForSeq2SeqLM.from_pretrained("johntsi/nllb-200-distilled-600M_mustc_en-to-8")

model.eval()
model.to("cuda")

# English input to be translated.
text = "Translate this text to German."
inputs = tokenizer(text, return_tensors="pt").to("cuda")

# Force the decoder to start with the target-language token (German here).
# Note: on newer transformers versions, where lang_code_to_id may be
# unavailable, tokenizer.convert_tokens_to_ids("deu_Latn") is equivalent.
outputs = model.generate(
    **inputs,
    num_beams=5,
    forced_bos_token_id=tokenizer.lang_code_to_id["deu_Latn"]
)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)
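
The same checkpoint covers all eight fine-tuned directions; the only change per target language is the forced BOS token. Below is a minimal sketch (not from the original card, reusing tokenizer, model, and inputs from above) that loops over the FLORES-200 codes of the eight MuST-C targets:

# FLORES-200 codes for the eight MuST-C target languages.
target_langs = {
    "German": "deu_Latn",
    "Spanish": "spa_Latn",
    "French": "fra_Latn",
    "Italian": "ita_Latn",
    "Dutch": "nld_Latn",
    "Portuguese": "por_Latn",
    "Romanian": "ron_Latn",
    "Russian": "rus_Cyrl",
}

for name, code in target_langs.items():
    outputs = model.generate(
        **inputs,
        num_beams=5,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(code),
    )
    print(f"{name}: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")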

Results

BLEU scores on MuST-C v1.0 tst-COMMON

| Model | De | Es | Fr | It | Nl | Pt | Ro | Ru | Average |
|---|---|---|---|---|---|---|---|---|---|
| nllb-200-distilled-600M (original) | 32.7 | 36.9 | 45.2 | 32.2 | 36.0 | 37.4 | 30.3 | 21.0 | 34.0 |
| nllb-200-distilled-600M_mustc_en-to-8 | 34.4 | 38.8 | 44.6 | 34.7 | 39.0 | 41.6 | 32.1 | 22.4 | 35.9 |
| nllb-200-distilled-1.3B (original) | 34.6 | 38.6 | 46.8 | 33.7 | 38.2 | 39.6 | 31.8 | 23.2 | 35.8 |
| nllb-200-distilled-1.3B_mustc_en-to-8 | 35.3 | 39.9 | 45.8 | 36.0 | 40.6 | 43.1 | 32.6 | 23.9 | 37.2 |
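
The exact evaluation setup is described in the paper; as a rough illustration only (an assumption, not taken from the card), corpus-level BLEU over a list of hypotheses and references can be computed with sacreBLEU:

import sacrebleu

# Hypothetical lists: model translations of tst-COMMON and the
# corresponding references, one string per segment.
hypotheses = ["Das ist ein Beispiel.", "Noch ein Satz."]
references = [["Das ist ein Beispiel.", "Noch ein Satz."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")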

Citation

If you find these models useful for your research, please cite our paper :)

@misc{tsiamas2024pushing,
      title={{Pushing the Limits of Zero-shot End-to-End Speech Translation}}, 
      author={Ioannis Tsiamas and Gerard I. Gállego and José A. R. Fonollosa and Marta R. Costa-jussà},
      year={2024},
      eprint={2402.10422},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}