Massively Multilingual Speech (MMS): Text-to-Speech Models

This repository contains the Ayta, Abellen (abp) language text-to-speech (TTS) model checkpoint.

This model is part of Facebook's Massively Multilingual Speech project, which aims to provide speech technology for a diverse range of languages. The supported languages and their ISO 639-3 codes are listed in the MMS Language Coverage Overview.

Usage

Using this checkpoint from Hugging Face Transformers:

from IPython.display import Audio
import torch
from transformers import VitsModel, VitsTokenizer

# Load the checkpoint and its matching tokenizer.
model = VitsModel.from_pretrained("Matthijs/mms-tts-abp")
tokenizer = VitsTokenizer.from_pretrained("Matthijs/mms-tts-abp")

text = "some example text in the Ayta, Abellen language"
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so skip gradient tracking.
with torch.no_grad():
    output = model(**inputs)

# Play the first generated waveform in a notebook at the model's sampling rate.
Audio(output.waveform[0].numpy(), rate=model.config.sampling_rate)
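
Outside a notebook, the generated audio can be written to disk instead of played inline. The following is a minimal sketch using only the Python standard library. `save_wav` is a hypothetical helper, not part of Transformers, and a synthetic sine tone stands in for the model output so that the example runs without downloading the checkpoint. It assumes mono float samples in the range [-1.0, 1.0] at the 16 kHz rate used above.

```python
import math
import struct
import wave


def save_wav(samples, path, rate=16000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)    # mono
        f.setsampwidth(2)    # 2 bytes per sample = 16-bit PCM
        f.setframerate(rate)
        f.writeframes(bytes(frames))


# Stand-in for the model output: one second of a 440 Hz tone at 16 kHz.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
save_wav(tone, "example.wav")
```

To save real speech, pass the first generated waveform converted to a plain list of floats (for a tensor `t`, `t.tolist()`) in place of `tone`.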

Note: For certain checkpoints, the input text must be converted to the Latin alphabet first using the uroman tool.
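
As a rough way to spot input that may need romanization, one can check whether the text contains letters outside the Latin script. The sketch below is only a heuristic based on Unicode character names. `has_non_latin_letters` is a hypothetical helper, not part of uroman or Transformers, and it does not perform any romanization itself.

```python
import unicodedata


def has_non_latin_letters(text):
    """Return True if any alphabetic character is not a Latin-script letter."""
    for ch in text:
        # Unicode names of Latin letters start with "LATIN", including
        # accented forms such as "LATIN SMALL LETTER E WITH ACUTE".
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return True
    return False


print(has_non_latin_letters("Kumusta, café!"))
print(has_non_latin_letters("Привет"))
```

Text for which the check returns True would be a candidate for passing through uroman before tokenization, if the checkpoint in question expects romanized input.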

Model credits

This model was developed by Vineel Pratap et al. and is licensed under CC-BY-NC 4.0.

@article{pratap2023mms,
    title={Scaling Speech Technology to 1,000+ Languages},
    author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
    journal={arXiv},
    year={2023}
}