---
language:
- lg
- sw
tags:
- text-to-speech
- TTS
- speech-synthesis
- VITS
license: cc-by-4.0
datasets:
- mozilla-foundation/common_voice_13_0
pipeline_tag: text-to-speech
---

# Text-to-Speech (TTS) with VITS trained on Kiswahili and Luganda Common Voice

This repository provides the tools needed for Text-to-Speech (TTS) with Coqui TTS, using a [VITS](https://arxiv.org/abs/2106.06103) model fine-tuned on Kiswahili and Luganda Common Voice v13 recordings from six speakers with similar intonation.

The pre-trained model takes text as input and produces an audio waveform as output.

# How to Synthesize Speech using our models

First, install TTS:

```
pip install TTS
```

### Perform Text-to-Speech (TTS)

```python
from TTS.utils.synthesizer import Synthesizer

# The first two arguments are the model checkpoint path and the model config
# path; both are left blank here and must be set to your local files.
synthesizer = Synthesizer(
    "",
    "",
    None,
    None,
    None,
    None,
    None,
    None,
    None,
)

sentence_to_synthesize = "Your Kiswahili or Luganda sentence here"

if sentence_to_synthesize:
    print(sentence_to_synthesize)
    # Speaker name, language name, and speaker WAV are not used here.
    wav = synthesizer.tts(sentence_to_synthesize, None, None, None)
    location = "output.wav"  # Choose a desired name for the output file
    synthesizer.save_wav(wav, location)
```

### Limitations

We do not provide any warranty on the performance of this model when it is used on other datasets.

# **Citing**

Please cite our work if you use our models for your research or business.

```bibtex
@inproceedings{buildingTTS,
  title={Building a Luganda Text-to-Speech Model from Crowdsourced Data},
  author={Kagumire, Sulaiman and Katumba, Andrew and Nakatumba-Nabende, Joyce and Quinn, John},
  booktitle={5th Workshop on African Natural Language Processing},
  year={2024}
}
```
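
# Synthesizing several sentences (sketch)

The synthesis snippet above handles one sentence at a time. As a minimal sketch, the same two calls (`tts` and `save_wav`) could be wrapped in a loop to write one WAV file per sentence. The helper name `synthesize_all`, the `out_dir` parameter, and the `utt_0000.wav` naming pattern are assumptions made for illustration and are not part of Coqui TTS; the only requirement is that `synthesizer` exposes `tts` and `save_wav` as used above.

```python
from pathlib import Path
from typing import Iterable, List


def synthesize_all(synthesizer, sentences: Iterable[str], out_dir: str = "wavs") -> List[Path]:
    """Synthesize each non-empty sentence to its own WAV file and return the paths.

    Hypothetical helper: `synthesizer` is assumed to provide
    `tts(text, speaker_name, language_name, speaker_wav)` and `save_wav(wav, path)`,
    matching the calls in the snippet above.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written: List[Path] = []
    for index, sentence in enumerate(sentences):
        text = sentence.strip()
        if not text:
            continue  # skip blank entries, mirroring the `if` guard above

        wav = synthesizer.tts(text, None, None, None)
        path = out / f"utt_{index:04d}.wav"  # index refers to the input position
        synthesizer.save_wav(wav, str(path))
        written.append(path)

    return written
```

With a `synthesizer` built as shown earlier, calling `synthesize_all(synthesizer, ["first sentence", "second sentence"])` would write `wavs/utt_0000.wav` and `wavs/utt_0001.wav`. Keeping the input position in the file name makes it easy to match each audio file back to its sentence, even when blank entries are skipped.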