---
license: other
license_name: coqui-public-model-license
license_link: https://coqui.ai/cpml
library_name: coqui
pipeline_tag: text-to-speech
widget:
- text: "Abraham said today is a good day to sound like an African"
---

# Afro-TTS

Afro-TTS is the first pan-African accented English speech synthesis system, capable of generating speech in 86 African accents. It includes 1,000 personas representing the rich phonological diversity of the continent, for applications in education, public health, and automated content creation. Afro-TTS lets you clone voices into different African accents using just a 6-second audio clip. The model was adapted from the XTTS model developed by [Coqui Studio](https://coqui.ai/).

Read more about this model in our paper: https://arxiv.org/abs/2406.11727

### Features

- Supports 86 unique African accents
- Voice cloning from just a 6-second audio clip
- Emotion and style transfer through cloning
- Multi-accent English speech generation
- 24 kHz sampling rate for high-quality audio

## Performance

Afro-TTS achieves near-ground-truth Mean Opinion Scores (MOS) for naturalness and accentedness. Objective and subjective evaluations demonstrated that the model generates natural-sounding accented speech, bridging the current gap in the representation of African voices in speech synthesis.

### Languages

Afro-TTS currently supports English only. Stay tuned as we continue to add support for more languages. If you have any language requests, feel free to reach out!

### Code

The codebase for this model's paper can be found [here](https://github.com/intron-innovation/AfriSpeech-TTS).

### License

This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). There's a lot that goes into a license for generative models, and you can read more of [the origin story of CPML here](https://coqui.ai/blog/tts/cpml).

### Contact

Come and join our Bioramp community.
We're active on the [Masakhane Slack server](https://join.slack.com/t/masakhane-nlp/shared_invite/zt-1zgnxx911-YWvICNas~mpeKDNqiO3r3g) and on our [website](https://bioramp.org/). You can also email the authors at sewade.ogun@inria.fr and tobi@intron.io.

#### Using Afro-TTS

Install the Coqui TTS package:

```bash
pip install TTS
```

Run the following code:

```python
from scipy.io.wavfile import write

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration and checkpoint.
config = XttsConfig()
config.load_json("intronhealth/afro-tts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="intronhealth/afro-tts/", eval=True)
model.cuda()

# Synthesize accented speech, cloning the voice from the reference clip.
outputs = model.synthesize(
    "Abraham said today is a good day to sound like an African.",
    config,
    speaker_wav="audios/reference_accent.wav",
    gpt_cond_len=3,
    language="en",
)

# Save the generated audio at the model's 24 kHz sampling rate.
write("audios/output.wav", 24000, outputs["wav"])
```

### BibTeX entry and citation info

```
@misc{ogun20241000,
      title={1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis},
      author={Sewade Ogun and Abraham T. Owodunni and Tobi Olatunji and Eniola Alese and Babatunde Oladimeji and Tejumade Afonja and Kayode Olaleye and Naome A. Etori and Tosin Adewumi},
      year={2024},
      eprint={2406.11727},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
```
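Since voice cloning works from a roughly 6-second reference clip, it can help to verify a reference WAV is long enough before passing it as `speaker_wav`. Below is a minimal sketch using only the Python standard library; the helper name `is_long_enough` and the 6-second threshold are illustrative and not part of the TTS package.

```python
import wave

def is_long_enough(path, min_seconds=6.0):
    """Return True if the WAV file at `path` lasts at least `min_seconds`."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return duration >= min_seconds

if __name__ == "__main__":
    # Example: write a 7-second silent mono clip at 24 kHz, then check it.
    with wave.open("reference_accent.wav", "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(24000)
        wav.writeframes(b"\x00\x00" * 24000 * 7)
    print(is_long_enough("reference_accent.wav"))  # True for a 7-second clip
```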