---
license: cc-by-nc-4.0
language:
- kbd
datasets:
- anzorq/kbd_speech
pipeline_tag: text-to-speech
---

# MMS-TTS Fine-tuned for Kabardian (Speaker: Sokhov Murat)

This repository contains a fine-tuned version of Facebook's MMS-TTS model, adapted to generate speech in the Kabardian language. The model was trained on audio recordings of a single speaker, Sokhov Murat.

## Model Details

- Base Model: [facebook/mms-tts](https://huggingface.co/facebook/mms-tts)
- Fine-tuned on: [anzorq/kbd_speech](https://huggingface.co/datasets/anzorq/kbd_speech) dataset
- Training steps: 5,100
- Speaker: [Sokhov Murat](https://www.instagram.com/carbatay)
- Language: Circassian (Kabardian)

## Usage

To generate speech with this model, use the `pipeline` API from the Transformers library. Here's an example:

```python
from transformers import pipeline
import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile resolves

model_id = "anzorq/mms_finetune_kbd_murat"
# device=0 runs on the first GPU; remove the argument to run on the CPU
synthesiser = pipeline("text-to-speech", model_id, device=0)

text = "дауэ ущыт?"
speech = synthesiser(text)

# Save the generated audio to a file
scipy.io.wavfile.write("finetuned_output.wav", rate=speech["sampling_rate"], data=speech["audio"][0])
```

This code writes the synthesized speech for the given Kabardian text to the audio file `finetuned_output.wav`.
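
The example above saves the floating-point samples returned by the pipeline as they are. Some players and audio tools may expect 16-bit PCM instead, so here is a minimal sketch of an alternative save step that uses NumPy and Python's standard `wave` module. The helper name `save_wav_int16` is hypothetical, and the code assumes the samples lie roughly in the range [-1.0, 1.0] and that the audio is mono.

```python
import wave

import numpy as np


def save_wav_int16(path, audio, sampling_rate):
    """Write a mono float waveform to a 16-bit PCM WAV file (hypothetical helper)."""
    # Flatten in case the array has a leading batch dimension, e.g. shape (1, N).
    samples = np.asarray(audio, dtype=np.float32).reshape(-1)
    # Assumption: samples are roughly in [-1.0, 1.0]; clip anything outside.
    samples = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range and store as little-endian 16-bit integers.
    pcm = (samples * 32767.0).astype("<i2")
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(int(sampling_rate))
        wav_file.writeframes(pcm.tobytes())
    # Return the number of samples written.
    return len(pcm)
```

With the `speech` dictionary from the example above, this could be called as `save_wav_int16("finetuned_output_int16.wav", speech["audio"], speech["sampling_rate"])`.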

## Notes

- Fine-tuned following the guide at https://github.com/ylacombe/finetune-hf-vits
- Since no pre-trained MMS-TTS model was available for Kabardian, we started fine-tuning from the MMS-TTS model for Chechen, whose character set is the closest to Kabardian's.
- Do not use in production. This model performs considerably worse than the fine-tuned VITS model [anzorq/kbd-vits-tts-male](https://huggingface.co/anzorq/kbd-vits-tts-male) for Kabardian text-to-speech.
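
Because the model starts from a checkpoint for another language, it can be useful to check whether an input text contains characters the tokenizer does not know. The sketch below is an illustration only. The function name `find_unsupported_chars` and the toy vocabulary are hypothetical, and the check ignores any normalisation the real tokenizer may apply. How the model treats unknown characters is not documented here, so treat the result as a hint rather than a guarantee.

```python
def find_unsupported_chars(text, vocab):
    """Return the characters of `text` missing from `vocab`, in order of first appearance."""
    missing = []
    for char in text:
        if char not in vocab and char not in missing:
            missing.append(char)
    return missing


# Hypothetical toy vocabulary for illustration only. The real vocabulary would
# come from the tokenizer, e.g.
#   AutoTokenizer.from_pretrained("anzorq/mms_finetune_kbd_murat").get_vocab()
toy_vocab = set("абвгдежзийклмнопрстуфхцчшщъыьэюя ")

print(find_unsupported_chars("дауэ ущыт?", toy_vocab))  # → ['?']
```

Passing the dictionary returned by `get_vocab()` in place of `toy_vocab` works the same way, since membership tests on a dictionary check its keys.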

## License

The original MMS-TTS model by Meta is licensed under the CC-BY-NC-4.0 License. This fine-tuned version inherits the same license.

## Acknowledgments

- [AI at Meta](https://ai.meta.com//) for the original MMS-TTS model.
- [Sokhov Murat](https://www.instagram.com/carbatay) for providing the audio recordings used for fine-tuning.