---
license: cc-by-nc-4.0
language:
- kbd
datasets:
- anzorq/kbd_speech
pipeline_tag: text-to-speech
---
# MMS-TTS Fine-tuned for Kabardian (Speaker: Sokhov Murat)

This repository contains Meta's MMS-TTS model fine-tuned to synthesize speech in Kabardian. Fine-tuning used audio recordings of a single speaker, Sokhov Murat.

## Model Details

- Base model: [facebook/mms-tts](https://huggingface.co/facebook/mms-tts) (Chechen checkpoint; see Notes)
- Fine-tuned on: [anzorq/kbd_speech](https://huggingface.co/datasets/anzorq/kbd_speech) dataset
- Speaker: Sokhov Murat
- Language: Kabardian (Circassian), language code `kbd`

## Usage

The simplest way to generate speech with this model is the `pipeline` API from the Transformers library:

```python
import scipy.io.wavfile
import torch
from transformers import pipeline

model_id = "anzorq/mms_finetune_kbd_murat"
# Run on the first GPU if one is available, otherwise on the CPU (device=-1)
device = 0 if torch.cuda.is_available() else -1
synthesiser = pipeline("text-to-speech", model=model_id, device=device)

text = "дауэ ущыт?"  # "How are you?"
speech = synthesiser(text)

# Save the generated audio to a WAV file; [0] selects the waveform for the single input text
scipy.io.wavfile.write("finetuned_output.wav", rate=speech["sampling_rate"], data=speech["audio"][0])
```

Running this script writes the synthesized speech for the given Kabardian text to `finetuned_output.wav`.

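The script above saves whatever array the pipeline returns, which is typically 32-bit float audio, and some audio tools handle float WAV files poorly. A minimal sketch, assuming the waveform is a float array scaled to [-1, 1], converts it to 16-bit PCM before saving. The helper name `save_pcm16` is hypothetical and is not part of Transformers or SciPy.

```python
import numpy as np
import scipy.io.wavfile


def save_pcm16(path: str, waveform: np.ndarray, sampling_rate: int) -> None:
    """Write a float waveform in [-1, 1] to `path` as 16-bit PCM WAV (hypothetical helper)."""
    # Drop a leading batch dimension, e.g. (1, num_samples) -> (num_samples,)
    audio = np.asarray(waveform, dtype=np.float32).squeeze()
    # Clip to the valid range so out-of-range samples do not wrap around in int16
    audio = np.clip(audio, -1.0, 1.0)
    # Scale to the int16 range; astype truncates toward zero
    pcm = (audio * 32767).astype(np.int16)
    scipy.io.wavfile.write(path, sampling_rate, pcm)
```

With the pipeline output from the usage example, this would be called as `save_pcm16("finetuned_output.wav", speech["audio"], speech["sampling_rate"])`; the `squeeze()` call means the batch index no longer has to be selected by hand.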
## Notes

- The original MMS-TTS release has no pre-trained checkpoint for Kabardian, so fine-tuning started from the checkpoint of the language with the closest character set, Chechen.
- This model performs considerably worse than the fine-tuned VITS model [anzorq/kbd-vits-tts-male](https://huggingface.co/anzorq/kbd-vits-tts-male) for Kabardian text-to-speech.

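The choice of a Chechen starting checkpoint rests on character-set overlap. As a rough, hypothetical illustration (not necessarily how the checkpoint was actually chosen), the function below measures what fraction of a text's characters a given character vocabulary covers and which characters are missing. `char_coverage` is an illustrative name, and the vocabulary could come from, for example, the keys of a candidate tokenizer's `get_vocab()`; lowercasing the input is an assumption about how the tokenizer normalizes text.

```python
def char_coverage(text: str, vocab: set[str], lowercase: bool = True) -> tuple[float, set[str]]:
    """Return (fraction of non-space characters of `text` found in `vocab`, characters not in `vocab`)."""
    # Lowercasing is an assumption about the tokenizer's normalization
    if lowercase:
        text = text.lower()
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # An empty text is trivially covered
        return 1.0, set()
    missing = {c for c in chars if c not in vocab}
    covered = sum(1 for c in chars if c in vocab)
    return covered / len(chars), missing


if __name__ == "__main__":
    # Toy vocabulary for illustration only; it deliberately lacks "?"
    toy_vocab = set("дауэщыт")
    ratio, missing = char_coverage("Дауэ ущыт?", toy_vocab)
    print(f"coverage={ratio:.3f}, missing={sorted(missing)}")
```

A higher coverage suggests fewer input characters would be dropped or mapped to an unknown token, depending on how the tokenizer treats out-of-vocabulary characters.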
## License

The original MMS-TTS model by Meta is licensed under CC BY-NC 4.0, and this fine-tuned version is released under the same license.

## Acknowledgments

- [AI at Meta](https://ai.meta.com/) for the original MMS-TTS model.
- [Sokhov Murat](https://www.instagram.com/carbatay) for providing the audio recordings used for fine-tuning.