---
license: mit
datasets:
- lunarlist/edited_common_voice
language:
- th
library_name: nemo
pipeline_tag: text-to-speech
---

This is a Thai text-to-speech (TTS) model. It uses a voice from the [Common Voice dataset](https://commonvoice.mozilla.org/), modified so that it does not sound like the original speaker.

> pip install "nemo_toolkit[tts]" soundfile

```python
import soundfile as sf
import torch
from nemo.collections.tts.models import Tacotron2Model, UnivNetModel

# Load the Thai Tacotron2 spectrogram generator and a UnivNet vocoder, both on CPU.
model = Tacotron2Model.from_pretrained("lunarlist/tts-thai").to("cpu")
vocoder = UnivNetModel.from_pretrained(model_name="tts_en_libritts_univnet").to("cpu")

# Thai input text (roughly: "Thai is very easy").
text = "ภาษาไทย ง่าย นิด เดียว"

# Map each character to its index in the model's label list.
dict_idx = {k: i for i, k in enumerate(model.hparams["cfg"]["labels"])}

# 66 and 67 are the ids placed before and after the character sequence
# (presumably start/end markers).
tokens = torch.tensor([[66] + [dict_idx[ch] for ch in text] + [67]], dtype=torch.int32)

spectrogram = model.generate_spectrogram(tokens=tokens)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

# Save the audio to disk in a file called speech.wav (22050 Hz sample rate).
sf.write("speech.wav", audio.to("cpu").detach().numpy()[0], 22050)
```
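
The snippet above turns each character into its index in the model's label list and raises a `KeyError` for any character that is not in that list (for example Latin letters or unusual punctuation). A minimal sketch of a more forgiving encoder follows; it runs without NeMo. The helper names `build_index` and `encode`, the toy label list, and the toy start/end ids are all hypothetical and only illustrate the idea; with the real model you would presumably pass `model.hparams["cfg"]["labels"]` and the ids 66 and 67.

```python
from typing import Dict, List, Sequence


def build_index(labels: Sequence[str]) -> Dict[str, int]:
    """Map each label (character) to its position in the label list."""
    return {label: i for i, label in enumerate(labels)}


def encode(text: str, index: Dict[str, int], bos_id: int, eos_id: int) -> List[int]:
    """Convert text to token ids, skipping characters missing from the index."""
    ids = [index[ch] for ch in text if ch in index]
    return [bos_id] + ids + [eos_id]


# Hypothetical toy label list; the real one comes from the model configuration.
toy_labels = [" ", "ก", "ข", "า", "ย"]
toy_index = build_index(toy_labels)

# Hypothetical start/end ids placed right after the toy labels.
token_ids = encode("กา?ย", toy_index, bos_id=len(toy_labels), eos_id=len(toy_labels) + 1)
print(token_ids)  # [5, 1, 3, 4, 6] -- the "?" is silently dropped
```

Dropping unknown characters keeps synthesis from failing, at the cost of silently changing the input; logging or normalizing such characters first may be the better choice for real text.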

Medium: [Text-To-Speech ภาษาไทยด้วย Tacotron2](https://medium.com/@taetiyateachamatavorn/text-to-speech-%E0%B8%A0%E0%B8%B2%E0%B8%A9%E0%B8%B2%E0%B9%84%E0%B8%97%E0%B8%A2%E0%B8%94%E0%B9%89%E0%B8%A7%E0%B8%A2-tacotron2-986417b44edc) (in Thai; the title translates to "Thai Text-To-Speech with Tacotron2")