Commit b410a00 by pradnya-hf-dev
1 Parent(s): a00ad35
Update README.md
README.md CHANGED
@@ -18,9 +18,9 @@ metrics:
 
 **IMPORTANT: This is a work in progress. This model is not providing meaningful output at the moment**
 
-# Text-to-Speech (TTS) with
+# Text-to-Speech (TTS) with FastSpeech2 trained on LJSpeech
 
-This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain using a [
+This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain using a [FastSpeech2](https://arxiv.org/abs/2006.04558) pretrained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
 
 The pre-trained model takes a short text as input and produces a spectrogram as output. One can get the final waveform by applying a vocoder (e.g., HiFIGAN) on top of the generated spectrogram.
 
@@ -34,38 +34,38 @@ pip install speechbrain
 Please note that we encourage you to read our tutorials and learn more about
 [SpeechBrain](https://speechbrain.github.io).
 
-### Perform Text-to-Speech (TTS) with
+### Perform Text-to-Speech (TTS) with FastSpeech2
 
 ```
 import torchaudio
-from speechbrain.pretrained import
+from speechbrain.pretrained import FastSpeech2
 from speechbrain.pretrained import HIFIGAN
 
 # Initialize TTS (FastSpeech2) and Vocoder (HiFIGAN)
-fastspeech2 =
-hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-
+fastspeech2 = FastSpeech2.from_hparams(source="speechbrain/tts-fastspeech2-ljspeech", savedir="tmpdir_tts")
+hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-libritts-16kHz", savedir="tmpdir_vocoder")
 
 # Running the TTS
-mel_output,
+mel_output, durations, pitch, energy = fastspeech2.encode_text(input_text)
 
 # Running Vocoder (spectrogram-to-waveform)
 waveforms = hifi_gan.decode_batch(mel_output)
 
 # Save the waveform
-torchaudio.save('example_TTS.wav',waveforms.squeeze(1),
+torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 16000)
 ```
 
 If you want to generate multiple sentences in one shot, you can do it this way:
 
 ```
-from speechbrain.pretrained import
-
+from speechbrain.pretrained import FastSpeech2
+fastspeech2 = FastSpeech2.from_hparams(source="speechbrain/tts-fastspeech2-ljspeech", savedir="tmpdir_tts")
 items = [
     "A quick brown fox jumped over the lazy dog",
     "How much wood would a woodchuck chuck?",
     "Never odd or even"
 ]
-mel_outputs,
+mel_outputs, durations, pitch, energy = fastspeech2.encode_batch(items)
 
 ```
 