pradnya-hf-dev committed on
Commit b410a00
1 Parent(s): a00ad35

Update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -18,9 +18,9 @@ metrics:

  **IMPORTANT: This is a work in progress. This model is not providing meaningful output at the moment**

- # Text-to-Speech (TTS) with Fastspeech2 trained on LJSpeech
+ # Text-to-Speech (TTS) with FastSpeech2 trained on LJSpeech

- This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain using a [Tacotron2](https://arxiv.org/abs/1712.05884) pretrained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
+ This repository provides all the necessary tools for Text-to-Speech (TTS) with SpeechBrain using a [FastSpeech2](https://arxiv.org/abs/2006.04558) pretrained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).

  The pre-trained model takes in input a short text and produces a spectrogram in output. One can get the final waveform by applying a vocoder (e.g., HiFIGAN) on top of the generated spectrogram.

@@ -34,38 +34,38 @@ pip install speechbrain
  Please notice that we encourage you to read our tutorials and learn more about
  [SpeechBrain](https://speechbrain.github.io).

- ### Perform Text-to-Speech (TTS) with Fastspeech2
+ ### Perform Text-to-Speech (TTS) with FastSpeech2

  ```
  import torchaudio
- from speechbrain.pretrained import Tacotron2
+ from speechbrain.pretrained import FastSpeech2
  from speechbrain.pretrained import HIFIGAN

  # Intialize TTS (tacotron2) and Vocoder (HiFIGAN)
- fastspeech2 = Tacotron2.from_hparams(source="speechbrain/tts-fastspeech2-ljspeech", savedir="tmpdir_tts")
- hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")
+ fastspeech2 = FastSpeech2.from_hparams(source="speechbrain/tts-fastspeech2-ljspeech", savedir="tmpdir_tts")
+ hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-libritts-16kHz", savedir="tmpdir_vocoder")

  # Running the TTS
- mel_output, mel_length, alignment = fastspeech2.encode_text("Mary had a little lamb")
+ mel_output, durations, pitch, energy = fastspeech2.encode_text(input_text)

  # Running Vocoder (spectrogram-to-waveform)
  waveforms = hifi_gan.decode_batch(mel_output)

  # Save the waverform
- torchaudio.save('example_TTS.wav',waveforms.squeeze(1), 22050)
+ torchaudio.save('example_TTS.wav', waveforms.squeeze(1), 16000)
  ```

  If you want to generate multiple sentences in one-shot, you can do in this way:

  ```
- from speechbrain.pretrained import fastspeech2
- tacotron2 = Tacotron2.from_hparams(source="speechbrain/TTS_fastspeech2", savedir="tmpdir")
+ from speechbrain.pretrained import FastSpeech2
+ fastspeech2 = FastSpeech2.from_hparams(source="speechbrain/tts-fastspeech2-ljspeech", savedir="tmpdir_tts")
  items = [
  "A quick brown fox jumped over the lazy dog",
  "How much wood would a woodchuck chuck?",
  "Never odd or even"
  ]
- mel_outputs, mel_lengths, alignments = tacotron2.encode_batch(items)
+ mel_outputs, durations, pitch, energy = fastspeech2.encode_batch(items)

  ```

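This commit pairs the FastSpeech2 model with the 16 kHz LibriTTS HiFiGAN vocoder and changes the rate passed to `torchaudio.save` from 22050 to 16000. That rate must match what the vocoder actually emits, because the WAV header only tells a player how to interpret the samples. The following plain-Python sketch illustrates the arithmetic and does not call SpeechBrain. It assumes FastSpeech2-style per-token durations given as integer frame counts (which a length regulator sums into the mel length) and a vocoder that emits exactly `hop_length` samples per mel frame. The function names, the hop length of 256 and the example durations are hypothetical and were not read from any model config.

```python
# Plain-Python sketch of the frame/sample/duration arithmetic behind the
# TTS -> vocoder -> torchaudio.save pipeline. No SpeechBrain calls.
# ASSUMPTIONS (hypothetical, for illustration only):
#   * per-token durations are integer mel-frame counts, as in a FastSpeech2
#     length regulator (SpeechBrain's returned tensor may use another form);
#   * the vocoder emits exactly hop_length samples per mel frame;
#   * hop_length = 256 is an example value, not taken from any model config.

from typing import Sequence


def mel_frames_from_durations(durations: Sequence[int]) -> int:
    """Length regulator: each token is repeated for its duration, so the
    mel length is the sum of the per-token frame counts."""
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    return sum(durations)


def waveform_samples(num_frames: int, hop_length: int) -> int:
    """Samples produced by a vocoder that upsamples each frame by hop_length."""
    return num_frames * hop_length


def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Playback length when num_samples are interpreted at sample_rate."""
    return num_samples / sample_rate


def playback_speed_factor(true_rate: int, header_rate: int) -> float:
    """Values > 1 mean the saved file plays faster (and higher) than intended."""
    return header_rate / true_rate


if __name__ == "__main__":
    hop_length = 256  # hypothetical hop size
    frames = mel_frames_from_durations([120, 200, 180])  # hypothetical durations
    samples = waveform_samples(frames, hop_length)
    print(frames, samples)                      # 500 128000
    print(duration_seconds(samples, 16000))     # 8.0 (header matches vocoder)
    print(duration_seconds(samples, 22050))     # about 5.8: same samples, too fast
    print(playback_speed_factor(16000, 22050))  # 1.378125
```

With a 16 kHz vocoder, writing the file at 22050 would make the speech play roughly 1.38 times too fast; the reverse mismatch (a 22050 Hz vocoder saved at 16000) would slow it down.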