nairaxo committed on
Commit
6d8d858
1 Parent(s): 762475b

Create new file

Files changed (1)
  1. README.md +83 -0
README.md ADDED
@@ -0,0 +1,83 @@
+ ---
+ language: "sw"
+ inference: false
+ tags:
+ - Vocoder
+ - HiFIGAN
+ - text-to-speech
+ - TTS
+ - speech-synthesis
+ - speechbrain
+ license: "apache-2.0"
+ datasets:
+ - LJSpeech
+ ---
+
+ # Vocoder with HiFIGAN trained on LJSpeech
+
+ This repository provides all the necessary tools for using a [HiFIGAN](https://arxiv.org/abs/2010.05646) vocoder trained on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/).
+
+ The pre-trained model takes a spectrogram as input and produces a waveform as output. Typically, a vocoder is used after a TTS model that converts input text into a spectrogram.
+
+
+ ## Install SpeechBrain
+
+ ```bash
+ pip install speechbrain
+ ```
+
+
+ We encourage you to read our tutorials and learn more about
+ [SpeechBrain](https://speechbrain.github.io).
+
+ ### Using the Vocoder
+
+ ```python
+ import torch
+ from speechbrain.pretrained import HIFIGAN
+
+ # Load the pre-trained vocoder; downloaded files are stored in `savedir`
+ hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir")
+
+ # Random batch of 2 spectrograms with shape [batch, mel channels, frames]
+ mel_specs = torch.rand(2, 80, 298)
+ waveforms = hifi_gan.decode_batch(mel_specs)
+ ```
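To get a feel for how much audio a spectrogram of a given size corresponds to, the following is a minimal sketch in plain Python. It uses the 22050 Hz sample rate that appears elsewhere in this README; the hop length of 256 samples per frame is an assumption (it is not stated here), and the helper names are hypothetical. The actual vocoder output length may differ slightly from this estimate.

```python
# Rough estimate of output length for a mel spectrogram with n_frames frames.
SAMPLE_RATE = 22050  # sample rate used when saving audio in this README
HOP_LENGTH = 256     # ASSUMPTION: samples per spectrogram frame, not stated in this README


def expected_num_samples(n_frames, hop_length=HOP_LENGTH):
    """Approximate number of waveform samples produced for n_frames frames."""
    return n_frames * hop_length


def expected_duration_seconds(n_frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Approximate duration in seconds of the generated waveform."""
    return expected_num_samples(n_frames, hop_length) / sample_rate


# The example above uses 298 frames per spectrogram.
print(expected_num_samples(298))
print(expected_duration_seconds(298))
```

Under these assumptions, the 298-frame random input in the example corresponds to a few seconds of audio per item in the batch.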
+ ### Using the Vocoder with the TTS
+ ```python
+ import torchaudio
+ from speechbrain.pretrained import Tacotron2
+ from speechbrain.pretrained import HIFIGAN
+
+ # Initialize TTS (Tacotron2) and Vocoder (HiFIGAN)
+ tacotron2 = Tacotron2.from_hparams(source="speechbrain/tts-tacotron2-ljspeech", savedir="tmpdir_tts")
+ hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech", savedir="tmpdir_vocoder")
+
+ # Run the TTS (text-to-spectrogram)
+ mel_output, mel_length, alignment = tacotron2.encode_text("Mary had a little lamb")
+
+ # Run the Vocoder (spectrogram-to-waveform)
+ waveforms = hifi_gan.decode_batch(mel_output)
+
+ # Save the waveform at a 22050 Hz sample rate
+ torchaudio.save("example_TTS.wav", waveforms.squeeze(1), 22050)
+ ```
+
+ ### Inference on GPU
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
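As an illustration of the `run_opts` option, here is a minimal sketch that selects the GPU when one is available and falls back to the CPU otherwise. The helper names `make_run_opts` and `load_vocoder` are hypothetical and not part of SpeechBrain; the model source is the one used in the examples in this README.

```python
def make_run_opts(cuda_available):
    """Build the run_opts dict for from_hparams based on GPU availability."""
    device = "cuda" if cuda_available else "cpu"
    return {"device": device}


def load_vocoder(savedir="tmpdir"):
    """Load the HiFIGAN vocoder on the GPU if available, otherwise on the CPU."""
    # Imported here so that make_run_opts can be used without these packages.
    import torch
    from speechbrain.pretrained import HIFIGAN

    run_opts = make_run_opts(torch.cuda.is_available())
    return HIFIGAN.from_hparams(
        source="speechbrain/tts-hifigan-ljspeech",
        savedir=savedir,
        run_opts=run_opts,
    )
```

Calling `load_vocoder()` downloads the model on first use, so it needs network access and an installed `speechbrain`.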
+
+ ### Training
+ The model was trained with SpeechBrain.
+ To train it from scratch, follow these steps:
+ 1. Clone SpeechBrain:
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain/
+ ```
+ 2. Install it:
+ ```bash
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+ 3. Run training:
+ ```bash
+ cd recipes/LJSpeech/TTS/vocoder/hifi_gan/
+ python train.py hparams/train.yaml --data_folder /path/to/LJspeech
+ ```
+ You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/19sLwV7nAsnUuLkoTu5vafURA9Fo2WZgG?usp=sharing).
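Before launching training, it can help to sanity-check the folder passed as `--data_folder`. The sketch below assumes the standard LJSpeech-1.1 layout (a `metadata.csv` file and a `wavs/` directory at the top level); what the recipe itself requires is not stated here, and the helper name `missing_ljspeech_entries` is hypothetical.

```python
from pathlib import Path


def missing_ljspeech_entries(data_folder):
    """Return the expected top-level LJSpeech entries that are missing.

    ASSUMPTION: the folder follows the standard LJSpeech-1.1 layout,
    with a metadata.csv file and a wavs/ directory.
    """
    root = Path(data_folder)
    missing = []
    if not (root / "metadata.csv").is_file():
        missing.append("metadata.csv")
    if not (root / "wavs").is_dir():
        missing.append("wavs/")
    return missing
```

For example, `missing_ljspeech_entries("/path/to/LJspeech")` returns an empty list when both entries are present.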