# FastSpeech 2 multi-speaker (English)
## Prepare
All commands are run from the main repository folder, `TensorFlowTTS/`.
0. Optional*: [Download](http://www.openslr.org/60/) and prepare LibriTTS (a helper notebook for this is in `examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb`).
   - Dataset structure after finishing this step:
   ```
   |- TensorFlowTTS/
   | |- LibriTTS/
   | |- |- train-clean-100/
   | |- |- SPEAKERS.txt
   | |- |- ...
   | |- libritts/
   | |- |- 200/
   | |- |- |- 200_124139_000001_000000.txt
   | |- |- |- 200_124139_000001_000000.wav
   | |- |- |- ...
   | |- |- 250/
   | |- |- ...
   | |- tensorflow_tts/
   | |- models/
   | |- ...
   ```
1. Extract durations (use `examples/mfa_extraction` or a pretrained Tacotron 2).
2. Optional*: build the Docker image:
   ```
   bash examples/fastspeech2_libritts/scripts/build.sh
   ```
3. Optional*: run the Docker container interactively:
   ```
   bash examples/fastspeech2_libritts/scripts/interactive.sh
   ```
4. Preprocessing:
   ```
   tensorflow-tts-preprocess --rootdir ./libritts \
     --outdir ./dump_libritts \
     --config preprocess/libritts_preprocess.yaml \
     --dataset libritts
   ```
5. Normalization:
   ```
   tensorflow-tts-normalize --rootdir ./dump_libritts \
     --outdir ./dump_libritts \
     --config preprocess/libritts_preprocess.yaml \
     --dataset libritts
   ```
6. Change the speaker mapper of `CharactorDurationF0EnergyMelDataset` in `fastspeech2_dataset` to match your dataset (if you use LibriTTS with `mfa_extraction`, you don't need to change anything).
7. Edit `train_libri.sh` to match your dataset, then run it:
   ```
   bash examples/fastspeech2_libritts/scripts/train_libri.sh
   ```
8. Optional*: if you run into tensor size mismatches, check step 5 in the `examples/mfa_extraction` directory.
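Before preprocessing, it can help to confirm that `./libritts` really has the layout from step 0. The following is a minimal shell sketch under that assumption (one sub-folder per speaker, each `.wav` next to a `.txt` with the same name). `check_libritts_layout` and `list_speakers` are hypothetical helpers, not part of TensorFlowTTS.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TensorFlowTTS): walks a LibriTTS-style
# folder with one sub-folder per speaker and reports every .wav file
# that has no .txt transcript with the same stem. Returns 1 if any is missing.
check_libritts_layout() {
  local root="${1:-./libritts}"
  local missing=0
  local speaker wav
  for speaker in "$root"/*/; do
    [ -d "$speaker" ] || continue          # glob matched nothing
    for wav in "$speaker"*.wav; do
      [ -e "$wav" ] || continue            # speaker folder without .wav files
      if [ ! -f "${wav%.wav}.txt" ]; then
        echo "missing transcript: $wav"
        missing=$((missing + 1))
      fi
    done
  done
  [ "$missing" -eq 0 ]
}

# Hypothetical helper: prints the speaker folder names (speaker IDs in the
# layout above), one per line, in glob (sorted) order.
list_speakers() {
  local root="${1:-./libritts}"
  local dir
  for dir in "$root"/*/; do
    if [ -d "$dir" ]; then
      basename "$dir"
    fi
  done
}
```

For example, `check_libritts_layout ./libritts && list_speakers ./libritts` reports missing transcripts or, if there are none, lists the speaker IDs, which can be compared against the speaker mapper mentioned in step 6.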
## Comments
This version uses the popular `train.txt` format with `|`-separated fields, as used in other repositories. Each line of the training file should look like this:
- `Wav Path | Text | Speaker Name`
- `Wav Path2 | Text | Speaker Name`
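A quick sanity check for a `train.txt` in this format can be sketched in shell, assuming every non-empty line has exactly three `|`-separated fields (wav path, text, speaker name). `check_train_txt` is a hypothetical helper, not part of the repository; it only checks the field structure and does not verify that the wav paths exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that every non-empty line of a train.txt file
# has exactly three '|'-separated fields and that none of them is blank.
# Prints each offending line with its line number; returns 1 on any error.
check_train_txt() {
  local file="${1:-train.txt}"
  awk -F'|' '
    NF == 0 { next }                          # skip empty lines
    {
      bad = (NF != 3)                         # wrong number of fields
      for (i = 1; i <= NF; i++) {
        field = $i
        gsub(/^[ \t]+|[ \t]+$/, "", field)    # ignore surrounding spaces
        if (field == "") bad = 1              # blank field
      }
      if (bad) { printf "line %d: %s\n", NR, $0; errors++ }
    }
    END { exit (errors > 0) }
  ' "$file"
}
```

Usage: `check_train_txt path/to/train.txt && echo "format OK"`. Splitting with `awk -F'|'` keeps the check close to a plain `|` split, so a stray `|` inside the text field shows up as a line with more than three fields.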