Guan-Ting/StyleSpeech-MelGAN-vocoder-16kHz

	### The MelGAN vocoder for StyleSpeech
	#### About StyleSpeech
	* StyleSpeech, or Meta-StyleSpeech, is a model for multi-speaker adaptive text-to-speech generation.
	* The StyleSpeech model can be trained with the official implementation (https://github.com/KevinMIN95/StyleSpeech).
	#### About MelGAN vocoder
	* This MelGAN vocoder converts mel-spectrograms back to waveforms.
	* StyleSpeech operates at a 16 kHz sampling rate, and no 16 kHz multi-speaker vocoder was available.
	* I therefore trained this vocoder from scratch on the LibriTTS 100-hour training set. The training pipeline is the same as the official MelGAN (https://github.com/descriptinc/melgan-neurips).
	* The synthesized audio is of good quality and sounds close to the official demo.
	#### Usage
	* Please follow the official MelGAN (https://github.com/descriptinc/melgan-neurips) to load the pre-trained checkpoint and convert your mel-spectrograms back to waveforms.
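
Once the vocoder has produced a waveform, it needs to be written out at the matching 16 kHz sampling rate, or playback will be pitched and timed incorrectly. The following is a minimal sketch of that last step using only the Python standard library. It assumes the waveform is available as a flat sequence of floats in the range [-1.0, 1.0]; the helper names `save_wav` and `sine_wave` are hypothetical and are not part of the MelGAN or StyleSpeech repositories. The sine wave only stands in for real vocoder output.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # this vocoder targets 16 kHz audio


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file.

    Hypothetical helper: `samples` is assumed to be the vocoder output
    flattened to a plain sequence of floats.
    """
    frames = bytearray()
    for s in samples:
        # Clip out-of-range values so they do not wrap around as integers.
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def sine_wave(duration_s=0.5, freq_hz=440.0, sample_rate=SAMPLE_RATE):
    """Generate a placeholder tone standing in for synthesized speech."""
    n = int(duration_s * sample_rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

For example, `save_wav(sine_wave(), "out.wav")` writes a half-second test tone. With real output, convert the tensor returned by the vocoder to a flat list of floats first and pass that in place of `sine_wave()`.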

	#### Training Details
	* GPU: RTX 2080Ti
	* Training epochs: 3000