	---
	license: mit
	---
	# BigVGAN-L
	This 24 kHz model was pretrained on the LibriTTS dataset with a full 100-band mel spectrogram as input (see `config.json` for the exact hyperparameter setup), using the [BigVGAN](https://github.com/NVIDIA/BigVGAN)
	repository. Pretraining ran for 1300k steps with a batch size of 100 on 8 A100 40GB GPUs.
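
As a quick sanity check before running the model, the values stated above can be compared against `config.json`. The following is a minimal sketch: the key names `sampling_rate` and `num_mels` are assumptions about the layout of `config.json`, and `check_config` is a hypothetical helper, not part of the BigVGAN repository.

```python
import json

# Expected values come from the description above. The key names are an
# assumption about how config.json is laid out; adjust them if they differ.
EXPECTED = {"sampling_rate": 24000, "num_mels": 100}


def check_config(config: dict, expected: dict = EXPECTED) -> list[str]:
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    for key, want in expected.items():
        if key not in config:
            problems.append(f"{key}: missing")
        elif config[key] != want:
            problems.append(f"{key}: expected {want}, found {config[key]}")
    return problems


def check_config_file(path: str) -> list[str]:
    """Load a JSON config from disk and check it against EXPECTED."""
    with open(path, encoding="utf-8") as f:
        return check_config(json.load(f))
```

Calling `check_config_file("MODEL_PATH/BigVGAN-L/config.json")` would then return an empty list if the file agrees with the 24 kHz, 100-band setup.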

	# Inference
	Run inference with the example command below to generate audio from the model. It computes mel spectrograms from the wav files in `--input_wavs_dir` and saves the generated audio to `--output_dir`.
	```
	python NEMO_PATH/inference.py \
	--checkpoint_file MODEL_PATH/BigVGAN-L/g_01300000.pt \
	--input_wavs_dir AUDIO_PATH/input_wav \
	--output_dir AUDIO_PATH/output_wav
	```
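
Because the model was trained on 24 kHz audio, it can help to know which input files use a different sample rate before running the command above. This README does not say whether `inference.py` resamples its inputs, so the check below is only a precaution. It is a sketch that uses Python's standard `wave` module; `find_mismatched_wavs` is a hypothetical helper name.

```python
import wave
from pathlib import Path


def find_mismatched_wavs(input_dir: str, expected_rate: int = 24000) -> dict[str, int]:
    """Map file name -> sample rate for every .wav whose rate differs from expected_rate.

    Note: the standard `wave` module only reads PCM-encoded files; other
    encodings (e.g. 32-bit float) raise wave.Error.
    """
    mismatched = {}
    for path in sorted(Path(input_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != expected_rate:
            mismatched[path.name] = rate
    return mismatched
```

An empty result from `find_mismatched_wavs("AUDIO_PATH/input_wav")` means every PCM wav in the directory is already at 24 kHz.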

	# Continual finetuning
	The vocoder can be finetuned further with the NEMO_PATH/train.py script, since the checkpoints save all the optimizer information.
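
When resuming, it is useful to confirm which checkpoint in a directory is the most recent. The sketch below picks the file with the highest step number for a given prefix. The 8-digit zero-padded step format is inferred from the `g_01300000.pt` file name above; the `do_` prefix for a discriminator/optimizer checkpoint is an assumption, and `latest_checkpoint` is a hypothetical helper rather than a function from the training code.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(ckpt_dir: str, prefix: str) -> Optional[Path]:
    """Return the checkpoint with the highest step number for `prefix`, or None.

    Matches names such as 'g_01300000.pt': the prefix, eight digits, and an
    optional '.pt' extension.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d{8})(\.pt)?")
    best_step, best_path = -1, None
    for path in Path(ckpt_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

For example, `latest_checkpoint("MODEL_PATH/BigVGAN-L", "g_")` returns the generator checkpoint with the largest step count, or `None` if no file matches.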