---
license: mit
---

# BigVGAN-L

The 24kHz model was pretrained on the LibriTTS dataset with a full 100-band mel spectrogram as input (see `config.json` for the exact hyperparameter setup) using the [BigVGAN](https://github.com/NVIDIA/BigVGAN) repository. Pretraining ran for 1300k steps with a batch size of 100 on 8 A100 40GB GPUs.
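
To check the hyperparameters before running anything, a small script can print the relevant fields of `config.json`. This is a minimal sketch: the key names (`sampling_rate`, `num_mels`, and so on) are assumptions based on HiFi-GAN-style configs and may not match this file exactly, and `summarize_config` is a hypothetical helper, not part of the repository.

```python
import json
import sys
from pathlib import Path

# Assumed key names (HiFi-GAN-style config); adjust them to match config.json.
ASSUMED_KEYS = ("sampling_rate", "num_mels", "n_fft", "hop_size", "win_size", "fmin", "fmax")


def summarize_config(path, keys=ASSUMED_KEYS):
    """Return the selected hyperparameters from a JSON config.

    Keys that are absent from the file map to None instead of raising.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: config.get(key) for key in keys}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_config.py MODEL_PATH/BigVGAN-L/config.json
    for name, value in summarize_config(sys.argv[1]).items():
        print(f"{name}: {value}")
```

A value printed as `None` means the assumed key is not present in the file.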

# Inference

Run inference with the example command below to generate audio from the model. It computes mel spectrograms from the wav files in `--input_wavs_dir` and saves the generated audio to `--output_dir`.

```
python NEMO_PATH/inference.py \
    --checkpoint_file MODEL_PATH/BigVGAN-L/g_01300000.pt \
    --input_wavs_dir AUDIO_PATH/input_wav \
    --output_dir AUDIO_PATH/output_wav
```
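
Since the model was pretrained on 24kHz audio, it can help to confirm the sample rate of the input files first. The sketch below only reports mismatches. Whether `inference.py` resamples its inputs is not stated here, so treat this as an optional pre-flight check. `find_mismatched_wavs` is a hypothetical helper, and it relies on Python's standard `wave` module, which reads PCM wav files only.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 24000  # 24kHz, from the model description


def find_mismatched_wavs(wav_dir, expected_rate=EXPECTED_RATE):
    """Return {filename: sample_rate} for .wav files whose rate differs from expected_rate.

    Uses the standard-library `wave` module, so non-PCM files raise wave.Error.
    """
    mismatched = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatched[path.name] = rate
    return mismatched
```

An empty result means every PCM wav file in the directory is already at the expected rate.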

# Continual finetuning

The vocoder can be finetuned further with the `NEMO_PATH/train.py` script, since the checkpoints save all the optimizer information.
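
When resuming, it is useful to know which checkpoint in a directory is the most recent. The sketch below assumes checkpoints follow the `g_<step>` naming seen in the inference command (for example `g_01300000.pt`). `latest_checkpoint` is a hypothetical helper, not part of the repository, and the exact arguments `train.py` expects for resuming are not covered here.

```python
import re
from pathlib import Path


def latest_checkpoint(checkpoint_dir, prefix="g_"):
    """Return the path of the highest-step checkpoint, or None if none match.

    Matches file names of the form <prefix><digits> with an optional extension,
    e.g. g_01300000.pt. The naming scheme is an assumption.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)(\.\w+)?$")
    best_step, best_path = -1, None
    for path in Path(checkpoint_dir).iterdir():
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

Calling `latest_checkpoint("MODEL_PATH/BigVGAN-L")` would return the generator checkpoint with the largest step count, if one exists in that directory.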