---
tags:
- espnet
- audio
- audio-to-audio
- vocoder
language: 
- en
datasets:
- vctk
license: cc-by-4.0
inference: false
---

## Vocoder model - HiFi-GAN - English

https://github.com/kan-bayashi/ParallelWaveGAN

**No support given.**

### Details

```yaml
sampling_rate: 44100     # Sampling rate.
fft_size: 2048           # FFT size.
hop_size: 512            # Hop size.
win_length: 2048         # Window length.
                         # If set to null, it will be the same as fft_size.
window: "hann"           # Window function.
num_mels: 80             # Number of mel bands.
fmin: 0                  # Minimum frequency in mel basis calculation.
fmax: 22050              # Maximum frequency in mel basis calculation.
generator_type: HiFiGANGenerator
generator_params:
    in_channels: 80                       # Number of input channels.
    out_channels: 1                       # Number of output channels.
    channels: 512                         # Number of initial channels.
    kernel_size: 7                        # Kernel size of initial and final conv layers.
    upsample_scales: [8, 8, 2, 2, 2]         # Upsampling scales.
    upsample_kernel_sizes: [16, 16, 4, 4, 4] # Kernel size for upsampling layers.
```
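
The values above are tied together: a HiFi-GAN generator expands each mel frame into as many waveform samples as the product of `upsample_scales`, so that product should equal `hop_size`. The following is a minimal sketch that checks this and derives a few quantities from the configuration. It only copies numbers from the block above and does not load the model; the helper `samples_for_frames` is a hypothetical name introduced for illustration.

```python
import math

# Values copied from the configuration above.
sampling_rate = 44100
fft_size = 2048
hop_size = 512
fmax = 22050
upsample_scales = [8, 8, 2, 2, 2]

# Each mel frame is upsampled by the product of all upsampling scales.
total_upsampling = math.prod(upsample_scales)

# Number of mel frames the feature extractor produces per second of audio.
frames_per_second = sampling_rate / hop_size

# Length of one analysis window in milliseconds.
window_ms = 1000 * fft_size / sampling_rate


def samples_for_frames(num_frames: int) -> int:
    """Hypothetical helper: waveform samples generated for num_frames mel frames."""
    return num_frames * total_upsampling


print("upsampling matches hop_size:", total_upsampling == hop_size)
print("fmax is the Nyquist frequency:", fmax == sampling_rate // 2)
print("mel frames per second:", frames_per_second)
print("analysis window (ms):", round(window_ms, 2))
print("samples for 100 frames:", samples_for_frames(100))
```

If the first check fails after editing `hop_size` or `upsample_scales`, the generated waveform would no longer line up with the input mel frames, so the two settings need to be changed together.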