SNAC-Vocos

A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos.

Installation

Python >= 3.9 is recommended.
Clone the repository:

git clone https://github.com/hertz-pj/SNAC-Vocos
cd SNAC-Vocos

Install packages:

pip install -r requirements.txt

Inference

Refer to infer.py for inference instructions and usage examples.

Available Models

Model name	Hugging Face	Corpus	Domain
snac_vocos_16khz_hop200_scale8421_1kh	🤗	1k hours	Speech (Mandarin/English)

Training

1. Prepare filelists of audio files for the training and validation sets, e.g. train.list.
2. Fill in a config file, e.g. snac_vocos.yaml. The main parameters to check are batch_size, filelist_path, save_dir, and device.
3. Start training:

python train.py fit --config ./configs/snac_vocos.yaml
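
For step 1, the filelists can be generated with a short script. The following is a minimal sketch, not part of this repository: it assumes the trainer expects one audio file path per line, and the function name build_filelists, the extension set, and the default validation fraction are all hypothetical. Check the dataset code before relying on this format.

```python
import random
from pathlib import Path

# Assumption: these are the audio formats in your corpus; adjust as needed.
AUDIO_EXTS = {".wav", ".flac", ".mp3"}


def build_filelists(audio_dir, train_list, val_list, val_fraction=0.01, seed=0):
    """Write train/validation filelists, one absolute audio path per line.

    Assumption: the trainer reads plain-text lists with one path per line.
    Returns (number of training files, number of validation files).
    """
    # Recursively collect audio files and sort them for reproducibility.
    files = sorted(
        str(p.resolve())
        for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )
    if not files:
        raise ValueError(f"no audio files found under {audio_dir}")

    # Shuffle with a fixed seed so the split is the same on every run.
    random.Random(seed).shuffle(files)

    # Hold out at least one file for validation when more than one exists.
    n_val = max(1, int(len(files) * val_fraction)) if len(files) > 1 else 0
    val, train = files[:n_val], files[n_val:]

    Path(train_list).write_text("".join(f + "\n" for f in train))
    Path(val_list).write_text("".join(f + "\n" for f in val))
    return len(train), len(val)
```

For example, build_filelists("data/wavs", "train.list", "val.list") would scan a hypothetical data/wavs directory and write both lists; the resulting paths would then go into the filelist_path entries of the config file.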

TODO

Release code
Release a checkpoint trained on 1k hours of speech (Mandarin/English).
Demo page.

Acknowledgements

This implementation uses parts of the code from the following GitHub repositories:

SNAC
WavTokenizer