SNAC-Vocos

A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos.

Installation

Python 3.9 or later is recommended.
Clone the repository:

git clone https://github.com/hertz-pj/SNAC-Vocos
cd SNAC-Vocos

Install packages:

pip install -r requirements.txt

Infer

Refer to infer.py for inference instructions and usage examples.

Available Models

| Model name | Huggingface | Corpus | Domain |
|---|---|---|---|
| snac_vocos_16khz_hop200_scale8421_1kh | 🤗 | 1k hours | Speech (Mandarin/English) |
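
The model name appears to encode the main codec settings. The sketch below works out the resulting token rates, assuming (this is not confirmed by the repository) that `16khz` is the sample rate, `hop200` is the hop length in samples, and `scale8421` lists per-codebook temporal strides of 8, 4, 2, and 1, in the style of SNAC's multi-scale quantization. The function name `token_rates` is hypothetical.

```python
def token_rates(sample_rate, hop_length, scales):
    """Return tokens per second for each codebook.

    Assumption: the finest codebook emits one token per hop, and a
    codebook with stride s emits one token every s hops.
    """
    frame_rate = sample_rate / hop_length  # frames per second at the finest scale
    return [frame_rate / s for s in scales]


# Values read off the model name (an assumed interpretation, see above).
rates = token_rates(sample_rate=16000, hop_length=200, scales=[8, 4, 2, 1])
print(rates)       # [10.0, 20.0, 40.0, 80.0]
print(sum(rates))  # 150.0 tokens per second across all codebooks
```

Under these assumptions, one second of 16 kHz audio would be represented by 150 tokens in total.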

Training

1. Prepare a filelist of audio files for the training and validation sets, e.g. train.list.
2. Fill in a config file, e.g. snac_vocos.yaml. The main parameters to pay attention to are batch_size, filelist_path, save_dir, and device.
3. Start training:

python train.py fit --config ./configs/snac_vocos.yaml
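
As an illustration of step 1, the sketch below builds training and validation filelists from a directory of audio files. It assumes the filelist format is one audio path per line (as in Vocos); check the dataset code in this repository to confirm. The function name `build_filelists`, the extension set, and the 5% validation split are hypothetical choices, not part of the project.

```python
import random
from pathlib import Path

# Assumed set of audio extensions; adjust to match your corpus.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def build_filelists(audio_dir, train_list, val_list, val_fraction=0.05, seed=0):
    """Write shuffled train/validation filelists, one absolute path per line."""
    files = sorted(
        str(p.resolve())
        for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(files)
    # Hold out at least one file for validation when there is more than one file.
    n_val = max(1, int(len(files) * val_fraction)) if len(files) > 1 else 0
    val, train = files[:n_val], files[n_val:]
    Path(train_list).write_text("".join(f + "\n" for f in train))
    Path(val_list).write_text("".join(f + "\n" for f in val))
    return len(train), len(val)


# Example usage (paths are placeholders):
# n_train, n_val = build_filelists("/data/speech", "train.list", "val.list")
```

The resulting paths can then be set as the filelist_path values in the config file.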

TODO

  • Release code
  • Release a checkpoint trained with 1k hours of speech (Mandarin/English).
  • Demo page.

Acknowledgements

This implementation uses parts of the code from the following GitHub repos:
