SNAC-Vocos

A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder is replaced with Vocos.

Installation

Python 3.9 or later is recommended.
Clone the repository:

git clone https://github.com/hertz-pj/SNAC-Vocos
cd SNAC-Vocos

Install packages:

pip install -r requirements.txt

Inference

Refer to infer.py for inference instructions and usage examples.

Available Models

| Model name | Hugging Face | Corpus | Domain |
|---|---|---|---|
| snac_vocos_16khz_hop200_scale8421_1kh | 🤗 | 1k hours | Speech (Mandarin/English) |
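
The model name can be read as a summary of the codec configuration. The sketch below assumes that `16khz` is the sample rate, `hop200` is the hop length in samples, and `scale8421` lists per-codebook strides of 8, 4, 2, and 1, following the multi-scale scheme used by SNAC. This reading is an interpretation of the name, not something the table states, and `token_rates` is a hypothetical helper.

```python
def token_rates(sample_rate, hop_length, strides):
    """Return the token rate (tokens per second) of each codebook level.

    Assumes the finest level emits one token per hop and each coarser
    level is downsampled by its stride (an assumption about the naming).
    """
    base_rate = sample_rate / hop_length  # frames per second at the finest level
    return [base_rate / stride for stride in strides]


rates = token_rates(sample_rate=16000, hop_length=200, strides=[8, 4, 2, 1])
print(rates)       # [10.0, 20.0, 40.0, 80.0]
print(sum(rates))  # 150.0 tokens per second across all levels
```

Under these assumptions the coarsest codebook runs at 10 Hz and the finest at 80 Hz.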

Training

1. Prepare filelists of audio files for the training and validation sets, e.g. train.list.
2. Fill in a config file, e.g. snac_vocos.yaml. The main parameters to pay attention to are batch_size, filelist_path, save_dir, and device.
3. Start training:

python train.py fit --config ./configs/snac_vocos.yaml
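
For step 1, the filelists could be generated with a script like the one below. This is a minimal sketch, assuming the trainer expects one audio file path per line (check the dataset code in the repository for the exact format). The function name `build_filelists`, the extension set, and the 5% validation split are illustrative choices, not part of the repository.

```python
import random
from pathlib import Path

# Extensions treated as audio; adjust to match your corpus (assumption).
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def build_filelists(audio_dir, train_path, val_path, val_fraction=0.05, seed=0):
    """Scan audio_dir recursively and write train/validation filelists.

    Each output file contains one absolute audio path per line
    (assumed format). Returns (number of train files, number of val files).
    """
    files = sorted(
        str(p.resolve())
        for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(files)

    n_val = int(len(files) * val_fraction)
    val_files, train_files = files[:n_val], files[n_val:]

    Path(train_path).write_text("".join(f + "\n" for f in train_files))
    Path(val_path).write_text("".join(f + "\n" for f in val_files))
    return len(train_files), len(val_files)
```

For example, `build_filelists("data/audio", "train.list", "val.list")` would write both lists to the current directory; the resulting paths can then be set as filelist_path in the config.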

TODO

  • Release code
  • Release a checkpoint trained with 1k hours of speech (Mandarin/English).
  • Demo page.

Acknowledgements

This implementation uses parts of the code from the following GitHub repos:
