metadata
license: mit
datasets:
  - Wenetspeech4TTS/WenetSpeech4TTS
language:
  - zh
pipeline_tag: text-to-speech

A vanilla VALL-E model trained on WenetSpeech4TTS using the Amphion toolkit.

Training follows Amphion's VALL-E training code, except that the text-to-phoneme step is slightly different.

Checkpoints

  • base_model.bin : VALL-E trained on the WenetSpeech4TTS Basic subset
  • 38sft_model.bin : the Basic model fine-tuned on the WenetSpeech4TTS Standard subset
  • 4sft_model.bin : the Standard model fine-tuned on the WenetSpeech4TTS Premium subset
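
The checkpoints form a chain, each one initialized from the previous stage. The sketch below only illustrates that lineage. The file names are taken from the list above; the `CHECKPOINTS` table and the `training_chain` helper are hypothetical and are not part of Amphion or of the released inference code.

```python
# Hypothetical lineage table: file names come from the checkpoint list,
# the structure and field names are illustrative assumptions.
CHECKPOINTS = {
    "base_model.bin": {"subset": "Basic", "init_from": None},
    "38sft_model.bin": {"subset": "Standard", "init_from": "base_model.bin"},
    "4sft_model.bin": {"subset": "Premium", "init_from": "38sft_model.bin"},
}


def training_chain(name):
    """Return the WenetSpeech4TTS subsets a checkpoint was trained on, oldest first."""
    chain = []
    while name is not None:
        entry = CHECKPOINTS[name]
        chain.append(entry["subset"])
        name = entry["init_from"]  # walk back to the checkpoint it started from
    return chain[::-1]


print(training_chain("4sft_model.bin"))  # ['Basic', 'Standard', 'Premium']
```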

Usage

Inference code and more details: ISCSLP2024_CoVoC_baseline.