TITAN: A Versatile, Robust, and High-Quality Pretrained Model for Retrieval-based Voice Conversion (RVC) Training

Overview

TITAN is a state-of-the-art pretrained model designed for Retrieval-based Voice Conversion (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/) training. It offers a robust solution for transforming voice characteristics from one speaker to another, providing high-quality results with minimal training effort.

Model Details

Titan-Medium

Training Environment: Trained on an RTX 3060 Ti using Applio v3.1.1 (https://github.com/IAHispano/Applio), with a batch size of 8, over a span of 3 weeks.
Iterations (48k): 1018660 Steps and 530 Epochs
Iterations (40k): 1010588 Steps and 467 Epochs
Iterations (32k): 1001469 Steps and 463 Epochs
Sampling rate: 48k, 40k, 32k
Fine-tuning Process: Fine-tuned from the RVC v2 pretrained model with pitch guidance, using an 11.15-hour dataset sourced from Expresso (https://arxiv.org/abs/2308.05725), also available at datasets/blaise-tk/TITAN-Medium.
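
The step and epoch counts above can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the TITAN or Applio tooling. It assumes that one step corresponds to one batch, so dividing steps by epochs gives batches per epoch, and multiplying by the batch size gives an upper bound on training segments per epoch (the last batch of an epoch may be partial). The names `RUNS`, `BATCH_SIZE`, and `summarize` are hypothetical.

```python
# Step and epoch counts copied from the Titan-Medium details above.
RUNS = {
    "48k": {"steps": 1_018_660, "epochs": 530},
    "40k": {"steps": 1_010_588, "epochs": 467},
    "32k": {"steps": 1_001_469, "epochs": 463},
}
BATCH_SIZE = 8  # batch size stated in the training environment line


def summarize(runs, batch_size):
    """Return steps per epoch and a sample-count upper bound per sampling rate.

    Assumption: one step is one batch, so steps // epochs is the number of
    batches in each epoch.
    """
    summary = {}
    for rate, run in runs.items():
        batches, remainder = divmod(run["steps"], run["epochs"])
        summary[rate] = {
            "steps_per_epoch": batches,
            "remainder_steps": remainder,  # 0 means the division is exact
            "max_segments_per_epoch": batches * batch_size,
        }
    return summary


if __name__ == "__main__":
    for rate, stats in summarize(RUNS, BATCH_SIZE).items():
        print(
            f"{rate}: {stats['steps_per_epoch']} steps/epoch, "
            f"at most {stats['max_segments_per_epoch']} segments/epoch"
        )
```

With the figures listed above, each division is exact: 1922, 2164, and 2163 steps per epoch for the 48k, 40k, and 32k runs respectively.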

Samples

Tests were performed with an early checkpoint at ~700k steps, with all tests run under the same conditions.

[Audio sample table: Titan-Medium | Ov2 | Ov2.1]

Titan-Large

Details forthcoming...

Collaborators

We appreciate the contributions of our collaborators who have helped in the development and refinement of TITAN.

Mustar
SimplCup
UnitedShoes

Beta Testers

We extend our gratitude to the beta testers who provided valuable feedback during the testing phase of TITAN.

SimplCup
Leo_Frixi
Light
SCRFilms
Ryanz
Litsa_the_dancer

Citation

If you find TITAN useful for your research or projects, please cite our repository:

@article{titan,
  title={TITAN: A Versatile, Robust, and High-Quality Pretrained Model for Retrieval-based Voice Conversion (RVC) Training},
  author={Blaise},
  journal={Hugging Face},
  year={2024},
  publisher={Blaise},
  url={https://huggingface.co/blaise-tk/TITAN/}
}
