AgnesTachyon So-vits-svc 4.1 Model
A so-vits-svc 4.1 voice conversion model of AgnesTachyon from Uma Musume: Pretty Derby.
Model Details
Model Description
This is a so-vits-svc 4.1 voice conversion model of AgnesTachyon from Uma Musume: Pretty Derby.
- Developed by: svc-develop-team
- Trained by: 70295
- Model type: Audio to Audio
- License: CC BY-NC 4.0
Uses
- Clone the so-vits-svc repository and install all dependencies.
- Create a new folder named "models" and place the "AgnesTachyon" folder inside it.
- Navigate to the directory of "so-vits-svc" and execute the following command, replacing "xxx.wav" with the name of your source audio file and "x" with the number of semitones to raise or lower the pitch.
python inference_main.py -m "models/AgnesTachyon/AgnesTachyon.pth" -c "models/AgnesTachyon/config.json" -n "xxx.wav" -t x -s "AgnesTachyon"
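For example, assuming a source file named vocals.wav (a hypothetical name) placed in the raw folder of so-vits-svc, the following converts it without pitch transposition:
python inference_main.py -m "models/AgnesTachyon/AgnesTachyon.pth" -c "models/AgnesTachyon/config.json" -n "vocals.wav" -t 0 -s "AgnesTachyon"
The converted audio is written to the results folder of so-vits-svc.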
A shallow diffusion model, a cluster model, and a feature index model are also provided. Check the README.md file of the so-vits-svc project for more information on using them.
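As a rough sketch of how the optional models are passed at inference time (flag names can vary between so-vits-svc revisions, and the cluster model path below is illustrative, so verify both against the project's README.md):
python inference_main.py -m "models/AgnesTachyon/AgnesTachyon.pth" -c "models/AgnesTachyon/config.json" -n "xxx.wav" -t 0 -s "AgnesTachyon" --cluster_model_path "models/AgnesTachyon/kmeans_10000.pt" --cluster_infer_ratio 0.5  # cluster model path is illustrative
Shallow diffusion is enabled with the --shallow_diffusion flag together with the paths to the diffusion model and its config.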
Training Details
Training Data
All of the training data was extracted from the Windows client of Uma Musume: Pretty Derby using the umamusume-voice-text-extractor.
The copyright of the training dataset belongs to Cygames.
Only the character's voice lines are used; the live music soundtracks are not included in the training dataset.
Training Procedure
Training Environment Preparation
- Download the base models mentioned in the README.md file of the so-vits-svc project. You need checkpoint_best_legacy_500.pt (the vec768l12 speech encoder), D_0.pth and G_0.pth (for the so-vits model), model_0.pt (for shallow diffusion), rmvpe.pt (for the RMVPE f0 predictor), and model (for the NSF-HiFiGAN vocoder).
- Place checkpoint_best_legacy_500.pt and rmvpe.pt in .\pretrain; place model and its config.json in .\pretrain\nsf_hifigan; place D_0.pth and G_0.pth in .\logs\44k; place model_0.pt in .\logs\44k\diffusion (see the directory sketch below).
Credits: The D_0.pth and G_0.pth provided above are from OOPPEENN.
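Assembled per the placement list above, the working tree of so-vits-svc should look roughly like this (a sketch showing only the files named above):
so-vits-svc
├── pretrain
│   ├── checkpoint_best_legacy_500.pt
│   ├── rmvpe.pt
│   └── nsf_hifigan
│       ├── config.json
│       └── model
└── logs
    └── 44k
        ├── D_0.pth
        ├── G_0.pth
        └── diffusion
            └── model_0.pt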
Preprocessing
- Delete all WAV files smaller than 400 KB, then copy the remaining files to .\dataset_raw\AgnesTachyon (a sketch of this step follows this list).
- Navigate to the directory of "so-vits-svc" and execute:
python resample.py --skip_loudnorm
- Execute:
python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug
- Edit the training parameters in config.json and diffusion.yaml (an illustrative excerpt follows this list).
- Execute:
python preprocess_hubert_f0.py --f0_predictor rmvpe --use_diff
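A minimal sketch of the first step above, assuming the extracted WAV files sit in a folder named extracted (a hypothetical path) and that so-vits-svc is the current working directory; it filters out the small files instead of deleting them, which has the same effect on the dataset:
import shutil
from pathlib import Path

src = Path("extracted")                  # hypothetical folder holding the extracted WAV files
dst = Path("dataset_raw/AgnesTachyon")
dst.mkdir(parents=True, exist_ok=True)

for wav in src.glob("*.wav"):
    if wav.stat().st_size < 400 * 1024:  # skip files smaller than 400 KB
        continue
    shutil.copy2(wav, dst / wav.name)    # copy the remaining files into the dataset folder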
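For the config-editing step, the values most often adjusted sit under the "train" section of config.json. A minimal illustrative excerpt (the key names follow stock so-vits-svc; the values are assumptions to adapt to your GPU):
"train": {
  "batch_size": 6,
  "learning_rate": 0.0001,
  "epochs": 10000,
  "keep_ckpts": 3
}
Lower batch_size if you run out of VRAM; keep_ckpts bounds how many recent checkpoints are kept on disk.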
Training
- Execute:
python train.py -c configs/config.json -m 44k
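Training progress can be followed with TensorBoard, which reads the event files so-vits-svc writes under logs/44k (assuming tensorboard is installed in the same environment):
tensorboard --logdir logs/44k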
[Optional]
- To train the shallow diffusion model, execute:
python train_diff.py -c configs/diffusion.yaml
- To train the cluster model, execute:
python cluster/train_cluster.py --gpu
- To train the feature index model, execute:
python train_index.py -c configs/config.json
Training Hyperparameters
Please check config.json and diffusion.yaml for the training hyperparameters.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: RTX 3090
- Hours used: 41.6
- Provider: Myself
- Compute Region: Mainland China
- Carbon Emitted: ~16.02 kg CO2
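As a quick sanity check on the figures above, assuming the RTX 3090's 350 W TDP: 41.6 h × 0.35 kW = 14.56 kWh, and 16.02 kg CO2 ÷ 14.56 kWh ≈ 1.1 kg CO2/kWh of implied grid carbon intensity.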