need to fine-tune xtts with my own voice
hey. i need to fine-tune this xtts model with my own voice. can you show me a roadmap or tell me what i need to do? do i need to prepare a dataset with my own voice? i don't know how to fine-tune this xtts model. please help!!
Yeah sure!
All you need is 20 minutes or so of you reading stuff; keeping the same tone for the whole thing, like an audiobook, will give the best results.
If you want, you can denoise your audio using something like this if there's a ton of background sound.
(If you use this, run the space locally as a docker container; it'll probably be faster.)
https://huggingface.co/spaces/drewThomasson/DeepFilterNet2_no_limit
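If you go the local route, Hugging Face Spaces can be pulled as prebuilt Docker images. The sketch below just assembles the `docker run` command; the image path is an assumption based on the usual `registry.hf.space/<owner>-<space>` naming convention, so copy the exact path from the Space's own "Run with Docker" menu rather than trusting this guess.

```python
# Hypothetical sketch: assemble the `docker run` command for running a
# Hugging Face Space as a local container. The image path below is an
# assumption (Spaces are usually published under registry.hf.space with a
# lowercased <owner>-<space> name); check the Space's "Run with Docker"
# menu for the real one.
image = "registry.hf.space/drewthomasson-deepfilternet2-no-limit:latest"
port = 7860  # Gradio's default port

cmd = ["docker", "run", "-it", "-p", f"{port}:{port}", image]
print(" ".join(cmd))
```

Once the container is up, the denoiser UI should be reachable at http://localhost:7860.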
Then train it using this
https://github.com/daswer123/xtts-finetune-webu
At the end of the gui you should see a download model and a download dataset button
That'll zip up and download your optimized fine-tuned xtts model trained on your voice
Just run through all the steps one by one, letting each one complete
Here's a video of a guy using it
thanks for your quick reply, but i have some questions.
will i record a single audio file that is 20 minutes long? or will i record multiple audio files (e.g. 80-100 audio files) whose total length is 20 minutes?
also, the github link you sent doesn't work. it says page not found. where can i find a working link? the google colab notebook in the youtube video you sent doesn't work either. it's not like in the video and gives many errors. do you know how to solve these errors? or do you have a working github and google colab link?
you can just upload a single file or multiple I think.
the gui should just deal with it lol
also sorry here try this link
https://github.com/daswer123/xtts-finetune-webui
here it is as a hyperlink, if that works
The repository for XTTS Finetune WebUI can be found here.
when i've fine-tuned the model, can i download it to my local computer and use it in some mobile or web application (i'm talking about just the fine-tuned model)?
There’s a download button at the third tab in the gui
You'll have to read the coqui xtts docs or something on loading custom models tho, idk, ask their discord
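Those docs boil down to roughly the following. This is a sketch from memory of the coqui-tts (`TTS` pip package) custom-model loading flow, so treat the exact calls as assumptions and double-check them against the XTTS docs; `model_dir` is whatever folder the webui's download gave you (it should contain `config.json` plus the checkpoint files).

```python
def load_finetuned_xtts(model_dir: str):
    """Load a fine-tuned XTTS checkpoint from a local folder.

    Sketch of the coqui-tts custom-model flow (assumed API; verify
    against the XTTS docs). Imports live inside the function so this
    file can be read without `pip install TTS`.
    """
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json(f"{model_dir}/config.json")  # model hyperparameters
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_dir=model_dir, eval=True)
    return model, config
```

For a mobile or web app you'd more realistically run a model like this behind a small server API rather than ship the multi-gigabyte checkpoint to the client.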
you might be giving it too many files at once
idk, try combining them all into one giant concatenated file of all your separate recordings with ffmpeg
you could ask chatgpt for that, telling it the names of all the files you're concatenating
idk, I usually give it one big file
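If asking chatgpt feels like overkill, the concat itself is small. This sketch writes the file list that ffmpeg's concat demuxer expects and prints the command to run; the clip names are placeholders for your own recordings, and ffmpeg itself has to be installed separately.

```python
from pathlib import Path

# Placeholder names -- swap in your actual recordings, in playback order.
clips = ["clip01.wav", "clip02.wav", "clip03.wav"]

# ffmpeg's concat demuxer reads one `file '<name>'` entry per line.
filelist = Path("filelist.txt")
filelist.write_text("".join(f"file '{c}'\n" for c in clips), encoding="utf-8")

# `-c copy` skips re-encoding, which works when every clip shares the same
# format (sample rate, channels, codec); otherwise drop it and let ffmpeg
# re-encode to a common format.
print("ffmpeg -f concat -safe 0 -i filelist.txt -c copy combined.wav")
```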
If you have an Nvidia card you're running it on, it's best to run the docker image just in case
Then you won't run into any issues
you'll have to install docker first tho
at least in the docker the env will be set up perfectly for you automatically and wipes itself every run like a mini virtual machine
i gave it one file, that didn't change anything. i will try to run it on docker.
Yeah, the docker fixes any possible environment issues with your setup, so
As long as you have a nvidia graphics card with enough vram you should be good…
No errors in your console?
i have tried again on the colab. this is the error output:
model.bin: 91% 2.82G/3.09G [01:06<00:06, 41.7MB/s]
[... download progress lines repeat ...]
model.bin: 100% 3.09G/3.09G [01:12<00:00, 42.5MB/s]
Existing language matches target language
Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor
have you faced something like this? i will try docker, but i am afraid i will face the same problem. will i?
Oh idk the colab is notorious for breaking sometimes
You're using the GPU on it, right?
Look, I've got free time on my hands anyway; if you send me the training audio I'll toss you back the model after training it on my computer, if you want
thank you so much, but i don't have the audio right now. maybe i can send it to you in a few days when i've prepared it?
...sure?