need to fine tune with my own voice

#2
by IOzmen - opened

Hey. I need to fine-tune this XTTS model with my own voice. Can you show me a roadmap or tell me what I need to do? Do I need to prepare a dataset of my own voice? I don't know how to fine-tune this XTTS model. Please help!!

Yeah sure!

All you need is 20 minutes or so of you reading stuff. Keeping the same tone the whole way through, like an audiobook, will give the best results.
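To check how close a set of recordings is to that rough 20-minute target, something like the sketch below could help. It only uses Python's standard `wave` module, so it assumes uncompressed PCM `.wav` files; the `recordings` folder name is a placeholder, not something the tools in this thread require.

```python
import wave
from pathlib import Path


def wav_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_minutes(folder):
    """Sum the durations of every .wav file directly inside `folder`."""
    seconds = sum(wav_seconds(p) for p in sorted(Path(folder).glob("*.wav")))
    return seconds / 60


if __name__ == "__main__":
    # "recordings" is a hypothetical folder name; point it at your own clips.
    folder = Path("recordings")
    if folder.is_dir():
        print(f"{total_minutes(folder):.1f} minutes recorded (aiming for about 20)")
```

The same function works for one long file or for many short clips, since it just adds up whatever `.wav` files it finds.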

If there's a ton of background sound, you can denoise your audio using something like this.
(If you use it, run this Space locally as a Docker container; it'll probably be faster.)

https://huggingface.co/spaces/drewThomasson/DeepFilterNet2_no_limit

Then train it using this

https://github.com/daswer123/xtts-finetune-webu

At the end of the gui you should see a download model and download dataset button

That’ll zip and download your optimized fine-tuned xtts model on your voice

Just run through all the steps one by one letting each one complete

Here's a video of a guy using it

thanks for your quick reply. but i have questions.

1. Should I record a single 20-minute audio file, or multiple audio files (e.g. 80-100) that add up to 20 minutes in total?

2. The GitHub link you sent doesn't work; it says page not found. Where can I find a working link? Also, the Google Colab notebook in the YouTube video you sent doesn't work. It's not like in the video and gives many errors. Do you know how to solve these errors, or do you have a working GitHub and Google Colab link?

you can just upload a single file or multiple I think.
the gui should just deal with it lol

also sorry here try this link

https://github.com/daswer123/xtts-finetune-webui

here it is as a hyperlink if that works

The repository for XTTS Finetune WebUI can be found here.

Once I've fine-tuned the model, can I download it to my local computer and use it in a mobile or web application? (I'm talking about just the fine-tuned model.)

There’s a download button at the third tab in the gui

You’ll have to read the Coqui XTTS docs or something on loading custom models tho. idk, ask their Discord
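For reference, loading a custom XTTS checkpoint with the Coqui `TTS` Python package looks roughly like the sketch below. It assumes the downloaded zip unpacks to a folder containing `config.json`, `model.pth` and `vocab.json` (the names `load_checkpoint` looks for when given a `checkpoint_dir`); check your own export, because the file names may differ. The helper names `missing_model_files` and `load_finetuned_xtts` are made up for this example.

```python
from pathlib import Path

# Assumed file names for an exported fine-tune; adjust to whatever your zip contains.
EXPECTED_FILES = ("config.json", "model.pth", "vocab.json")


def missing_model_files(model_dir):
    """Return the expected files that are absent from `model_dir`."""
    return [name for name in EXPECTED_FILES if not (Path(model_dir) / name).is_file()]


def load_finetuned_xtts(model_dir):
    """Load a fine-tuned XTTS checkpoint with the Coqui TTS package (pip install TTS)."""
    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")

    # Imported lazily so the file check above works without TTS installed.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json(str(Path(model_dir) / "config.json"))
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_dir=str(model_dir), use_deepspeed=False)
    return model, config


# Usage sketch (paths are placeholders):
#   model, config = load_finetuned_xtts("my_voice_model")
#   out = model.synthesize("Hello there.", config,
#                          speaker_wav="reference.wav", language="en")
#   out["wav"] then holds the generated audio samples.
```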

Maybe I'm asking too many questions, sorry for that. But in the GUI I'm getting a "connection errored out" message, and the logs just say "error".
Can you think of any possible solutions for this? There was no error or anything like that in the YouTube video :(

you might be giving it too many files at once

idk, try combining all of your separate recordings into one giant concatenated file with ffmpeg

you could ask ChatGPT for that, telling it the names of all the files you're concatenating

idk I usually give it one big file
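One way to do that concatenation is ffmpeg's concat demuxer, which reads a text file listing the clips. The sketch below builds that list and the command in Python; the `recordings` folder, `clips.txt` and `combined.wav` are placeholder names, and `-c copy` assumes every clip has the same format (sample rate, channels, codec). If they differ, the clips would need re-encoding instead.

```python
import shutil
import subprocess
from pathlib import Path


def write_concat_list(clips, list_path):
    """Write the text file that ffmpeg's concat demuxer reads, one clip per line."""
    lines = []
    for clip in clips:
        # Inside single quotes, the concat file format escapes ' as '\''.
        escaped = str(Path(clip).resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def concat_command(list_path, output_path):
    """Build the ffmpeg command; '-c copy' joins the clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", str(list_path), "-c", "copy", str(output_path)]


if __name__ == "__main__":
    clips = sorted(Path("recordings").glob("*.wav"))  # hypothetical folder
    if clips and shutil.which("ffmpeg"):
        write_concat_list(clips, "clips.txt")
        subprocess.run(concat_command("clips.txt", "combined.wav"), check=True)
```

Sorting the clip names keeps the order predictable, which matters less for training data than for listening back to the result.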

If you're running it on an Nvidia card, it's best to run the Docker image just in case

Then you won't run into any issues

you'll have to install Docker first tho

at least in Docker the env will be set up perfectly for you automatically, and it wipes itself every run like a mini virtual machine

I gave it one file; that didn't change anything. I'll try to run it on Docker.

Yeah, Docker fixes any possible environment issues with your setup, so

as long as you have an Nvidia graphics card with enough VRAM you should be good…
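A quick way to see which Nvidia card is visible and how much VRAM it reports is to query `nvidia-smi`, as in this sketch. The thread doesn't say how much VRAM counts as "enough", so the code only reports the numbers. The helper names are made up for the example.

```python
import shutil
import subprocess


def parse_gpu_lines(text):
    """Turn 'name, 12288 MiB' lines into (name, mebibytes) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        name, memory = line.rsplit(",", 1)
        gpus.append((name.strip(), int(memory.split()[0])))
    return gpus


def list_gpus():
    """Ask nvidia-smi for each GPU's name and total memory; [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    for name, mib in list_gpus() or [("no Nvidia GPU detected", 0)]:
        print(f"{name}: {mib} MiB")
```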

No errors in your console?

drewThomasson changed discussion status to closed

I tried again on the Colab. This is the error output:

model.bin: 100% 3.09G/3.09G [01:12<00:00, 42.5MB/s]
Existing language matches target language
Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor

Have you faced something like this? I'll try Docker, but I'm afraid I'll face the same problem. Will I?

Oh idk the colab is notorious for breaking sometimes

You're using the GPU on it, right?
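The two error lines in that log say the runtime could not open any cuDNN 9 "ops" library. A small check like the one below, run in the same environment, shows whether the dynamic loader can find any of those file names at all. It is only a diagnostic sketch: it doesn't install anything, and a missing library is one plausible cause rather than a confirmed one.

```python
import ctypes

# The names the log above says it tried, in the same order.
CUDNN_CANDIDATES = (
    "libcudnn_ops.so.9.1.0",
    "libcudnn_ops.so.9.1",
    "libcudnn_ops.so.9",
    "libcudnn_ops.so",
)


def first_loadable(names):
    """Return the first shared library name the dynamic loader can open, else None."""
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            continue
        return name
    return None


if __name__ == "__main__":
    found = first_loadable(CUDNN_CANDIDATES)
    print(found or "no libcudnn_ops library is visible to the loader")
```

If nothing is found, the environment probably lacks a matching cuDNN install or its folder isn't on the library search path, which fits the point about a Docker image having its environment set up in advance.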

Look, I've got free time on my hands anyway. If you send me the training audio, I’ll toss you back the model after training it on my computer, if you want

Thank you so much, but I don't have the audio right now. Maybe I can send it to you in a few days, once I've prepared it?

...sure?
