drewThomasson/xtts-finetune-Bob-Odenkirk · need to fine tune with my own voice

7 days ago

hey. i need to fine-tune this xtts model with my own voice. can you show me a roadmap or tell me what i need to do? do i need to prepare a dataset with my own voice? also, i don't know how to fine-tune an xtts model. please help!!

drewThomasson

Owner 7 days ago

Yeah sure!

All you need is 20 minutes or so of you reading stuff. Keeping the same tone for the whole thing, like an audiobook, will give the best results.
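If you want to check how close you are to the 20-minute mark, here is a minimal sketch that adds up the length of your recordings. It assumes they are plain uncompressed (PCM) WAV files sitting in one folder, because that is all Python's built-in `wave` module can read. The folder name `my_recordings` is just a placeholder.

```python
import wave
from pathlib import Path


def wav_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_minutes(folder):
    """Sum the durations of every .wav file directly inside `folder`, in minutes."""
    files = sorted(Path(folder).glob("*.wav"))
    return sum(wav_seconds(f) for f in files) / 60.0


# Example usage (hypothetical folder name):
# print(f"{total_minutes('my_recordings'):.1f} minutes recorded so far")
```

MP3 or other compressed formats would need converting to WAV first, or a different library.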

If there’s a ton of background sound, you can denoise your audio using something like this.
(If you use it, run this Space locally as a Docker container, it’ll probs be faster)

https://huggingface.co/spaces/drewThomasson/DeepFilterNet2_no_limit

Then train it using this

https://github.com/daswer123/xtts-finetune-webu

At the end of the GUI you should see a "download model" button and a "download dataset" button

That’ll zip and download your fine-tuned XTTS model trained on your voice

drewThomasson

Owner 7 days ago

Just run through all the steps one by one letting each one complete

drewThomasson

Owner 7 days ago

Here's a video of a guy using it

drewThomasson

Owner 7 days ago

https://youtu.be/8tpDiiouGxc?si=v6vKTOxV8TD15uX2

IOzmen

7 days ago

thanks for your quick reply. but i have questions.

should i record a single 20-minute audio file, or multiple audio files (e.g. 80-100) that add up to 20 minutes in total?

the github link you sent doesn't work. it says page not found. where can i find a working link? also, the google colab notebook in the youtube video you sent doesn't work. it's not like in the video and gives many errors. do you know how to solve these errors? or do you have a working github and google colab link?

drewThomasson

Owner 7 days ago

you can just upload a single file or multiple I think.
the gui should just deal with it lol

also sorry here try this link

https://github.com/daswer123/xtts-finetune-webui

here's it as a hyperlink if that works

The repository for XTTS Finetune WebUI can be found here.

IOzmen

7 days ago

once i've fine-tuned the model, can i download it to my local computer and use it in a mobile or web application (i am talking about just the fine-tuned model)?

drewThomasson

Owner 7 days ago

There’s a download button on the third tab in the GUI

drewThomasson

Owner 7 days ago

You’ll have to read the coqui xtts docs or something on loading custom models tho, idk, ask their discord
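For what it's worth, here is a rough sketch of loading a fine-tuned checkpoint with the Coqui `TTS` Python package, following the pattern in their XTTS docs. The file names `config.json`, `model.pth` and `vocab.json` are assumptions about what the downloaded zip contains, so rename them to match your files. `find_model_files`, `load_finetuned_xtts` and `speak` are made-up helper names, not part of the library.

```python
from pathlib import Path

# Assumed file names inside the unzipped download; adjust to what your zip contains.
EXPECTED_FILES = {"config": "config.json", "checkpoint": "model.pth", "vocab": "vocab.json"}


def find_model_files(model_dir):
    """Map each expected file to its path, failing early if one is missing."""
    paths = {key: Path(model_dir) / name for key, name in EXPECTED_FILES.items()}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing model files: " + ", ".join(missing))
    return paths


def load_finetuned_xtts(model_dir, use_gpu=True):
    """Load a fine-tuned XTTS checkpoint from `model_dir` (needs the TTS package)."""
    # Imported lazily so the file check above works without Coqui TTS installed.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    paths = find_model_files(model_dir)
    config = XttsConfig()
    config.load_json(str(paths["config"]))
    model = Xtts.init_from_config(config)
    model.load_checkpoint(
        config,
        checkpoint_path=str(paths["checkpoint"]),
        vocab_path=str(paths["vocab"]),
        use_deepspeed=False,
    )
    if use_gpu:
        model.cuda()
    return model


def speak(model, text, reference_wav, language="en"):
    """Synthesize `text` in the voice of `reference_wav`; returns the audio samples."""
    gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
        audio_path=[reference_wav]
    )
    out = model.inference(text, language, gpt_cond_latent, speaker_embedding)
    return out["wav"]
```

For a web or mobile app, one option is to keep something like this running on a server and have the app request audio from it, since the model is several GB.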

IOzmen

7 days ago

maybe i am asking too many questions, sorry for that, but in the gui i am getting a "connection errored out" message, and the logs section just says error

can you think of any possible solutions for this? in the yt video there was no error or anything like that :(

drewThomasson

Owner 7 days ago

you might be giving it too many files at once

idk, try combining all of your separate recordings into one giant concatenated file with ffmpeg

you could ask chatgpt for that, telling it the names of all the files you're concatenating
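Something like this should do the concatenation. It is a minimal sketch that writes the list file ffmpeg's concat demuxer expects and then calls ffmpeg on it. It assumes ffmpeg is installed and on your PATH, and that all recordings share the same format (sample rate, channels, codec), since `-c copy` joins them without re-encoding. The function names and `concat_list.txt` are placeholders.

```python
import subprocess
from pathlib import Path


def write_concat_list(audio_files, list_path):
    """Write an ffmpeg concat list with one "file '<path>'" line per input."""
    lines = []
    for audio_file in audio_files:
        # Use absolute paths and escape single quotes the way ffmpeg expects.
        escaped = str(Path(audio_file).resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_path = Path(list_path)
    list_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path


def concat_command(list_path, output_path):
    """Build the ffmpeg command that joins the listed files without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", str(list_path), "-c", "copy", str(output_path)]


def concatenate(audio_files, output_path, list_path="concat_list.txt"):
    """Join `audio_files` in order into `output_path` (requires ffmpeg)."""
    write_concat_list(audio_files, list_path)
    subprocess.run(concat_command(list_path, output_path), check=True)


# Example usage (hypothetical file names):
# concatenate(["rec_01.wav", "rec_02.wav", "rec_03.wav"], "all_recordings.wav")
```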

drewThomasson

Owner 7 days ago

idk I usually give it one big file

drewThomasson

Owner 7 days ago

If you're running it on an Nvidia card, it's best to run the docker image just in case

Then you won't run into any issues

you'll have to install docker first tho

drewThomasson

Owner 7 days ago

at least in the docker the env will be set up perfectly for you automatically, and it wipes itself every run like a mini virtual machine

IOzmen

7 days ago

i gave it one file, that didn't change anything. i will try to run it on docker.

drewThomasson

Owner 7 days ago

Yeah, the docker fixes any possible environment issues with your setup, so

As long as you have a nvidia graphics card with enough vram you should be good…

drewThomasson

Owner 7 days ago

No errors in your console?

drewThomasson changed discussion status to closed 7 days ago

IOzmen

7 days ago

i have tried again on the colab. this is the error output

model.bin: 100% 3.09G/3.09G [01:12<00:00, 42.5MB/s]
Existing language matches target language
Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor

have you faced something like this? i will try docker, but i am afraid i will face the same problem. will i?

drewThomasson

Owner 7 days ago

Oh idk the colab is notorious for breaking sometimes

You're using the GPU on it right?
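Going by the log above, something is trying to load the cuDNN 9 shared libraries and can't find them. A quick way to check whether the dynamic loader can see any of those files in the current environment is a sketch like the one below. It only tries the exact names from the error message, and `probe_libraries` is a made-up helper name. It tells you whether a library loads, not why it's missing.

```python
import ctypes

# Library names copied from the error message above, most specific first.
CUDNN_CANDIDATES = (
    "libcudnn_ops.so.9.1.0",
    "libcudnn_ops.so.9.1",
    "libcudnn_ops.so.9",
    "libcudnn_ops.so",
)


def probe_libraries(names=CUDNN_CANDIDATES):
    """Try to load each shared library by name; map name -> True if it loaded."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the library.
            results[name] = False
    return results


# Example usage:
# for name, found in probe_libraries().items():
#     print(f"{name}: {'found' if found else 'NOT found'}")
```

If every entry comes back as not found, the environment most likely has no cuDNN 9 installed, or it sits somewhere the loader doesn't search.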

drewThomasson

Owner 7 days ago

Look, I've got free time on my hands anyway. If you send me the training audio, I’ll toss you back the model after training it on my computer if you want

IOzmen

7 days ago

thank you so much, but i don't have the audio right now. can i send it to you in a few days, once i've prepared it?

drewThomasson

Owner 7 days ago

...sure?