
Training Time

by Art3mas

Hey all,

So I wanted to run some tests with Beatrice-Trainer. I'm not very familiar with training, if I'm honest, but it seemed like an interesting thing to try.

So, my 'issue' (if it is one, and not just unreasonable expectations) is the speed. I'm running on an RTX 4070. It's mentioned that the training time on an RTX 4090 is around an hour, so I expected 2-3 hours; however, training seems to take upwards of 12-16 hours. In Task Manager I can see that there are large portions of time where the card isn't being used, which lines up with the repeated "Training completed" lines.

If this is unexpected behaviour, I'd definitely appreciate some guidance, should anyone have any.

Thanks!

So, after some digging around I eventually found this:

"The problem of training finished. appearing repeatedly and slowing down training can be solved by adding persistent_workers=True to training_loader = torch.utils.data.DataLoader(below in
beatrice_trainer/__ main__ .py" source: https://rentry.org/hnk4oo3n
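For anyone else hitting this, the change seems to amount to passing persistent_workers=True where the training DataLoader is built in beatrice_trainer/__main__.py. Here's a minimal sketch of what that kind of call could look like; everything except the persistent_workers flag is illustrative (a dummy dataset and made-up settings), not copied from the actual trainer:

```python
import torch
from torch.utils.data import TensorDataset

# Sketch only: the real call in beatrice_trainer/__main__.py uses the trainer's own
# dataset and settings; the relevant change is just the persistent_workers flag.
training_dataset = TensorDataset(torch.randn(64, 16000))  # dummy stand-in dataset

training_loader = torch.utils.data.DataLoader(
    training_dataset,
    batch_size=16,
    shuffle=True,
    num_workers=8,             # persistent_workers requires num_workers > 0
    persistent_workers=True,   # keep worker processes alive between iterations
)
```

Keeping the workers alive avoids re-spawning the data-loading processes every time the loader is iterated again, which would fit the symptom of the GPU sitting idle between those repeated log lines.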

So, I'm going to try this; hopefully it helps someone else who runs into the same issue.

Alright, I can't confirm anything in terms of quality yet, BUT I've already seen a giant uptick in processing speed. Thank goodness. :')

Well, speed has thankfully gone up. My first model is a hot mess, but I expected that with a relatively small dataset of middling quality.

Currently running a 1-hour dataset at 15,000 steps with a batch size of 16 to see what comes out. With these settings the time to complete is somewhere in the realm of 6 hours at around 1.40-1.44 s/it on an NVIDIA RTX 4070 and an Intel i9-11900K with 64 GB of RAM.
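For what it's worth, that 6-hour figure matches a quick back-of-the-envelope calculation from the numbers above (15,000 steps at roughly 1.40-1.44 s/it):

```python
# Rough training-time estimate from step count and seconds per iteration.
steps = 15_000
for sec_per_it in (1.40, 1.44):
    hours = steps * sec_per_it / 3600
    print(f"{sec_per_it:.2f} s/it -> {hours:.1f} h")  # ~5.8 h to ~6.0 h
```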

I will probably try a larger step count and a lower batch size overnight, unless it turns out astronomically good.

So, with more messing around, 12 workers seems to help a little for me, giving me around 2 it/s. I can only assume this is because some VRAM is freed up?

Results are pretty up and down. I'm wondering if the pretrained file was trained solely on Japanese datasets, meaning my attempts with English-language datasets will continue to produce similar levels of quality; I don't know enough about the models or their training to say, however.

Looking more into the README, there's definitely English data included in the pre-trained model, given the inclusion of OpenSLR. But it does raise the question of why performance seems so much worse for English, even on models trained on English datasets: I'm finding they perform better with the handful of Japanese sentences I know than with my native English.
