Audio-to-Audio
audio
speech
voice-conversion

Local training error

#3
by Blakus - opened

Right as training is about to start, I receive this error:

0%| | 0/20000 [00:00<?, ?it/s]

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Program Files\Python310\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'WavDataset' on <module '__main__' (built-in)>
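For context: this AttributeError is the classic Windows multiprocessing failure. Windows only supports the "spawn" start method, so worker processes re-import the main module and must be able to find classes like WavDataset there; anything the workers need has to live at the top level of an importable module, and the launch code must sit behind an `if __name__ == "__main__"` guard. A minimal stdlib sketch of the working pattern (names are illustrative, not from beatrice_trainer):

```python
import multiprocessing as mp

def square(x):
    # Defined at module top level, so spawned worker processes
    # can re-import and find it by name.
    return x * x

if __name__ == "__main__":
    # "spawn" is the only start method on Windows; forcing it here makes
    # the example behave the same on every OS. Without the __main__ guard
    # (or with square defined inside it), spawn-based workers fail with
    # "Can't get attribute ... on <module '__main__'>".
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

This is also why invoking the trainer through its `__main__.py` file helps: it gives the spawned workers a real module to re-import.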

I'm not sure which of these steps fixes this exact problem; however, I did encounter it while trying to train a model, and doing all of the things below got training working for me.

  1. Use Python 3.10
  2. Install Torch 2.2.2 with CUDA 12.1 (inside the poetry shell)
  3. Use relative paths when defining the input and output folders (example below)
  4. Instead of calling beatrice_trainer to start training, call .\beatrice_trainer\__main__.py
  5. Make sure all input files are .wav and mono
  6. Split the input .wav into 9-second .wav files (anything longer and I got an error); this can be done with ffmpeg
  7. Use the poetry shell
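Steps 5 and 6 above (mono files, no clip longer than 9 seconds) can be sanity-checked with the stdlib before training; a small sketch, assuming uncompressed PCM .wav input (the folder name is illustrative):

```python
import wave
from pathlib import Path

def check_clip(path, max_seconds=9.0):
    """Return a list of problems with one .wav clip (empty list = OK)."""
    problems = []
    with wave.open(str(path), "rb") as w:
        if w.getnchannels() != 1:
            problems.append(f"{path.name}: not mono ({w.getnchannels()} channels)")
        duration = w.getnframes() / w.getframerate()
        if duration > max_seconds:
            problems.append(f"{path.name}: {duration:.1f}s exceeds {max_seconds}s")
    return problems

# Example: scan the whole dataset folder before starting a run
# for clip in Path("input/voice01").glob("*.wav"):
#     for problem in check_clip(clip):
#         print(problem)
```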

image.png

ffmpeg command:
ffmpeg -i input.wav -f segment -segment_time 9 output_%03d.wav

Shouldn't you put the command into a text file in the zip instead of an image? It's too easy to get it wrong copying from an image.

0%| | 0/20000 [00:00<?, ?it/s]Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.

lol

Everything related to Python is a complete mess. The most reliable way is to run it in Ubuntu under WSL or in Docker.

OK, I can confirm @qcdead is spot on. It took me a while to figure it all out, but what I wasn't anticipating is that the audio files need to be split with ffmpeg; just cutting them by hand will not work.
It's likely the file names need to be sequential for it to work. Just follow the above and you should be fine.
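If you want to verify the sequential naming that the ffmpeg segmenter produces (output_000.wav, output_001.wav, ...), here is a quick check (a hypothetical helper; the pattern is assumed from the ffmpeg command above):

```python
import re

def is_sequential(names):
    """True if names follow output_000.wav, output_001.wav, ... with no gaps."""
    nums = sorted(int(m.group(1)) for n in names
                  if (m := re.fullmatch(r"output_(\d{3})\.wav", n)))
    return nums == list(range(len(nums)))

print(is_sequential(["output_000.wav", "output_001.wav", "output_002.wav"]))  # True
print(is_sequential(["output_000.wav", "output_002.wav"]))                    # False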

In order of operations (for a Windows user):

  1. Install pyenv (if you don't already have it): https://github.com/pyenv-win/pyenv-win#quick-start
  2. Ensure pyenv is set to 3.10.0:
pyenv install 3.10.0
pyenv global 3.10.0
  3. Install scoop: https://scoop.sh/
  4. Install pipx using scoop: https://pipx.pypa.io/stable/installation/
scoop install pipx
pipx ensurepath
  5. Install poetry using pipx: https://python-poetry.org/docs/
pipx install poetry
  6. Navigate to the beatrice-trainer directory and install with poetry:
poetry install
poetry shell
  7. Install the specific CUDA versions of torch, torchvision, and torchaudio: https://pytorch.org/get-started/locally/
pip3 install torch==2.2.2 torchvision torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
  8. Create an "input" folder at the root of the project with a subfolder (the voice model name; in this example I'm calling it voice01). The subfolder is where you will copy the audio files (see the steps below).
  9. Create an empty "output" folder at the root of the project.
  10. The environment should now be set up correctly. Next, prep the audio files. Use Audacity to convert to mono: open the .wav file in Audacity, right-click the track, and select "Split to Mono". Select the second audio track and delete it, then export to a file.
  11. Install ffmpeg (make sure to add it to your PATH environment variable): https://www.ffmpeg.org/download.html
  12. Use ffmpeg to split the file into 9-second clips:
ffmpeg -i input.wav -f segment -segment_time 9 output_%03d.wav
  13. Copy the output_00X.wav files to the input/voice01 folder.
  14. From the root of the project, run (ensuring you are still in the poetry environment):
python .\beatrice_trainer\__main__.py -d .\input -o .\output
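If ffmpeg is unavailable, the 9-second ffmpeg split above can also be approximated with Python's stdlib wave module; a sketch assuming mono PCM input (ffmpeg remains the tested route in this thread):

```python
import wave
from pathlib import Path

def split_wav(src, out_dir, seconds=9.0):
    """Split a mono PCM .wav into consecutive clips of at most `seconds` each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as w:
        frames_per_clip = int(w.getframerate() * seconds)
        params = w.getparams()
        index = 0
        while True:
            frames = w.readframes(frames_per_clip)
            if not frames:
                break
            clip_path = out / f"output_{index:03d}.wav"
            with wave.open(str(clip_path), "wb") as clip:
                clip.setparams(params)  # frame count is patched on close
                clip.writeframes(frames)
            written.append(clip_path)
            index += 1
    return written
```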

If done correctly, you should start seeing:

| 0/20000 [00:00<?, ?it/s]Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.

...

| 2/20000 [00:30<69:05:54, 12.44s/it]Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.
Training finished.

and so on...

If done correctly, you should not see the WavDataset error or the data_iter errors.

Good Luck!

Lol, I see that it was working all along; I just hit stop because of the "Training finished." log. Maybe it's a console bug and I have to keep waiting for the percentage to advance.

I will give you updates later.

I followed @wickedjutto's instructions, and the process does seem to work, but it tells me training is going to take over a hundred hours. The model card indicates this should only take a few hours at most with a GPU like mine. Not sure why it doesn't seem to recognize it.

@wickedjutto I'm glad someone could understand what I wrote and provide more detailed instructions.
I don't do coding; I'm a 3D modeler. However, these custom-trained models have interested me for a game I am making, and comparing the voice quality from my own dataset across different tools has been an important factor.
What I gathered on these problems I had to translate from the Japanese AI server, which proved difficult to understand (I don't speak a lick of Japanese), and my lack of knowledge with this stuff meant I couldn't provide much more than what I wrote. But at least you understood it well enough to turn it into more detailed instructions anyone can follow.

@Hisao-Nakai The first model I trained used a 5-minute dataset on my RTX 3060 Ti, and it took about 12 hours to complete. The ETA wasn't reliable; at the beginning it said it would only take 2 hours.

I wrote a one-click installer that splits the selected audio, processes it, and begins training. I made it work using torch 2.4.0 with CUDA 11.8 in a Python 3.10.11 venv. I thought I'd throw in my two cents for those who get hung up trying to make specific versions work that, for some reason, just won't :)! Good luck!

I've completed the steps outlined, and when I attempt Step 2 (Environment Setup) and enter "python beatrice_trainer -h" to display the help and confirm a good install, I get this error:

(beatrice-trainer-py3.10) C:\VoiceClone\beatrice-trainer>python beatrice_trainer -h
Traceback (most recent call last):
  File "C:\Python31011\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python31011\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\VoiceClone\beatrice-trainer\beatrice_trainer\__main__.py", line 25, in <module>
    import torchaudio
  File "C:\Users\david\AppData\Local\pypoetry\Cache\virtualenvs\beatrice-trainer-Ym5nKfem-py3.10\lib\site-packages\torchaudio\__init__.py", line 2, in <module>
    from . import _extension  # noqa  # usort: skip
  File "C:\Users\david\AppData\Local\pypoetry\Cache\virtualenvs\beatrice-trainer-Ym5nKfem-py3.10\lib\site-packages\torchaudio\_extension\__init__.py", line 38, in <module>
    _load_lib("libtorchaudio")
  File "C:\Users\david\AppData\Local\pypoetry\Cache\virtualenvs\beatrice-trainer-Ym5nKfem-py3.10\lib\site-packages\torchaudio\_extension\utils.py", line 60, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\david\AppData\Local\pypoetry\Cache\virtualenvs\beatrice-trainer-Ym5nKfem-py3.10\lib\site-packages\torch\_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "C:\Python31011\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\david\AppData\Local\pypoetry\Cache\virtualenvs\beatrice-trainer-Ym5nKfem-py3.10\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.

I get this same error when I attempt to call beatrice_trainer using .\beatrice_trainer\__main__.py as suggested as well.

I've attempted the following based on what little I found online regarding this error:

  • verified that cudart64_110.dll is actually in the directory listed
  • copied the file to the \Windows\System32 directory (found a post that recommended trying it)
  • pip check and no issues found
  • uninstalled and re-installed (multiple times each):
    • torch/torchaudio/torchvision 2.2.2 as recommended
    • CUDA 11.8 Toolkit from NVIDIA site
    • Visual Studio 2022 (one post mentioned this as a possible solution)
    • Updated Windows nvidia driver to latest game and studio versions
basically tried everything else I've found on this error by searching the internet (there isn't much out there).

I'm running on:
- Intel i9-14900K
- Windows 11 (fully updated/patched)
- 128GB RAM
- NVIDIA RTX 4090 (24GB VRAM)

BTW, I had no issues using ffmpeg to split the audio for training into 9-second files as suggested.

The primary other tool I've been working with, which is working great, is Rope-Live for face-swapping.

Any advice is greatly appreciated!

FYI, I also removed everything and started over following the steps outlined by @wickedjutto using pyenv, scoop, etc., and I get the same error when attempting to invoke the help or run "python .\beatrice_trainer\__main__.py -d .\input -o .\output".

Just an FYI: the final resolution for me was to install VC_redist.x64 (14.29.30153), which resolved this issue.
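That fix is consistent with the usual cause of this error: the .pyd file itself exists, but Windows can't resolve one of its dependent DLLs (often the MSVC runtime that VC_redist installs). A small diagnostic sketch (the DLL names are common suspects, not confirmed from this machine):

```python
import ctypes
import os

def can_load(dll_name):
    """Return True if the OS can resolve the library and its dependencies."""
    try:
        ctypes.CDLL(dll_name)
        return True
    except OSError:
        return False

if os.name == "nt":
    # vcruntime140.dll / msvcp140.dll ship with the VC++ redistributable;
    # if these fail to load, installing VC_redist.x64 is the usual fix.
    for dll in ("vcruntime140.dll", "msvcp140.dll", "cudart64_110.dll"):
        print(dll, "->", "ok" if can_load(dll) else "MISSING")
```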
