# Vietnamese Voice Clone
## Data Preparation
***If you use custom data***
- Arrange your custom data in the following layout:
  - Create a folder named `DATA`.
  - Put the audio files in the subfolder `DATA/wavs`, named `<audio_id>.wav`.
  - Create `DATA/train.txt` and `DATA/val.txt`, one utterance per line, in the format `<audio_id><space>transcript`.
- If you don't have transcripts, see the wav2vec inference script.
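
Before training, it can help to confirm that a custom dataset matches the layout above. The following is a minimal sketch, not part of this repo: `check_split` is a hypothetical helper that assumes each line is `<audio_id><space>transcript` and that the audio lives at `DATA/wavs/<audio_id>.wav`.

```python
from pathlib import Path


def check_split(data_dir, split_name):
    """Return a list of problems found in <data_dir>/<split_name>."""
    data_dir = Path(data_dir)
    problems = []
    with open(data_dir / split_name, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Split on the first whitespace only: the transcript contains spaces.
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                problems.append(f"{split_name}:{line_no}: missing transcript")
                continue
            audio_id = parts[0]
            # Every listed id is expected to have a matching wav file.
            if not (data_dir / "wavs" / f"{audio_id}.wav").is_file():
                problems.append(f"{split_name}:{line_no}: no wav for {audio_id}")
    return problems
```

Calling `check_split("DATA", "train.txt")` and `check_split("DATA", "val.txt")` should return an empty list when the folder is laid out as described.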
***If you try with VIVOS***
```
wget http://ailab.hcmus.edu.vn/assets/vivos.tar.gz
tar xzf vivos.tar.gz
```
```
mkdir -p DATA/wavs
cp -v vivos/*/waves/*/*.wav DATA/wavs
```
```
cat vivos/test/prompts.txt > DATA/val.txt
cat vivos/test/prompts.txt > DATA/train.txt
cat vivos/train/prompts.txt >> DATA/train.txt
```
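
Note that the commands above write the VIVOS test prompts into both `DATA/val.txt` and `DATA/train.txt`, so every validation utterance is also a training utterance. If a held-out validation set matters for your experiment, a small sketch like the one below can report the overlap. The helper names `read_ids` and `shared_ids` are hypothetical and not part of this repo; the only assumption is that the first whitespace-separated token on each line is the audio id.

```python
def read_ids(path):
    """Collect the audio ids (first token of each non-empty line) from a split file."""
    ids = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split(maxsplit=1)
            if parts:  # skip blank lines
                ids.add(parts[0])
    return ids


def shared_ids(train_path, val_path):
    """Return the set of audio ids that appear in both split files."""
    return read_ids(train_path) & read_ids(val_path)
```

For example, `len(shared_ids("DATA/train.txt", "DATA/val.txt"))` gives the number of utterances present in both splits.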
## Install environment
```
conda create -y -n viclone python=3.8
conda activate viclone
conda install cudatoolkit=11.3.1 cudnn=8.2.1
```
```
python -m pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu116
python -m pip install -r requirements.txt
```
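
After installing, a quick check that PyTorch imports and can see the GPU may save a failed training run later. This is a minimal sketch; `describe_torch` is a hypothetical helper, and it only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib


def describe_torch():
    """Report the installed torch build, or note that torch is missing."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,  # version string of the installed wheel
        "cuda_build": torch.version.cuda,  # CUDA version the wheel was built with, or None
        "cuda_available": torch.cuda.is_available(),  # True if a usable GPU is visible
    }
```

Printing `describe_torch()` inside the `viclone` environment shows at a glance whether the CUDA build was installed and whether a GPU is usable.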
```
cd vits/monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace
```
## Process data
```
python Step1_data_processing.py
```
## Extract feature
```
python Step2_extract_feature.py
```
## Train model
```
python train_ms.py -c configs/vivos.json -m vivos
```
## Demo
```
python app.py
```
Then open http://127.0.0.1:7860/ in your browser.