Upload 5 files
- .gitignore +52 -0
- LICENSE +20 -0
- README.md +118 -11
- setup.cfg +14 -0
- setup.py +43 -0
.gitignore
ADDED
@@ -0,0 +1,52 @@
+gta/
+train_data/
+test_data/
+assets/infore/
+# IDE files
+.idea
+.vscode
+
+# Mac files
+.DS_Store
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Jupyter Notebook
+.ipynb_checkpoints
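The `*.py[cod]` entry in the `.gitignore` above relies on a character class to cover three byte-code suffixes at once. The sketch below illustrates that behaviour with Python's `fnmatch` module. It is only an approximation: `fnmatch` follows shell-style globbing, which agrees with gitignore rules for a simple pattern like this one but not for every gitignore feature. The helper name `is_ignored_bytecode` and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern copied from the .gitignore in this commit.
PATTERN = "*.py[cod]"


def is_ignored_bytecode(filename: str) -> bool:
    """Return True if the file name matches the byte-code pattern (case-sensitive)."""
    return fnmatchcase(filename, PATTERN)


# Hypothetical file names, used only to show which suffixes the class [cod] covers.
candidates = ["model.pyc", "model.pyo", "model.pyd", "model.py", "model.pyx"]
matches = {name: is_ignored_bytecode(name) for name in candidates}

for name, ignored in matches.items():
    print(f"{name}: {'ignored' if ignored else 'kept'}")
```

Plain `.py` sources are not matched because the pattern requires exactly one extra character from the set `c`, `o`, `d` after `.py`.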
LICENSE
ADDED
@@ -0,0 +1,20 @@
+Copyright (c) 2021 ntt123
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
README.md
CHANGED
@@ -1,11 +1,118 @@
-
-
-
-
-
-
-
-
-
-
-
+A Vietnamese TTS
+================
+
+🔔 **Notice**: This project is no longer being updated. Please refer to the new project, [LightSpeed](https://github.com/NTT123/light-speed), which includes [a new male voice](https://huggingface.co/spaces/ntt123/Vietnam-male-voice-TTS).
+
+Duration model + Acoustic model + HiFiGAN vocoder for vietnamese text-to-speech application.
+
+Online demo at https://huggingface.co/spaces/ntt123/vietTTS.
+
+A synthesized audio clip: [clip.wav](assets/infore/clip.wav). A colab notebook: [notebook](https://colab.research.google.com/drive/1oczrWOQOr1Y_qLdgis1twSlNZlfPVXoY?usp=sharing).
+
+
+Checkout the experimental `multi-speaker` branch (`git checkout multi-speaker`) for multi-speaker support.
+
+Install
+-------
+
+
+```sh
+git clone https://github.com/NTT123/vietTTS.git
+cd vietTTS
+pip3 install -e .
+```
+
+
+Quick start using pretrained models
+----------------------------------
+```sh
+bash ./scripts/quick_start.sh
+```
+
+
+Download InfoRe dataset
+-----------------------
+
+```sh
+python ./scripts/download_aligned_infore_dataset.py
+```
+
+**Note**: this is a denoised and aligned version of the original dataset which is donated by the InfoRe Technology company (see [here](https://www.facebook.com/groups/j2team.community/permalink/1010834009248719/)). You can download the original dataset (**InfoRe Technology 1**) at [here](https://github.com/TensorSpeech/TensorFlowASR/blob/main/README.md#vietnamese).
+
+See `notebooks/denoise_infore_dataset.ipynb` for instructions on how to denoise the dataset. We use the Montreal Forced Aligner (MFA) to align transcript and speech (textgrid files).
+See `notebooks/align_text_audio_infore_mfa.ipynb` for instructions on how to create textgrid files.
+
+Train duration model
+--------------------
+
+```sh
+python -m vietTTS.nat.duration_trainer
+```
+
+
+Train acoustic model
+--------------------
+```sh
+python -m vietTTS.nat.acoustic_trainer
+```
+
+
+
+Train HiFiGAN vocoder
+-------------
+
+We use the original implementation from HiFiGAN authors at https://github.com/jik876/hifi-gan. Use the config file at `assets/hifigan/config.json` to train your model.
+
+```sh
+git clone https://github.com/jik876/hifi-gan.git
+
+# create dataset in hifi-gan format
+ln -sf `pwd`/train_data hifi-gan/data
+cd hifi-gan/data
+ls -1 *.TextGrid | sed -e 's/\.TextGrid$//' > files.txt
+cd ..
+head -n 100 data/files.txt > val_files.txt
+tail -n +101 data/files.txt > train_files.txt
+rm data/files.txt
+
+# training
+python train.py \
+  --config ../assets/hifigan/config.json \
+  --input_wavs_dir=data \
+  --input_training_file=train_files.txt \
+  --input_validation_file=val_files.txt
+```
+
+Finetune on Ground-Truth Aligned melspectrograms:
+```sh
+cd /path/to/vietTTS # go to vietTTS directory
+python -m vietTTS.nat.zero_silence_segments -o train_data # zero all [sil, sp, spn] segments
+python -m vietTTS.nat.gta -o /path/to/hifi-gan/ft_dataset  # create gta melspectrograms at hifi-gan/ft_dataset directory
+
+# turn on finetune
+cd /path/to/hifi-gan
+python train.py \
+  --fine_tuning True \
+  --config ../assets/hifigan/config.json \
+  --input_wavs_dir=data \
+  --input_training_file=train_files.txt \
+  --input_validation_file=val_files.txt
+```
+
+Then, use the following command to convert pytorch model to haiku format:
+```sh
+cd ..
+python -m vietTTS.hifigan.convert_torch_model_to_haiku \
+  --config-file=assets/hifigan/config.json \
+  --checkpoint-file=hifi-gan/cp_hifigan/g_[latest_checkpoint]
+```
+
+Synthesize speech
+-----------------
+
+```sh
+python -m vietTTS.synthesizer \
+  --lexicon-file=train_data/lexicon.txt \
+  --text="hôm qua em tới trường" \
+  --output=clip.wav
+```
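The HiFiGAN data-preparation step in the README builds a list of utterance names from `*.TextGrid` files, takes the first 100 as the validation set (`head -n 100`) and the rest as the training set (`tail -n +101`). A minimal Python sketch of the same split follows. The function names `stems_from_textgrids` and `split_file_list` and the generated file names are hypothetical, and Python's `sorted` may order names differently from `ls -1` under some locales, so the two approaches are not guaranteed to pick identical validation files.

```python
from typing import Iterable, List, Tuple

SUFFIX = ".TextGrid"


def stems_from_textgrids(filenames: Iterable[str]) -> List[str]:
    """Keep only *.TextGrid names and strip the suffix, like the `ls | sed` step."""
    return sorted(name[: -len(SUFFIX)] for name in filenames if name.endswith(SUFFIX))


def split_file_list(stems: List[str], n_val: int = 100) -> Tuple[List[str], List[str]]:
    """First n_val entries go to validation, the remainder to training."""
    return stems[:n_val], stems[n_val:]


# Hypothetical directory listing: 250 aligned utterances plus an unrelated file.
listing = [f"utt{i:04d}.TextGrid" for i in range(250)] + ["lexicon.txt"]
stems = stems_from_textgrids(listing)
val_files, train_files = split_file_list(stems)

print(len(val_files), "validation /", len(train_files), "training utterances")
```

Writing `val_files` and `train_files` one name per line would reproduce the `val_files.txt` and `train_files.txt` inputs that the README passes to `train.py`.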
setup.cfg
ADDED
@@ -0,0 +1,14 @@
+[pep8]
+max-line-length = 120
+indent-size = 2
+
+[pycodestyle]
+max-line-length = 120
+
+[yapf]
+based_on_style = pep8
+column_limit = 120
+
+[tool:pytest]
+testpaths=
+    tests
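To show how tools would read these settings, the sketch below parses a fragment of the `setup.cfg` above with Python's standard `configparser`. It assumes the `tests` line under `testpaths=` is indented as a continuation line, which INI-style parsing requires for a multi-line value. The variable names are hypothetical, and real tools such as pytest or pycodestyle use their own loaders, so this is an illustration rather than their actual code path.

```python
import configparser

# Fragment of the setup.cfg from this commit; the indented line continues `testpaths`.
CFG_TEXT = """\
[pycodestyle]
max-line-length = 120

[tool:pytest]
testpaths=
    tests
"""

parser = configparser.ConfigParser()
parser.read_string(CFG_TEXT)

# Numeric option: parsed as an integer.
max_line_length = parser.getint("pycodestyle", "max-line-length")

# Multi-line option: split on whitespace to get the list of test directories.
testpaths = parser.get("tool:pytest", "testpaths").split()

print(max_line_length, testpaths)
```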
setup.py
ADDED
@@ -0,0 +1,43 @@
+from setuptools import setup
+
+__version__ = "0.4.1"
+url = "https://github.com/ntt123/vietTTS"
+
+install_requires = [
+    "dm-haiku",
+    "einops",
+    "fire",
+    "gdown",
+    "jax",
+    "jaxlib",
+    "librosa",
+    "optax",
+    "tabulate",
+    "textgrid @ git+https://github.com/kylebgorman/textgrid.git",
+    "tqdm",
+    "matplotlib",
+]
+setup_requires = []
+tests_require = []
+
+setup(
+    name="vietTTS",
+    version=__version__,
+    description="A vietnamese text-to-speech library.",
+    author="ntt123",
+    url=url,
+    keywords=[
+        "text-to-speech",
+        "tts",
+        "deep-learning",
+        "dm-haiku",
+        "jax",
+        "vietnamese",
+        "speech-synthesis",
+    ],
+    install_requires=install_requires,
+    setup_requires=setup_requires,
+    tests_require=tests_require,
+    packages=["vietTTS"],
+    python_requires=">=3.7",
+)