tuanne123 committed on
Commit
313dc31
verified
1 Parent(s): c01c832

Upload 5 files

Files changed (5)
  1. .gitignore +52 -0
  2. LICENSE +20 -0
  3. README.md +118 -11
  4. setup.cfg +14 -0
  5. setup.py +43 -0
.gitignore ADDED
@@ -0,0 +1,52 @@
+ gta/
+ train_data/
+ test_data/
+ assets/infore/
+ # IDE files
+ .idea
+ .vscode
+
+ # Mac files
+ .DS_Store
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ pip-wheel-metadata/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2021 ntt123
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
README.md CHANGED
@@ -1,11 +1,118 @@
- ---
- title: '1231213123121231'
- emoji: 📉
- colorFrom: green
- colorTo: green
- sdk: static
- pinned: false
- license: unknown
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ A Vietnamese TTS
+ ================
+
+ 🔔 **Notice**: This project is no longer being updated. Please refer to the new project, [LightSpeed](https://github.com/NTT123/light-speed), which includes [a new male voice](https://huggingface.co/spaces/ntt123/Vietnam-male-voice-TTS).
+
+ Duration model + Acoustic model + HiFiGAN vocoder for vietnamese text-to-speech application.
+
+ Online demo at https://huggingface.co/spaces/ntt123/vietTTS.
+
+ A synthesized audio clip: [clip.wav](assets/infore/clip.wav). A colab notebook: [notebook](https://colab.research.google.com/drive/1oczrWOQOr1Y_qLdgis1twSlNZlfPVXoY?usp=sharing).
+
+
+ Checkout the experimental `multi-speaker` branch (`git checkout multi-speaker`) for multi-speaker support.
+
+ Install
+ -------
+
+
+ ```sh
+ git clone https://github.com/NTT123/vietTTS.git
+ cd vietTTS
+ pip3 install -e .
+ ```
+
+
+ Quick start using pretrained models
+ ----------------------------------
+ ```sh
+ bash ./scripts/quick_start.sh
+ ```
+
+
+ Download InfoRe dataset
+ -----------------------
+
+ ```sh
+ python ./scripts/download_aligned_infore_dataset.py
+ ```
+
+ **Note**: this is a denoised and aligned version of the original dataset which is donated by the InfoRe Technology company (see [here](https://www.facebook.com/groups/j2team.community/permalink/1010834009248719/)). You can download the original dataset (**InfoRe Technology 1**) at [here](https://github.com/TensorSpeech/TensorFlowASR/blob/main/README.md#vietnamese).
+
+ See `notebooks/denoise_infore_dataset.ipynb` for instructions on how to denoise the dataset. We use the Montreal Forced Aligner (MFA) to align transcript and speech (textgrid files).
+ See `notebooks/align_text_audio_infore_mfa.ipynb` for instructions on how to create textgrid files.
+
+ Train duration model
+ --------------------
+
+ ```sh
+ python -m vietTTS.nat.duration_trainer
+ ```
+
+
+ Train acoustic model
+ --------------------
+ ```sh
+ python -m vietTTS.nat.acoustic_trainer
+ ```
+
+
+
+ Train HiFiGAN vocoder
+ -------------
+
+ We use the original implementation from HiFiGAN authors at https://github.com/jik876/hifi-gan. Use the config file at `assets/hifigan/config.json` to train your model.
+
+ ```sh
+ git clone https://github.com/jik876/hifi-gan.git
+
+ # create dataset in hifi-gan format
+ ln -sf `pwd`/train_data hifi-gan/data
+ cd hifi-gan/data
+ ls -1 *.TextGrid | sed -e 's/\.TextGrid$//' > files.txt
+ cd ..
+ head -n 100 data/files.txt > val_files.txt
+ tail -n +101 data/files.txt > train_files.txt
+ rm data/files.txt
+
+ # training
+ python train.py \
+   --config ../assets/hifigan/config.json \
+   --input_wavs_dir=data \
+   --input_training_file=train_files.txt \
+   --input_validation_file=val_files.txt
+ ```
+
+ Finetune on Ground-Truth Aligned melspectrograms:
+ ```sh
+ cd /path/to/vietTTS # go to vietTTS directory
+ python -m vietTTS.nat.zero_silence_segments -o train_data # zero all [sil, sp, spn] segments
+ python -m vietTTS.nat.gta -o /path/to/hifi-gan/ft_dataset # create gta melspectrograms at hifi-gan/ft_dataset directory
+
+ # turn on finetune
+ cd /path/to/hifi-gan
+ python train.py \
+   --fine_tuning True \
+   --config ../assets/hifigan/config.json \
+   --input_wavs_dir=data \
+   --input_training_file=train_files.txt \
+   --input_validation_file=val_files.txt
+ ```
+
+ Then, use the following command to convert pytorch model to haiku format:
+ ```sh
+ cd ..
+ python -m vietTTS.hifigan.convert_torch_model_to_haiku \
+   --config-file=assets/hifigan/config.json \
+   --checkpoint-file=hifi-gan/cp_hifigan/g_[latest_checkpoint]
+ ```
+
+ Synthesize speech
+ -----------------
+
+ ```sh
+ python -m vietTTS.synthesizer \
+   --lexicon-file=train_data/lexicon.txt \
+   --text="hôm qua em tới trường" \
+   --output=clip.wav
+ ```
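The README's HiFiGAN step splits the utterance list with `head`/`tail`: the first 100 names go to validation and the rest to training. The same split can be sketched in Python. This is only an illustration under assumptions: `split_file_list` is a hypothetical helper that is not part of vietTTS or hifi-gan, and it assumes one `.TextGrid` file per utterance in the data directory.

```python
from pathlib import Path
from typing import List, Tuple


def split_file_list(data_dir: str, n_val: int = 100) -> Tuple[List[str], List[str]]:
    """Return (validation, training) utterance names, mirroring the head/tail split."""
    # Collect file names without the .TextGrid suffix, sorted by name.
    stems = sorted(p.stem for p in Path(data_dir).glob("*.TextGrid"))
    # The first n_val names form the validation list, the remainder the training list.
    return stems[:n_val], stems[n_val:]
```

Python's `sorted` orders by code point, while `ls -1` follows the shell locale, so the exact membership of the two lists may differ slightly from the shell version.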
setup.cfg ADDED
@@ -0,0 +1,14 @@
+ [pep8]
+ max-line-length = 120
+ indent-size = 2
+
+ [pycodestyle]
+ max-line-length = 120
+
+ [yapf]
+ based_on_style = pep8
+ column_limit = 120
+
+ [tool:pytest]
+ testpaths=
+     tests
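As a rough check of what these settings resolve to, the file can be read with Python's standard `configparser`. The embedded string copies the `setup.cfg` hunk; the indentation of the `tests` continuation line is an assumption about the original whitespace, which the diff view does not preserve.

```python
import configparser

# Copy of the setup.cfg hunk; the indent before "tests" is assumed.
SETUP_CFG = """\
[pep8]
max-line-length = 120
indent-size = 2

[pycodestyle]
max-line-length = 120

[yapf]
based_on_style = pep8
column_limit = 120

[tool:pytest]
testpaths=
    tests
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# Integer option shared by the pep8 and pycodestyle sections.
max_len = parser.getint("pycodestyle", "max-line-length")
# Multi-line value: split on whitespace to get the list of test directories.
test_paths = parser.get("tool:pytest", "testpaths").split()
print(max_len, test_paths)
```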
setup.py ADDED
@@ -0,0 +1,43 @@
+ from setuptools import setup
+
+ __version__ = "0.4.1"
+ url = "https://github.com/ntt123/vietTTS"
+
+ install_requires = [
+     "dm-haiku",
+     "einops",
+     "fire",
+     "gdown",
+     "jax",
+     "jaxlib",
+     "librosa",
+     "optax",
+     "tabulate",
+     "textgrid @ git+https://github.com/kylebgorman/textgrid.git",
+     "tqdm",
+     "matplotlib",
+ ]
+ setup_requires = []
+ tests_require = []
+
+ setup(
+     name="vietTTS",
+     version=__version__,
+     description="A vietnamese text-to-speech library.",
+     author="ntt123",
+     url=url,
+     keywords=[
+         "text-to-speech",
+         "tts",
+         "deep-learning",
+         "dm-haiku",
+         "jax",
+         "vietnamese",
+         "speech-synthesis",
+     ],
+     install_requires=install_requires,
+     setup_requires=setup_requires,
+     tests_require=tests_require,
+     packages=["vietTTS"],
+     python_requires=">=3.7",
+ )
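The README in this commit runs modules such as `vietTTS.nat.duration_trainer` and `vietTTS.hifigan.convert_torch_model_to_haiku`, while `packages` lists only `vietTTS`. With `pip3 install -e .` this is harmless, since an editable install uses the source tree directly; a regular install copies only the packages that are listed. Below is a minimal sketch of how the nested packages could be enumerated. It assumes each subpackage directory contains an `__init__.py` (the repository layout is not part of this commit), and `discover_packages` is a hypothetical helper written for illustration; in practice setuptools' own `find_packages()` covers this.

```python
import os
from typing import List


def discover_packages(root: str, top: str) -> List[str]:
    """List `top` and every nested directory under it that has an __init__.py."""
    found = []
    for dirpath, dirnames, filenames in os.walk(os.path.join(root, top)):
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a regular package: do not descend further
            continue
        # Convert the path relative to root into a dotted package name.
        rel = os.path.relpath(dirpath, root)
        found.append(rel.replace(os.sep, "."))
    return sorted(found)
```

The returned list could then be passed as `packages=` in place of the single-entry list.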