Space: tuanne123/1231213123121231

tuanne123 committed on May 20, 2024

Commit 313dc31 (verified) · 1 Parent(s): c01c832

Upload 5 files

Files changed (5)

.gitignore +52 -0
LICENSE +20 -0
README.md +118 -11
setup.cfg +14 -0
setup.py +43 -0

.gitignore ADDED

	@@ -0,0 +1,52 @@

+gta/
+train_data/
+test_data/
+assets/infore/
+# IDE files
+.idea
+.vscode
+# Mac files
+.DS_Store
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Jupyter Notebook
+.ipynb_checkpoints

LICENSE ADDED

	@@ -0,0 +1,20 @@

+Copyright (c) 2021 ntt123
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md CHANGED

@@ -1,11 +1,118 @@
----
-title: '1231213123121231'
-emoji: 📉
-colorFrom: green
-colorTo: green
-sdk: static
-pinned: false
-license: unknown
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+A Vietnamese TTS
+================
+🔔 **Notice**: This project is no longer being updated. Please refer to the new project, [LightSpeed](https://github.com/NTT123/light-speed), which includes [a new male voice](https://huggingface.co/spaces/ntt123/Vietnam-male-voice-TTS).
+A duration model + acoustic model + HiFiGAN vocoder for Vietnamese text-to-speech applications.
+Online demo at https://huggingface.co/spaces/ntt123/vietTTS.
+A synthesized audio clip: [clip.wav](assets/infore/clip.wav). A Colab notebook: [notebook](https://colab.research.google.com/drive/1oczrWOQOr1Y_qLdgis1twSlNZlfPVXoY?usp=sharing).
+Check out the experimental `multi-speaker` branch (`git checkout multi-speaker`) for multi-speaker support.
+Install
+-------
+```sh
+git clone https://github.com/NTT123/vietTTS.git
+cd vietTTS
+pip3 install -e .
+```
+Quick start using pretrained models
+----------------------------------
+```sh
+bash ./scripts/quick_start.sh
+```
+Download InfoRe dataset
+-----------------------
+```sh
+python ./scripts/download_aligned_infore_dataset.py
+```
+**Note**: this is a denoised and aligned version of the original dataset, which was donated by the InfoRe Technology company (see [here](https://www.facebook.com/groups/j2team.community/permalink/1010834009248719/)). You can download the original dataset (**InfoRe Technology 1**) [here](https://github.com/TensorSpeech/TensorFlowASR/blob/main/README.md#vietnamese).
+See `notebooks/denoise_infore_dataset.ipynb` for instructions on how to denoise the dataset. We use the Montreal Forced Aligner (MFA) to align transcripts with speech, producing TextGrid files.
+See `notebooks/align_text_audio_infore_mfa.ipynb` for instructions on how to create the TextGrid files.
+Train duration model
+--------------------
+```sh
+python -m vietTTS.nat.duration_trainer
+```
+Train acoustic model
+--------------------
+```sh
+python -m vietTTS.nat.acoustic_trainer
+```
+Train HiFiGAN vocoder
+-------------
+We use the original implementation from the HiFiGAN authors at https://github.com/jik876/hifi-gan. Use the config file at `assets/hifigan/config.json` to train your model.
+```sh
+git clone https://github.com/jik876/hifi-gan.git
+# create dataset in hifi-gan format
+ln -sf `pwd`/train_data hifi-gan/data
+cd hifi-gan/data
+ls -1 *.TextGrid | sed -e 's/\.TextGrid$//' > files.txt
+cd ..
+head -n 100 data/files.txt > val_files.txt
+tail -n +101 data/files.txt > train_files.txt
+rm data/files.txt
+# training
+python train.py \
+  --config ../assets/hifigan/config.json \
+  --input_wavs_dir=data \
+  --input_training_file=train_files.txt \
+  --input_validation_file=val_files.txt
+```
+Fine-tune on ground-truth-aligned (GTA) mel-spectrograms:
+```sh
+cd /path/to/vietTTS # go to vietTTS directory
+python -m vietTTS.nat.zero_silence_segments -o train_data # zero all [sil, sp, spn] segments
+python -m vietTTS.nat.gta -o /path/to/hifi-gan/ft_dataset  # create gta melspectrograms at hifi-gan/ft_dataset directory
+# turn on finetune
+cd /path/to/hifi-gan
+python train.py \
+  --fine_tuning True \
+  --config ../assets/hifigan/config.json \
+  --input_wavs_dir=data \
+  --input_training_file=train_files.txt \
+  --input_validation_file=val_files.txt
+```
+Then, use the following command to convert the PyTorch model to Haiku format:
+```sh
+cd ..
+python -m vietTTS.hifigan.convert_torch_model_to_haiku \
+  --config-file=assets/hifigan/config.json \
+  --checkpoint-file=hifi-gan/cp_hifigan/g_[latest_checkpoint]
+```
+Synthesize speech
+-----------------
+```sh
+python -m vietTTS.synthesizer \
+  --lexicon-file=train_data/lexicon.txt \
+  --text="hôm qua em tới trường" \
+  --output=clip.wav
+```
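Reviewer note: the validation/training split in the README's HiFiGAN section (the `ls | sed`, `head -n 100`, `tail -n +101` steps) can also be sketched in Python. This is a minimal sketch, not part of the commit; the function names `split_file_list` and `write_lists` are hypothetical, and it assumes plain code-point sorting is an acceptable stand-in for the locale-dependent order that `ls` produces.

```python
from pathlib import Path

SUFFIX = ".TextGrid"


def split_file_list(data_dir, n_val=100):
    """Return (val, train) lists of TextGrid base names found in data_dir."""
    # Same idea as: ls -1 *.TextGrid | sed -e 's/\.TextGrid$//'
    # (assumption: sorted() order is close enough to the order ls would give)
    names = sorted(p.name[: -len(SUFFIX)] for p in Path(data_dir).glob("*" + SUFFIX))
    # head -n 100 -> validation list, tail -n +101 -> training list
    return names[:n_val], names[n_val:]


def write_lists(data_dir, out_dir, n_val=100):
    """Write val_files.txt and train_files.txt into out_dir, one name per line."""
    val, train = split_file_list(data_dir, n_val)
    Path(out_dir, "val_files.txt").write_text("".join(f"{n}\n" for n in val))
    Path(out_dir, "train_files.txt").write_text("".join(f"{n}\n" for n in train))
    return len(val), len(train)
```

Calling `write_lists("hifi-gan/data", "hifi-gan")` from the vietTTS directory would produce the same two list files as the shell steps, under the sorting assumption above.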

setup.cfg ADDED

	@@ -0,0 +1,14 @@

+[pep8]
+max-line-length = 120
+indent-size = 2
+[pycodestyle]
+max-line-length = 120
+[yapf]
+based_on_style = pep8
+column_limit = 120
+[tool:pytest]
+testpaths=
+  tests
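Reviewer note: the file above sets a 120-column limit in three separate tool sections. A quick way to confirm they agree is to read the file with Python's standard `configparser`. This is only a sketch: the `SETUP_CFG` string copies the added lines without their `+` prefixes, `line_limits` is a hypothetical helper, and each tool's own loader may interpret the file slightly differently.

```python
import configparser

# Contents of the added setup.cfg, copied without the diff "+" markers.
SETUP_CFG = """\
[pep8]
max-line-length = 120
indent-size = 2
[pycodestyle]
max-line-length = 120
[yapf]
based_on_style = pep8
column_limit = 120
[tool:pytest]
testpaths=
  tests
"""


def line_limits(cfg_text):
    """Map each section that defines a line-length option to its integer value."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    limits = {}
    for section in parser.sections():
        for key in ("max-line-length", "column_limit"):
            if parser.has_option(section, key):
                limits[section] = parser.getint(section, key)
    return limits


limits = line_limits(SETUP_CFG)
# All formatter/linter sections should agree on a single limit.
consistent = len(set(limits.values())) == 1
```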

setup.py ADDED

	@@ -0,0 +1,43 @@

+from setuptools import setup
+__version__ = "0.4.1"
+url = "https://github.com/ntt123/vietTTS"
+install_requires = [
+    "dm-haiku",
+    "einops",
+    "fire",
+    "gdown",
+    "jax",
+    "jaxlib",
+    "librosa",
+    "optax",
+    "tabulate",
+    "textgrid @ git+https://github.com/kylebgorman/textgrid.git",
+    "tqdm",
+    "matplotlib",
+]
+setup_requires = []
+tests_require = []
+setup(
+    name="vietTTS",
+    version=__version__,
+    description="A Vietnamese text-to-speech library.",
+    author="ntt123",
+    url=url,
+    keywords=[
+        "text-to-speech",
+        "tts",
+        "deep-learning",
+        "dm-haiku",
+        "jax",
+        "vietnamese",
+        "speech-synthesis",
+    ],
+    install_requires=install_requires,
+    setup_requires=setup_requires,
+    tests_require=tests_require,
+    packages=["vietTTS"],
+    python_requires=">=3.7",
+)
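Reviewer note: after `pip3 install -e .`, one way to sanity-check the environment is to confirm that the modules behind `install_requires` can be located. The sketch below is not part of the commit; `missing_modules` is a hypothetical helper, and the module list assumes each distribution is imported under its own name except `dm-haiku`, which is imported as `haiku`.

```python
import importlib.util

# Import names assumed for the distributions listed in install_requires
# ("dm-haiku" is imported as "haiku"; the others share their package name).
REQUIRED_MODULES = [
    "haiku", "einops", "fire", "gdown", "jax", "jaxlib",
    "librosa", "optax", "tabulate", "textgrid", "tqdm", "matplotlib",
]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `jax`.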