---
title: Style Bert VITS2 M
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 4.40.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Style-Bert-VITS2

Bert-VITS2 with more controllable voice styles.

https://github.com/litagin02/Style-Bert-VITS2/assets/139731664/e853f9a2-db4a-4202-a1dd-56ded3c562a0

- [English README](docs/README_en.md)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/litagin02/Style-Bert-VITS2/blob/master/colab.ipynb)
- [🤗 Try the online demo here](https://huggingface.co/spaces/litagin/Style-Bert-VITS2-JVNV)
- [Explanatory article on Zenn](https://zenn.dev/litagin/articles/034819a5256ff4)
- [**Releases page**](https://github.com/litagin02/Style-Bert-VITS2/releases/), [changelog](docs/CHANGELOG.md)
  - 2024-02-09: ver 2.2
  - 2024-02-07: ver 2.1
  - 2024-02-03: ver 2.0
  - 2024-01-09: ver 1.3
  - 2023-12-31: ver 1.2
  - 2023-12-29: ver 1.1
  - 2023-12-27: ver 1.0

This repository is based on [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2) v2.1 and Japanese-Extra, so many thanks to the original author!
**Overview**

- This project builds on v2.1 and Japanese-Extra of [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2), which generates emotionally expressive speech from the content of the input text, and lets you freely control emotion and speaking style, including their intensity.
- Even without Git or Python, Windows users can install it easily and also train models (much of this is borrowed from [EasyBertVits2](https://github.com/Zuntan03/EasyBertVits2/)). Training on Google Colab is also supported: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/litagin02/Style-Bert-VITS2/blob/master/colab.ipynb)
- If you only need speech synthesis, it runs on a CPU without a graphics card.
- An API server for integration with other tools is also bundled (a PR by [@darai0512](https://github.com/darai0512); thank you).
- The inherent strength of Bert-VITS2 is that it reads "cheerful-sounding text cheerfully and sad-sounding text sadly", so it can generate emotionally expressive speech even with the default style setting.
## Usage

<!-- For details, see [here](docs/tutorial.md). -->

### Supported environments

Each UI and the API server have been confirmed to work on the Windows Command Prompt, WSL2, and Linux (Ubuntu Desktop). On WSL, take care when specifying paths, for example by using relative paths. Without an NVIDIA GPU, training is not possible, but speech synthesis and merging are.
### Installation

#### For those unfamiliar with Git or Python

These steps assume Windows.

1. Download [this zip file](https://github.com/litagin02/Style-Bert-VITS2/releases/download/2.2/Style-Bert-VITS2.zip) to **a location whose path contains no Japanese characters or spaces** and extract it.
   - If you have a graphics card, double-click `Install-Style-Bert-VITS2.bat`.
   - If you do not have a graphics card, double-click `Install-Style-Bert-VITS2-CPU.bat`. The CPU version cannot train, but speech synthesis and merging work.
2. Wait while the required environment is installed automatically.
3. If the WebUI for speech synthesis then starts automatically, the installation succeeded. The default models have already been downloaded, so you can start using it right away.

To update, double-click `Update-Style-Bert-VITS2.bat`. However, when updating from **1.x** to **2.x**, save [this bat file](https://github.com/litagin02/Style-Bert-VITS2/releases/download/2.2/Update-to-JP-Extra.bat) into the folder that contains the `Style-Bert-VITS2` folder (the folder that contains `Update-Style-Bert-VITS2.bat` and similar files), then double-click it.
#### For those who can use Git and Python

```bash
git clone https://github.com/litagin02/Style-Bert-VITS2.git
cd Style-Bert-VITS2
python -m venv venv
# Windows activation; on Linux/WSL use: source venv/bin/activate
venv\Scripts\activate
# PyTorch 2.2.x currently causes training errors, so use the previous version
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python initialize.py  # Download the required models and the default TTS model
```

Do not forget the last step.
### Speech synthesis

Double-click `App.bat` or run `python app.py` to start the WebUI (`python app.py --cpu` starts it in CPU mode, which is handy for checking results while training is running). The default models are downloaded during installation, so you can use them even if you have not trained anything.

The model files needed for speech synthesis are laid out as follows (you do not need to place them manually).

```
model_assets
├── your_model
│   ├── config.json
│   ├── your_model_file1.safetensors
│   ├── your_model_file2.safetensors
│   ├── ...
│   └── style_vectors.npy
└── another_model
    ├── ...
```

As shown, inference requires `config.json`, `*.safetensors`, and `style_vectors.npy`. When sharing a model, share these three kinds of files.

Of these, `style_vectors.npy` is the file needed to control styles. By default, training generates the average style "Neutral".
If you want finer control using multiple styles, see "Style generation" below (even with only the average style, sufficiently expressive training data yields sufficiently expressive speech).
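
Before sharing or loading a model, it can help to confirm that a folder really contains the three kinds of files listed above. The following is a minimal sketch using only the standard library; the function names are hypothetical and are not part of this repository.

```python
from pathlib import Path


def missing_model_files(model_dir):
    """Return the names of required files that are absent from model_dir."""
    model_dir = Path(model_dir)
    missing = []
    if not (model_dir / "config.json").is_file():
        missing.append("config.json")
    # At least one weights file is expected; several checkpoints may coexist.
    if not any(model_dir.glob("*.safetensors")):
        missing.append("*.safetensors")
    if not (model_dir / "style_vectors.npy").is_file():
        missing.append("style_vectors.npy")
    return missing


def check_model_assets(root="model_assets"):
    """Map each model folder under root to its list of missing files."""
    root = Path(root)
    return {
        path.name: missing_model_files(path)
        for path in sorted(root.iterdir())
        if path.is_dir()
    }
```

An empty list for a folder means all three kinds of files were found; the check says nothing about whether their contents are valid.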
### Training

Training requires multiple audio files of roughly 2-14 seconds each, together with their transcriptions.

- If you already have split audio files and transcriptions, for example from an existing corpus, you can use them as they are (correcting the transcription file if necessary). See "Training WebUI" below.
- Otherwise, if you only have audio files (of any length), a bundled tool builds a dataset from them that is ready to use for training.
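
If you already have pre-split clips, a quick way to see which ones fall outside the rough 2-14 second range is to read their durations. This is a small sketch for uncompressed PCM WAV files using the standard `wave` module; the helper names are hypothetical, and the thresholds simply mirror the approximate range stated above rather than any hard limit enforced by the tool.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def split_by_duration(paths, min_sec=2.0, max_sec=14.0):
    """Partition WAV paths into (usable, out_of_range) by clip length."""
    usable, out_of_range = [], []
    for path in paths:
        duration = wav_duration_seconds(path)
        if min_sec <= duration <= max_sec:
            usable.append(path)
        else:
            out_of_range.append(path)
    return usable, out_of_range
```

Clips in the second list are candidates for re-slicing with the bundled dataset tool.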
#### Creating a dataset

- Double-click `Dataset.bat` or run `python webui_dataset.py` to start a WebUI for building a dataset from audio files (it slices the audio files to a suitable length and then transcribes them automatically).
- After following the instructions, close it and proceed directly to training in "Training WebUI" below.

Note: For fine-grained work such as manually correcting the dataset or removing noise, [Aivis](https://github.com/tsukumijima/Aivis) or [Aivis Dataset](https://github.com/litagin02/Aivis-Dataset), the Windows-compatible version of its dataset component, may be a good choice. When there are many files, though, simply slicing them with this tool to build the dataset may well be sufficient.

What kind of dataset works best is something to find out by trial and error.

#### Training WebUI

- Double-click `Train.bat` or run `python webui_train.py` to start the WebUI, then follow the instructions.
### Style generation

- This is for those who want to use styles other than the default "Neutral" style.
- Double-click `Style.bat` or run `python webui_style_vectors.py` to start the WebUI.
- It is independent of training, so you can do it while training is running and redo it as many times as you like after training ends (preprocessing must already be finished).
- For details of the style specification, see [clustering.ipynb](clustering.ipynb).
### API Server

In the environment you set up, run `python server_fastapi.py` to start the API server.
Check the API specification at `/docs` after startup.

- Input is limited to 100 characters by default. You can change this with `server.limit` in `config.yml`.
- By default, CORS is allowed for all domains. Wherever possible, change the value of `server.origins` in `config.yml` to restrict it to trusted domains (deleting the key disables the CORS settings).
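
Because of the default 100-character limit, a client that wants to synthesize longer text has to send it in pieces. The sketch below shows one way a client might split text before calling the API; `chunk_text` is a hypothetical helper, and it assumes the server counts characters the same way Python's `len` does, which is not confirmed here.

```python
def chunk_text(text, limit=100, breaks="。！？!?.\n"):
    """Split text into pieces of at most `limit` characters.

    Prefers to cut just after the last sentence-ending character that
    fits in the window; falls back to a hard cut at `limit`.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        # rfind returns -1 when a character is absent, so cut becomes 0.
        cut = max(window.rfind(ch) for ch in breaks) + 1
        if cut == 0:
            cut = limit
        chunks.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned piece can then be sent as its own request; the endpoint and parameters to use are listed on the server's `/docs` page.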
### Merging

You can create a new model by blending two models along four aspects: voice quality, pitch, emotional expression, and tempo.
Double-click `Merge.bat` or run `python webui_merge.py` to start the WebUI.
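
To give a rough intuition for what blending along several aspects can mean, here is a toy sketch. It assumes that merging is a linear interpolation of parameters with a separate ratio for each aspect; this is an assumption for illustration and is not taken from `webui_merge.py`. The parameter names and the prefix-to-aspect mapping are hypothetical.

```python
# Hypothetical prefix -> aspect mapping, for illustration only.
ASPECT_PREFIXES = {
    "voice.": "voice",
    "pitch.": "pitch",
    "style.": "style",
    "tempo.": "tempo",
}


def aspect_of(name):
    """Return the aspect a parameter belongs to, based on its name prefix."""
    for prefix, aspect in ASPECT_PREFIXES.items():
        if name.startswith(prefix):
            return aspect
    raise KeyError(name)


def merge_weights(model_a, model_b, ratios):
    """Blend two models' parameters with a separate ratio per aspect.

    model_a, model_b: dicts mapping parameter name -> list of floats.
    ratios: dict mapping aspect name -> r in [0, 1]; 0 keeps A, 1 keeps B.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must have the same parameter names")
    merged = {}
    for name, a_values in model_a.items():
        r = ratios[aspect_of(name)]
        b_values = model_b[name]
        merged[name] = [(1 - r) * a + r * b for a, b in zip(a_values, b_values)]
    return merged
```

With a ratio of 0 for one aspect and 1 for another, the result takes that first aspect entirely from model A and the second entirely from model B.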
### Naturalness evaluation

As "one" indicator of which training step is best, a script that uses [SpeechMOS](https://github.com/tarepan/SpeechMOS) is provided:

```bash
python speech_mos.py -m <model_name>
```

It displays the naturalness score for each step and saves the results to `mos_{model_name}.csv` and `mos_{model_name}.png` in the `mos_results` folder. To change the text that is read aloud, edit the file in question and adjust it yourself. Note that this evaluation ignores accent, emotional expression, and intonation entirely and is only a rough guide, so actually listening to the output and choosing is probably the best approach.
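
Once you have per-step scores, shortlisting a few checkpoints to listen to is a simple ranking. The layout of the CSV file is not described here, so this sketch takes scores that have already been read into a dictionary; `best_steps` is a hypothetical helper and not part of `speech_mos.py`.

```python
def best_steps(scores, top_k=3):
    """Return the top_k (step, score) pairs, highest score first.

    scores: dict mapping training step -> naturalness score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

The shortlist is only a starting point; as noted above, the score ignores accent, emotion, and intonation, so the final choice should be made by ear.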
## Relationship to Bert-VITS2

Basically, this project only slightly modifies the model structure of Bert-VITS2. Both the [old pretrained model](https://huggingface.co/litagin/Style-Bert-VITS2-1.0-base) and the [JP-Extra pretrained model](https://huggingface.co/litagin/Style-Bert-VITS2-2.0-base-JP-Extra) are effectively the same as those of Bert-VITS2 v2.1 or JP-Extra (with unneeded weights removed and converted to safetensors).

Specifically, it differs in the following points.

- Like [EasyBertVits2](https://github.com/Zuntan03/EasyBertVits2), it is easy to use even for people who do not know Python or Git.
- The emotion embedding model was changed (to the 256-dimensional [wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM), which is an embedding for speaker identification rather than an emotion embedding).
- Vector quantization was also removed from the emotion embedding, leaving a plain fully connected layer.
- By creating the style vector file `style_vectors.npy`, you can generate speech with that style while specifying the strength of the effect continuously.
- Various WebUIs were created.
- Training in bf16 is supported.
- The safetensors format is supported and is now used by default.
- Other minor bug fixes and refactoring.
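
One way to picture "specifying the strength of the effect continuously" is as movement away from the average style vector toward a chosen style vector. The sketch below assumes exactly that linear form; it is an illustration under that assumption, not a confirmed description of the implementation, and `apply_style` is a hypothetical name. Plain Python lists stand in for the 256-dimensional vectors.

```python
def apply_style(mean_vector, style_vector, weight):
    """Move from the mean style toward a style vector by `weight`.

    weight = 0 gives the mean ("Neutral") vector, weight = 1 gives the
    style vector itself, and larger values exaggerate the style.
    """
    if len(mean_vector) != len(style_vector):
        raise ValueError("vectors must have the same length")
    return [m + weight * (s - m) for m, s in zip(mean_vector, style_vector)]
```

Under this reading, the weight is a continuous dial: intermediate values blend the style with the average, and values above 1 extrapolate beyond it.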
## TODO

- [x] Add a JP-Extra version to the default JVNV models
- [x] Support for environments other than standard Windows, such as Linux and WSL ← reportedly works without problems
- [x] Speech synthesis support for multi-speaker training (training is already possible)
- [x] Support in `server_fastapi.py`; being usable through the API in particular might make more people happy
- [x] Implement mixing of voice timbre and emotional expression in model merging
- [ ] Multilingual support such as English?
## References

In addition to the original reference (written below), I used the following repositories:

- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [EasyBertVits2](https://github.com/Zuntan03/EasyBertVits2)

[The pretrained model](https://huggingface.co/litagin/Style-Bert-VITS2-1.0-base) and [JP-Extra version](https://huggingface.co/litagin/Style-Bert-VITS2-2.0-base-JP-Extra) are essentially taken from [the original base model of Bert-VITS2 v2.1](https://huggingface.co/Garydesu/bert-vits2_base_model-2.1) and [JP-Extra pretrained model of Bert-VITS2](https://huggingface.co/Stardust-minus/Bert-VITS2-Japanese-Extra), so all the credits go to the original author ([Fish Audio](https://github.com/fishaudio)).

Below is the original README.md.

---
<div align="center">

<img alt="LOGO" src="https://cdn.jsdelivr.net/gh/fishaudio/fish-diffusion@main/images/logo_512x512.png" width="256" height="256" />

# Bert-VITS2

VITS2 Backbone with multilingual bert

For a quick guide, please refer to `webui_preprocess.py`.

## Note: the core idea of this project comes from [anyvoiceai/MassTTS](https://github.com/anyvoiceai/MassTTS), a very good TTS project

## A demo of MassTTS: [an AI version of 峰哥 sharply reviews 峰哥 himself and recovers the kidney he lost in the Golden Triangle](https://www.bilibili.com/video/BV1w24y1c7z9)

[//]: # (## This project has no connection with [PlayVoice/vits_chinese](https://github.com/PlayVoice/vits_chinese))

[//]: # ()

[//]: # (This repository originated when a friend shared the AI 峰哥 video. I was impressed by its results, and after trying MassTTS myself I found that fs lags somewhat behind vits in audio quality and that its training pipeline is more complex than that of vits, so, following its approach, bert)

## Seasoned Travelers/Trailblazers/Captains/Doctors/sensei/Witchers/喵喵露/V should read the code and learn how to train by themselves.

### Using this project for any purpose that violates the Constitution of the People's Republic of China, the Criminal Law of the People's Republic of China, the Public Security Administration Punishments Law of the People's Republic of China, or the Civil Code of the People's Republic of China is strictly prohibited.
### Using it for any politically related purpose is strictly prohibited.

#### Video: https://www.bilibili.com/video/BV1hp4y1K78E
#### Demo: https://www.bilibili.com/video/BV1TF411k78w
#### QQ Group: 815818430
## References

+ [anyvoiceai/MassTTS](https://github.com/anyvoiceai/MassTTS)
+ [jaywalnut310/vits](https://github.com/jaywalnut310/vits)
+ [p0p4k/vits2_pytorch](https://github.com/p0p4k/vits2_pytorch)
+ [svc-develop-team/so-vits-svc](https://github.com/svc-develop-team/so-vits-svc)
+ [PaddlePaddle/PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)
+ [emotional-vits](https://github.com/innnky/emotional-vits)
+ [fish-speech](https://github.com/fishaudio/fish-speech)
+ [Bert-VITS2-UI](https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI)

## Thanks to all contributors for their efforts

<a href="https://github.com/fishaudio/Bert-VITS2/graphs/contributors" target="_blank">
  <img src="https://contrib.rocks/image?repo=fishaudio/Bert-VITS2"/>
</a>

[//]: # (# All code references in this project are clearly credited. The idea for the bert part of the code comes from [AI 峰哥](https://www.bilibili.com/video/BV1w24y1c7z9) and has no connection with [vits_chinese](https://github.com/PlayVoice/vits_chinese). Everyone is welcome to read the code. At the same time, we strongly condemn that developer's behavior of [making false accusations and even doxxing developers](https://www.bilibili.com/read/cv27101514/).)