# Umamusume DeBERTA-VITS2 TTS
---------------
**2023.10.19**
- Updated the Generator to the 180K-step checkpoint
------------------
**Currently, ONLY Japanese is supported.**
**Based on [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2), this work closely follows [Akito/umamusume_bert_vits2](https://huggingface.co/spaces/AkitoP/umamusume_bert_vits2), which provides the Japanese text preprocessor.**
---------------
## Instructions for Use
**Please do NOT enter very long sentences in a single row. Splitting your input into multiple rows causes each row to be inferred separately. Please avoid completely empty rows, which lead to strange sounds at the corresponding positions in the generated audio.**
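The row handling described above can be sketched as follows. `split_rows` is a hypothetical helper (not part of this Space's actual code) showing how input is divided into per-row sentences, with empty rows dropped before synthesis:

```python
def split_rows(text: str) -> list[str]:
    """Split multi-row input into sentences, one per non-empty row.

    Each returned row would be inferred as a separate sentence;
    empty rows are dropped because they produce artifacts in the
    generated audio.
    """
    return [row.strip() for row in text.splitlines() if row.strip()]
```

For example, `split_rows("おはよう。\n\n走るよ!")` keeps only the two non-empty rows.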
-------------------------
**If an error occurs, please first check whether your input contains rare and difficult Chinese characters (kanji), and replace them with hiragana or katakana.**
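A quick pre-check for such characters can be sketched as below. Both the function and the `known_kanji` set are hypothetical illustrations, not part of this Space's code; anything flagged could be rewritten in hiragana or katakana before synthesis:

```python
def find_unsupported_kanji(text: str, known_kanji: set[str]) -> list[str]:
    """Return kanji in `text` that are missing from `known_kanji`.

    `known_kanji` stands in for the set of characters the text
    frontend is known to handle (an assumption for illustration).
    """
    return [ch for ch in text
            if "\u4e00" <= ch <= "\u9fff"  # CJK Unified Ideographs block
            and ch not in known_kanji]
```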
------------------------
**Please make good use of punctuation marks.**
---------------------
**For the characters' Chinese names, please refer to the [Umamusume Bilibili Wiki](https://wiki.biligame.com/umamusume/%E8%B5%9B%E9%A9%AC%E5%A8%98%E4%B8%80%E8%A7%88).**
## Training Details - For those who may be interested
**This work replaces [cl-tohoku/bert-base-japanese-v3](https://huggingface.co/cl-tohoku/bert-base-japanese-v3) with [ku-nlp/deberta-v2-base-japanese](https://huggingface.co/ku-nlp/deberta-v2-base-japanese), expecting potentially better performance (and just for fun).**
Thanks to **SUSTech Center for Computational Science and Engineering**. This model is trained on 2× A100 (40 GB) GPUs with a total **batch size of 32**.
This model has currently been trained for **1 cycle, 180K steps (= 120 epochs)**.
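These figures imply a rough dataset size, as a back-of-the-envelope check (assuming no gradient accumulation, which the README does not specify):

```python
steps, epochs, batch_size = 180_000, 120, 32

# 180K optimizer steps over 120 epochs -> steps per epoch
steps_per_epoch = steps // epochs

# each step consumes one batch of 32 utterances
approx_utterances = steps_per_epoch * batch_size
```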
This work uses a linear-with-warmup LR scheduler (**warmup over 7.5% of total steps**) with `max_lr=1e-4`.
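The schedule can be sketched as below; this assumes a linear decay to zero after warmup, while the exact decay shape used in training may differ:

```python
def lr_at_step(step: int, total_steps: int,
               max_lr: float = 1e-4, warmup_frac: float = 0.075) -> float:
    """Linear warmup to max_lr over warmup_frac of training,
    then linear decay back to zero (an assumed decay shape)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    return max_lr * (total_steps - step) / (total_steps - warmup_steps)
```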
This work **clips gradient values to 10**.
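Value clipping (as opposed to norm clipping) clamps each gradient component independently; in PyTorch this corresponds to `torch.nn.utils.clip_grad_value_`. A dependency-free sketch of the operation:

```python
def clip_grad_value(grads: list[float], clip_value: float = 10.0) -> list[float]:
    """Clamp each gradient component to [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]
```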
Fine-tuning the model on **single-speaker datasets separately** will likely give better results than training on **a huge dataset comprising many speakers**: when speakers share the same model, their voice lines can mix unexpectedly.
### TODO:
Train one more cycle using the text preprocessor provided by [AkitoP](https://huggingface.co/AkitoP), with cleaner text inputs and training data for Mejiro Ramonu.