(synthesizing_speech)=
# Synthesizing Speech
First, install TTS. We recommend installing from PyPI:
```bash
pip install TTS
```
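To verify the install, you can run a quick import check (a minimal sketch; it assumes the package exposes `TTS.__version__`, as recent releases do):
```python
# Sanity-check the installation; __version__ is assumed to be exported by the package.
import TTS
print(TTS.__version__)
```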
After the installation, 🐸TTS provides three interfaces:
1. Command Line Interface (CLI) - `tts`
2. Local Demo Server - `tts-server`
3. 🐍 Python API - `from TTS.api import TTS`
## On the Commandline - `tts`
![cli.gif](https://github.com/coqui-ai/TTS/raw/main/images/tts_cli.gif)
🐸TTS provides a CLI for synthesizing speech with pre-trained models. You can use either your own model or one of the released 🐸TTS models.
Listing released 🐸TTS models:
```bash
tts --list_models
```
Run a TTS model from the released models list with its default vocoder. (Simply copy and paste the full model name from the list as the argument for the command below.)
```bash
tts --text "Text for TTS" \
--model_name "<type>/<language>/<dataset>/<model_name>" \
--out_path folder/to/save/output.wav
```
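For example, a released English model name looks like `tts_models/en/ljspeech/tacotron2-DDC` (assuming it is still in the current release list).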
Run a TTS model and a vocoder model from the released models list. Note that not every vocoder is compatible with every TTS model.
```bash
tts --text "Text for TTS" \
--model_name "tts_models/<language>/<dataset>/<model_name>" \
--vocoder_name "vocoder_models/<language>/<dataset>/<model_name>" \
--out_path folder/to/save/output.wav
```
Run your own TTS model (using the Griffin-Lim vocoder):
```bash
tts --text "Text for TTS" \
--model_path path/to/model.pth \
--config_path path/to/config.json \
--out_path folder/to/save/output.wav
```
Run your own TTS and vocoder models:
```bash
tts --text "Text for TTS" \
--config_path path/to/config.json \
--model_path path/to/model.pth \
--out_path folder/to/save/output.wav \
--vocoder_path path/to/vocoder.pth \
--vocoder_config_path path/to/vocoder_config.json
```
Run a multi-speaker TTS model from the released models list.
```bash
tts --model_name "tts_models/<language>/<dataset>/<model_name>" --list_speaker_idxs # list the possible speaker IDs.
tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "tts_models/<language>/<dataset>/<model_name>" --speaker_idx "<speaker_id>"
```
Run a released voice conversion model:
```bash
tts --model_name "voice_conversion_models/<language>/<dataset>/<model_name>" \
    --source_wav "my/source/speaker/audio.wav" \
    --target_wav "my/target/speaker/audio.wav" \
    --out_path folder/to/save/output.wav
```
**Note:** You can use `./TTS/bin/synthesize.py` if you prefer running `tts` from the TTS project folder.
## On the Demo Server - `tts-server`
![server.gif](https://github.com/coqui-ai/TTS/raw/main/images/demo_server.gif)
You can boot up a demo 🐸TTS server to run inference with your models. Note that the server is not optimized for performance, but it gives you an easy way to interact with the models. The demo server provides pretty much the same interface as the CLI command.
```bash
tts-server -h # see the help
tts-server --list_models # list the available models.
```
Run a TTS model from the released models list with its default vocoder. If the model you choose is a multi-speaker TTS model, you can select different speakers on the web interface and synthesize speech.
```bash
tts-server --model_name "<type>/<language>/<dataset>/<model_name>"
```
Run a TTS model and a vocoder model from the released models list. Note that not every vocoder is compatible with every TTS model.
```bash
tts-server --model_name "<type>/<language>/<dataset>/<model_name>" \
--vocoder_name "<type>/<language>/<dataset>/<model_name>"
```
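Once the server is running, open the web interface in your browser to type text and synthesize speech; by default it listens on `http://localhost:5002`, though the port may differ across versions.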
## Python 🐸TTS API
You can run a multi-speaker and multi-lingual model in Python as follows:
```python
from TTS.api import TTS
# List available 🐸TTS models and choose the first one
model_name = TTS().list_models()[0]
# Init TTS
tts = TTS(model_name)
# Run TTS
# ❗ Since this model is multi-speaker and multi-lingual, we must set the target speaker and the language
# Text to speech with a numpy output
wav = tts.tts("This is a test! This is also a test!!", speaker=tts.speakers[0], language=tts.languages[0])
# Text to speech to a file
tts.tts_to_file(text="Hello world!", speaker=tts.speakers[0], language=tts.languages[0], file_path="output.wav")
```
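The numpy output of `tts.tts()` is a raw waveform, so you can post-process it or write it to disk yourself. A minimal sketch, assuming the `soundfile` package is installed and that the synthesizer exposes its output sample rate as `tts.synthesizer.output_sample_rate`:
```python
import soundfile as sf

# Write the waveform returned by tts.tts() above to disk.
# tts.synthesizer.output_sample_rate is assumed to hold the model's output rate.
sf.write("output_numpy.wav", wav, tts.synthesizer.output_sample_rate)
```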
#### Here is an example for a single-speaker model.
```python
# Init TTS with the target model name
tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False)
# Run TTS
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH)
```
#### Example voice cloning with YourTTS in English, French and Portuguese:
```python
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False).to("cuda")
tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr", file_path="output.wav")
tts.tts_to_file("Isso Γ© clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt", file_path="output.wav")
```
#### Example voice conversion, converting the voice in `source_wav` to the voice of `target_wav`
```python
tts = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24", progress_bar=False).to("cuda")
tts.voice_conversion_to_file(source_wav="my/source.wav", target_wav="my/target.wav", file_path="output.wav")
```
#### Example voice cloning by combining a single-speaker TTS model with the voice conversion model.
This way, you can clone voices using any model in 🐸TTS.
```python
tts = TTS("tts_models/de/thorsten/tacotron2-DDC")
tts.tts_with_vc_to_file(
"Wie sage ich auf Italienisch, dass ich dich liebe?",
speaker_wav="target/speaker.wav",
file_path="ouptut.wav"
)
```
#### Example text to speech using [🐸Coqui Studio](https://coqui.ai) models.
You can use all of the speakers available in the studio.
A [🐸Coqui Studio](https://coqui.ai) API token is required; you can get it from the [account page](https://coqui.ai/account).
Set the `COQUI_STUDIO_TOKEN` environment variable to make the token available to the API.
```python
# If you have a valid API token set you will see the studio speakers as separate models in the list.
# The name format is coqui_studio/en/<studio_speaker_name>/coqui_studio
models = TTS().list_models()
# Init TTS with the target studio speaker
tts = TTS(model_name="coqui_studio/en/Torcull Diarmuid/coqui_studio", progress_bar=False)
# Run TTS
tts.tts_to_file(text="This is a test.", file_path=OUTPUT_PATH)
# Run TTS with emotion and speed control
tts.tts_to_file(text="This is a test.", file_path=OUTPUT_PATH, emotion="Happy", speed=1.5)
```
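If you prefer setting the token from Python instead of your shell, one option is to set the environment variable before the first `TTS` call (the token value below is a placeholder):
```python
import os

# Placeholder token; use your own from the Coqui Studio account page.
os.environ["COQUI_STUDIO_TOKEN"] = "<your_api_token>"
```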
If you just need 🐸Coqui Studio speakers, you can use `CS_API`. It is a wrapper around the 🐸Coqui Studio API.
```python
from TTS.api import CS_API
# Init 🐸Coqui Studio API
# You can either set the API token as an environment variable `COQUI_STUDIO_TOKEN` or pass it as an argument.
# XTTS - Best quality and life-like speech in EN
api = CS_API(api_token=<token>, model="XTTS")
api.speakers # all the speakers are available with all the models.
api.list_speakers()
api.list_voices()
wav, sample_rate = api.tts(text="This is a test.", speaker=api.speakers[0].name, emotion="Happy", speed=1.5)
# XTTS-multilingual - Multilingual XTTS with [en, de, es, fr, it, pt, ...] (more langs coming soon)
api = CS_API(api_token=<token>, model="XTTS-multilingual")
api.speakers
api.list_speakers()
api.list_voices()
wav, sample_rate = api.tts(text="This is a test.", speaker=api.speakers[0].name, emotion="Happy", speed=1.5)
# V1 - Fast and lightweight TTS in EN with emotion control.
api = CS_API(api_token=<token>, model="V1")
api.speakers
api.emotions # emotions are only for the V1 model.
api.list_speakers()
api.list_voices()
wav, sample_rate = api.tts(text="This is a test.", speaker=api.speakers[0].name, emotion="Happy", speed=1.5)
```
#### Example text to speech using **Fairseq models in ~1100 languages** 🤯.
For these models use the following name format: `tts_models/<lang-iso_code>/fairseq/vits`.
You can find the list of language ISO codes [here](https://dl.fbaipublicfiles.com/mms/tts/all-tts-languages.html) and learn about the Fairseq models [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mms).
```python
from TTS.api import TTS
api = TTS(model_name="tts_models/eng/fairseq/vits").to("cuda")
api.tts_to_file("This is a test.", file_path="output.wav")
# TTS with on-the-fly voice conversion
api = TTS("tts_models/deu/fairseq/vits")
api.tts_with_vc_to_file(
"Wie sage ich auf Italienisch, dass ich dich liebe?",
speaker_wav="target/speaker.wav",
file_path="ouptut.wav"
)
```