---
tags:
- audio
- text-to-speech
- onnx
inference: false
language: en
datasets:
- ljspeech
license: apache-2.0
library_name: txtai
---
# ESPnet JETS Text-to-Speech (TTS) Model for ONNX
[imdanboy/jets](https://huggingface.co/imdanboy/ljspeech_tts_train_jets_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave) exported to ONNX using the [espnet_onnx](https://github.com/espnet/espnet_onnx) library.
## Usage with txtai
[txtai](https://github.com/neuml/txtai) has a built-in Text-to-Speech (TTS) pipeline that makes this model easy to use.
```python
import soundfile as sf
from txtai.pipeline import TextToSpeech

# Build pipeline
tts = TextToSpeech("NeuML/ljspeech-jets-onnx")

# Generate speech
speech, rate = tts("Say something here")

# Write to file
sf.write("out.wav", speech, rate)
```
## Usage with ONNX
This model can also be run directly with ONNX provided the input text is tokenized. Tokenization can be done with [ttstokenizer](https://github.com/neuml/ttstokenizer).
Note that the txtai pipeline provides additional functionality, such as batching large inputs, that would need to be reimplemented with this method; a rough batching sketch follows the example below.
```python
import onnxruntime
import soundfile as sf
import yaml

from ttstokenizer import TTSTokenizer

# This example assumes the files have been downloaded locally
with open("ljspeech-jets-onnx/config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

# Create model
model = onnxruntime.InferenceSession(
    "ljspeech-jets-onnx/model.onnx",
    providers=["CPUExecutionProvider"]
)

# Create tokenizer
tokenizer = TTSTokenizer(config["token"]["list"])

# Tokenize inputs
inputs = tokenizer("Say something here")

# Generate speech
outputs = model.run(None, {"text": inputs})

# Write to file
sf.write("out.wav", outputs[0], 22050)
```
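As a rough illustration of the batching that the txtai pipeline handles automatically, the sketch below, continuing from the example above, splits long text into sentence-sized chunks and joins the generated audio. The `batch_speech` helper and the naive sentence split are illustrative assumptions, not part of this model or library.

```python
import numpy as np

# Hypothetical helper: run long text through the ONNX model in chunks,
# then join the generated waveforms into a single audio array
def batch_speech(model, tokenizer, text):
    # Naive sentence split; a production implementation would be more robust
    chunks = [x.strip() for x in text.split(".") if x.strip()]

    # Generate speech for each chunk and concatenate the waveforms
    return np.concatenate(
        [model.run(None, {"text": tokenizer(chunk)})[0] for chunk in chunks]
    )

speech = batch_speech(model, tokenizer, "First sentence. Second sentence.")
sf.write("batched.wav", speech, 22050)
```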
## How to export
More information on how to export ESPnet models to ONNX can be [found here](https://github.com/espnet/espnet_onnx#text2speech-inference).
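For reference, a minimal sketch of that export step, assuming the `TTSModelExport` interface described in the espnet_onnx README:

```python
from espnet_onnx.export import TTSModelExport

m = TTSModelExport()

# Download the pretrained ESPnet model and export it to ONNX
# (this is the tag this export was built from)
m.export_from_pretrained(
    "imdanboy/ljspeech_tts_train_jets_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave"
)
```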