---
tags:
- audio
- text-to-speech
- onnx
base_model:
  - hexgrad/Kokoro-82M
inference: false
language: en
license: apache-2.0
library_name: txtai
---

# Kokoro Base (82M) Model for ONNX

[Kokoro 82M](https://huggingface.co/hexgrad/Kokoro-82M) exported to ONNX. The ONNX file is the same one published in the base repository. The voices file is from [this repository](https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files).

## Usage with txtai

[txtai](https://github.com/neuml/txtai) has a built-in text-to-speech (TTS) pipeline that makes this model easy to use.

_Note: This requires txtai >= 8.3.0. Install from GitHub until that release._

```python
import soundfile as sf

from txtai.pipeline import TextToSpeech

# Build pipeline
tts = TextToSpeech("NeuML/kokoro-base-onnx")

# Generate speech
speech, rate = tts("Say something here")

# Write to file
sf.write("out.wav", speech, rate)
```

## Usage with ONNX

This model can also be run directly with ONNX Runtime, provided the input text is tokenized first. Tokenization can be done with [ttstokenizer](https://github.com/neuml/ttstokenizer), a permissively licensed library with no external dependencies (such as espeak).

Note that the txtai pipeline has additional functionality, such as batching large inputs, that would need to be duplicated when using this method.

```python
import json
import numpy as np
import onnxruntime
import soundfile as sf

from ttstokenizer import IPATokenizer

# This example assumes the files have been downloaded locally
with open("kokoro-base-onnx/voices.json", "r", encoding="utf-8") as f:
    voices = json.load(f)

# Create model
model = onnxruntime.InferenceSession(
    "kokoro-base-onnx/model.onnx",
    providers=["CPUExecutionProvider"]
)

# Create tokenizer
tokenizer = IPATokenizer()

# Tokenize inputs
inputs = tokenizer("Say something here")

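# Get speaker array for the "af" voice as a NumPy array
speaker = np.array(voices["af"], dtype=np.float32)

# Generate speech
# - tokens: token ids padded with 0 at the start and end
# - style: voice style vector selected by the number of input tokens
# - speed: speech speed multiplier (1.0 = normal speed)
outputs = model.run(None, {
    "tokens": [[0, *inputs, 0]],
    "style": speaker[len(inputs)],
    "speed": np.ones(1, dtype=np.float32) * 1.0
})

# Write to file at a 24 kHz sample rate
sf.write("out.wav", outputs[0], 24000)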
```
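The example above sends all tokens in a single call. Longer inputs would need to be split up, which is one of the things the txtai pipeline handles for you. The following is a minimal, naive sketch of that idea, not txtai's implementation. It assumes a limit of 510 tokens per call (the `MAX_TOKENS` value is an assumption, so check it against the model). `chunk_tokens` and `synthesize` are hypothetical helper names. The sketch splits on raw token boundaries, so it may cut words in half. Splitting the text at sentence boundaries before tokenizing would likely sound better.

```python
import numpy as np

# Assumption: maximum number of phoneme tokens per model call, excluding the
# two 0 pad tokens. Verify this against the model before relying on it.
MAX_TOKENS = 510


def chunk_tokens(tokens, limit=MAX_TOKENS):
    """Split a flat list of token ids into consecutive chunks of at most `limit` ids."""
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]


def synthesize(session, tokens, speaker, speed=1.0, limit=MAX_TOKENS):
    """Hypothetical helper: run each token chunk through the model and join the audio.

    `session` is anything with an onnxruntime-style `run(output_names, feeds)`
    method, and `speaker` is the voice array loaded from voices.json.
    """
    audio = []
    for chunk in chunk_tokens(tokens, limit):
        outputs = session.run(None, {
            # Pad each chunk with the 0 token on both sides
            "tokens": [[0, *chunk, 0]],
            # Select the style vector by the number of tokens in this chunk
            "style": speaker[len(chunk)],
            "speed": np.ones(1, dtype=np.float32) * speed,
        })
        # Flatten in case the model returns a leading batch dimension
        audio.append(np.asarray(outputs[0]).reshape(-1))

    # Empty input produces an empty waveform instead of an error
    if not audio:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(audio)
```

With the objects from the previous example, this would be called as `synthesize(model, inputs, speaker)`, and the result written with `sf.write("out.wav", ..., 24000)` as before.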