davidmezzetti committed on
Commit
3d735e1
1 Parent(s): f59ea02

Add model files

Files changed (3)
  1. README.md +194 -0
  2. config.yaml +92 -0
  3. model.onnx +3 -0
README.md ADDED
@@ -0,0 +1,194 @@
+ ---
+ tags:
+ - audio
+ - text-to-speech
+ - onnx
+ inference: false
+ language: en
+ datasets:
+ - CSTR-Edinburgh/vctk
+ license: apache-2.0
+ library_name: txtai
+ ---
+
+ # ESPnet VITS Text-to-Speech (TTS) Model for ONNX
+
+ [espnet/kan-bayashi_vctk_vits](https://huggingface.co/espnet/kan-bayashi_vctk_tts_train_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave) exported to ONNX using the [espnet_onnx](https://github.com/espnet/espnet_onnx) library.
+
+ ## Usage with txtai
+
+ [txtai](https://github.com/neuml/txtai) has a built-in Text to Speech (TTS) pipeline that makes using this model easy.
+
+ _Note: the following example requires txtai >= 7.5._
+
+ ```python
+ import soundfile as sf
+
+ from txtai.pipeline import TextToSpeech
+
+ # Build pipeline
+ tts = TextToSpeech("NeuML/vctk-vits-onnx")
+
+ # Generate speech with speaker id
+ speech = tts("Say something here", speaker=42)
+
+ # Write to file
+ sf.write("out.wav", speech, 22050)
+ ```
+
+ ## Usage with ONNX
+
+ This model can also be run directly with ONNX, provided the input text is tokenized. Tokenization can be done with [ttstokenizer](https://github.com/neuml/ttstokenizer).
+
+ Note that the txtai pipeline has additional functionality, such as batching large inputs together, that would need to be duplicated with this method.
+
+ ```python
+ import numpy as np
+ import onnxruntime
+ import soundfile as sf
+ import yaml
+
+ from ttstokenizer import TTSTokenizer
+
+ # This example assumes the files have been downloaded locally
+ with open("vctk-vits-onnx/config.yaml", "r", encoding="utf-8") as f:
+     config = yaml.safe_load(f)
+
+ # Create model
+ model = onnxruntime.InferenceSession(
+     "vctk-vits-onnx/model.onnx",
+     providers=["CPUExecutionProvider"]
+ )
+
+ # Create tokenizer
+ tokenizer = TTSTokenizer(config["token"]["list"])
+
+ # Tokenize inputs
+ inputs = tokenizer("Say something here")
+
+ # Generate speech
+ outputs = model.run(None, {"text": inputs, "sids": np.array([42])})
+
+ # Write to file
+ sf.write("out.wav", outputs[0], 22050)
+ ```
+
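The README notes that the txtai pipeline handles large inputs in a way that direct ONNX use would have to replicate. A minimal sketch of one possible approach follows: split the text on sentence-ending punctuation, pack sentences into chunks, and synthesize each chunk separately. The splitting rule, the `max_chars` limit, and the `synthesize` callable (a stand-in for the tokenizer plus `model.run` steps) are all assumptions for illustration, not txtai's actual logic.

```python
import re


def split_sentences(text, max_chars=200):
    """Split on sentence-ending punctuation, then pack sentences into chunks.

    A chunk is closed once adding the next sentence would exceed max_chars.
    A single sentence longer than max_chars is kept whole.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate

    if current:
        chunks.append(current)
    return chunks


def synthesize_chunks(text, synthesize, max_chars=200):
    """Run a caller-supplied synthesize(chunk) function (hypothetical) on each chunk."""
    return [synthesize(chunk) for chunk in split_sentences(text, max_chars)]
```

With the ONNX example, `synthesize` could wrap the tokenizer and `model.run` calls, and the per-chunk waveforms could then be joined with `np.concatenate` before writing the file.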
+ ## How to export
+
+ More information on how to export ESPnet models to ONNX can be [found here](https://github.com/espnet/espnet_onnx#text2speech-inference).
+
+ ## Speaker reference
+
+ The [CSTR VCTK Corpus](https://datashare.ed.ac.uk/handle/10283/3443) includes speech data uttered by native speakers of English with various accents.
+
+ When using this model, set a `speaker` id from the reference table below. The `ref` column corresponds to the id in the VCTK dataset.
+
+ | SPEAKER | REF | AGE | GENDER | ACCENTS | REGION |
+ |----------:|-----:|------:|:---------|:---------------|:-----------------|
+ | 1 | 225 | 23 | F | English | Southern England |
+ | 2 | 226 | 22 | M | English | Surrey |
+ | 3 | 227 | 38 | M | English | Cumbria |
+ | 4 | 228 | 22 | F | English | Southern England |
+ | 5 | 229 | 23 | F | English | Southern England |
+ | 6 | 230 | 22 | F | English | Stockton-on-tees |
+ | 7 | 231 | 23 | F | English | Southern England |
+ | 8 | 232 | 23 | M | English | Southern England |
+ | 9 | 233 | 23 | F | English | Staffordshire |
+ | 10 | 234 | 22 | F | Scottish | West Dumfries |
+ | 11 | 236 | 23 | F | English | Manchester |
+ | 12 | 237 | 22 | M | Scottish | Fife |
+ | 13 | 238 | 22 | F | Northern Irish | Belfast |
+ | 14 | 239 | 22 | F | English | SW England |
+ | 15 | 240 | 21 | F | English | Southern England |
+ | 16 | 241 | 21 | M | Scottish | Perth |
+ | 17 | 243 | 22 | M | English | London |
+ | 18 | 244 | 22 | F | English | Manchester |
+ | 19 | 245 | 25 | M | Irish | Dublin |
+ | 20 | 246 | 22 | M | Scottish | Selkirk |
+ | 21 | 247 | 22 | M | Scottish | Argyll |
+ | 22 | 248 | 23 | F | Indian | |
+ | 23 | 249 | 22 | F | Scottish | Aberdeen |
+ | 24 | 250 | 22 | F | English | SE England |
+ | 25 | 251 | 26 | M | Indian | |
+ | 26 | 252 | 22 | M | Scottish | Edinburgh |
+ | 27 | 253 | 22 | F | Welsh | Cardiff |
+ | 28 | 254 | 21 | M | English | Surrey |
+ | 29 | 255 | 19 | M | Scottish | Galloway |
+ | 30 | 256 | 24 | M | English | Birmingham |
+ | 31 | 257 | 24 | F | English | Southern England |
+ | 32 | 258 | 22 | M | English | Southern England |
+ | 33 | 259 | 23 | M | English | Nottingham |
+ | 34 | 260 | 21 | M | Scottish | Orkney |
+ | 35 | 261 | 26 | F | Northern Irish | Belfast |
+ | 36 | 262 | 23 | F | Scottish | Edinburgh |
+ | 37 | 263 | 22 | M | Scottish | Aberdeen |
+ | 38 | 264 | 23 | F | Scottish | West Lothian |
+ | 39 | 265 | 23 | F | Scottish | Ross |
+ | 40 | 266 | 22 | F | Irish | Athlone |
+ | 41 | 267 | 23 | F | English | Yorkshire |
+ | 42 | 268 | 23 | F | English | Southern England |
+ | 43 | 269 | 20 | F | English | Newcastle |
+ | 44 | 270 | 21 | M | English | Yorkshire |
+ | 45 | 271 | 19 | M | Scottish | Fife |
+ | 46 | 272 | 23 | M | Scottish | Edinburgh |
+ | 47 | 273 | 23 | M | English | Suffolk |
+ | 48 | 274 | 22 | M | English | Essex |
+ | 49 | 275 | 23 | M | Scottish | Midlothian |
+ | 50 | 276 | 24 | F | English | Oxford |
+ | 51 | 277 | 23 | F | English | NE England |
+ | 52 | 278 | 22 | M | English | Cheshire |
+ | 53 | 279 | 23 | M | English | Leicester |
+ | 54 | 280 | | | Unknown | |
+ | 55 | 281 | 29 | M | Scottish | Edinburgh |
+ | 56 | 282 | 23 | F | English | Newcastle |
+ | 57 | 283 | 24 | F | Irish | Cork |
+ | 58 | 284 | 20 | M | Scottish | Fife |
+ | 59 | 285 | 21 | M | Scottish | Edinburgh |
+ | 60 | 286 | 23 | M | English | Newcastle |
+ | 61 | 287 | 23 | M | English | York |
+ | 62 | 288 | 22 | F | Irish | Dublin |
+ | 63 | 292 | 23 | M | Northern Irish | Belfast |
+ | 64 | 293 | 22 | F | Northern Irish | Belfast |
+ | 65 | 294 | 33 | F | American | San Francisco |
+ | 66 | 295 | 23 | F | Irish | Dublin |
+ | 67 | 297 | 20 | F | American | New York |
+ | 68 | 298 | 19 | M | Irish | Tipperary |
+ | 69 | 299 | 25 | F | American | California |
+ | 70 | 300 | 23 | F | American | California |
+ | 71 | 301 | 23 | F | American | North Carolina |
+ | 72 | 302 | 20 | M | Canadian | Montreal |
+ | 73 | 303 | 24 | F | Canadian | Toronto |
+ | 74 | 304 | 22 | M | Northern Irish | Belfast |
+ | 75 | 305 | 19 | F | American | Philadelphia |
+ | 76 | 306 | 21 | F | American | New York |
+ | 77 | 307 | 23 | F | Canadian | Ontario |
+ | 78 | 308 | 18 | F | American | Alabama |
+ | 79 | 310 | 21 | F | American | Tennessee |
+ | 80 | 311 | 21 | M | American | Iowa |
+ | 81 | 312 | 19 | F | Canadian | Hamilton |
+ | 82 | 313 | 24 | F | Irish | County Down |
+ | 83 | 314 | 26 | F | South African | Cape Town |
+ | 84 | 316 | 20 | M | Canadian | Alberta |
+ | 85 | 317 | 23 | F | Canadian | Hamilton |
+ | 86 | 318 | 32 | F | American | Napa |
+ | 87 | 323 | 19 | F | South African | Pretoria |
+ | 88 | 326 | 26 | M | Australian | Sydney |
+ | 89 | 329 | 23 | F | American | |
+ | 90 | 330 | 26 | F | American | |
+ | 91 | 333 | 19 | F | American | Indiana |
+ | 92 | 334 | 18 | M | American | Chicago |
+ | 93 | 335 | 25 | F | New Zealand | English |
+ | 94 | 336 | 18 | F | South African | Johannesburg |
+ | 95 | 339 | 21 | F | American | Pennsylvania |
+ | 96 | 340 | 18 | F | Irish | Dublin |
+ | 97 | 341 | 26 | F | American | Ohio |
+ | 98 | 343 | 27 | F | Canadian | Alberta |
+ | 99 | 345 | 22 | M | American | Florida |
+ | 100 | 347 | 26 | M | South African | Johannesburg |
+ | 101 | 351 | 21 | F | Northern Irish | Derry |
+ | 102 | 360 | 19 | M | American | New Jersey |
+ | 103 | 361 | 19 | F | American | New Jersey |
+ | 104 | 362 | 29 | F | American | |
+ | 105 | 363 | 22 | M | Canadian | Toronto |
+ | 106 | 364 | 23 | M | Irish | Donegal |
+ | 107 | 374 | 28 | M | Australian | English |
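Since the model takes the `SPEAKER` id rather than the VCTK `REF` id, a small lookup can help avoid mixing the two up. The sketch below hard-codes a handful of rows copied from the table above; the `SPEAKERS` dict and both helper names are hypothetical, and a fuller version would load every row.

```python
# Subset of the speaker table: VCTK ref -> (speaker id, gender, accent)
SPEAKERS = {
    225: (1, "F", "English"),
    226: (2, "M", "English"),
    268: (42, "F", "English"),
    294: (65, "F", "American"),
    326: (88, "M", "Australian"),
}


def speaker_for_ref(ref):
    """Return the model speaker id for a VCTK ref id."""
    return SPEAKERS[ref][0]


def speakers_with(accent, gender=None):
    """Return sorted speaker ids matching an accent and, optionally, a gender."""
    return sorted(
        speaker
        for speaker, speaker_gender, speaker_accent in SPEAKERS.values()
        if speaker_accent == accent and gender in (None, speaker_gender)
    )
```

For example, `speaker_for_ref(268)` gives the id used in the usage examples (`42`).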
config.yaml ADDED
@@ -0,0 +1,92 @@
+ normalize:
+   use_normalize: false
+ text_cleaner:
+   cleaner_types:
+   - tacotron
+ token:
+   list:
+   - <blank>
+   - <unk>
+   - AH0
+   - T
+   - N
+   - S
+   - R
+   - IH1
+   - D
+   - L
+   - .
+   - Z
+   - DH
+   - K
+   - W
+   - M
+   - AE1
+   - EH1
+   - AA1
+   - IH0
+   - IY1
+   - AH1
+   - B
+   - P
+   - V
+   - ER0
+   - F
+   - HH
+   - AY1
+   - EY1
+   - UW1
+   - IY0
+   - AO1
+   - OW1
+   - G
+   - ','
+   - NG
+   - SH
+   - Y
+   - JH
+   - AW1
+   - UH1
+   - TH
+   - ER1
+   - CH
+   - '?'
+   - OW0
+   - OW2
+   - EH2
+   - EY2
+   - UW0
+   - IH2
+   - OY1
+   - AY2
+   - ZH
+   - AW2
+   - EH0
+   - IY2
+   - AA2
+   - AE0
+   - AH2
+   - AE2
+   - AO0
+   - AO2
+   - AY0
+   - UW2
+   - UH2
+   - AA0
+   - AW0
+   - EY0
+   - '!'
+   - UH0
+   - ER2
+   - OY2
+   - ''''
+   - OY0
+   - <sos/eos>
+ tokenizer:
+   g2p_type: g2p_en_no_space
+   token_type: phn
+ tts_model:
+   model_path: espnet/kan-bayashi_vctk_tts_train_multi_spk_vits_raw_phn_tacotron_g2p_en_no_space_train.total_count.ave/full/vits.onnx
+   model_type: VITS
+ vocoder:
+   vocoder_type: not_used
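The `token.list` entries are the phoneme vocabulary passed to the tokenizer in the README example. As a rough illustration of how such a list can be turned into model input ids, the sketch below assumes each token's id is its position in the list and that unknown phonemes fall back to `<unk>`. Both are assumptions about the convention, not a description of what ttstokenizer does internally, and only the first few tokens are copied here.

```python
# First entries of token.list from config.yaml (truncated for brevity)
TOKENS = ["<blank>", "<unk>", "AH0", "T", "N", "S", "R", "IH1", "D", "L", "."]


def build_vocab(tokens):
    """Map each token to its position in the list (assumed to be its id)."""
    return {token: index for index, token in enumerate(tokens)}


def encode(phonemes, vocab):
    """Convert phoneme strings to ids, using <unk> for anything not in the vocab."""
    unknown = vocab["<unk>"]
    return [vocab.get(phoneme, unknown) for phoneme in phonemes]


vocab = build_vocab(TOKENS)
ids = encode(["T", "IH1", "N", "."], vocab)
```

With the full file loaded through `yaml.safe_load`, the same helpers could be given `config["token"]["list"]` instead of the truncated list.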
model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e17774c62e9472bcaea260d6aa3c89570972c1dec34847893696412b58940d2
+ size 145025514
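The three lines added for `model.onnx` are a Git LFS pointer, not the model itself: `oid` is the SHA-256 of the real file and `size` is its length in bytes. A minimal sketch for checking a downloaded file against the pointer is shown below; the function names are hypothetical and the local path is left to the caller.

```python
import hashlib

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0e17774c62e9472bcaea260d6aa3c89570972c1dec34847893696412b58940d2
size 145025514
"""


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check that the file at path has the size and hash recorded in the pointer."""
    hasher = hashlib.new(pointer["algorithm"])
    size = 0
    with open(path, "rb") as handle:
        # Read in blocks so a large model file is never held fully in memory
        for block in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(block)
            size += len(block)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
```

Calling `matches_pointer` with the path of the downloaded `model.onnx` and `pointer` would then report whether the download is complete and intact.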