kevinwang676 committed on
Commit
e0509fb
1 Parent(s): a9c4ef5

Delete .ipynb_checkpoints

.ipynb_checkpoints/demo_part1-checkpoint.ipynb DELETED
@@ -1,236 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "id": "b6ee1ede",
6
- "metadata": {},
7
- "source": [
8
- "## Voice Style Control Demo"
9
- ]
10
- },
11
- {
12
- "cell_type": "code",
13
- "execution_count": null,
14
- "id": "b7f043ee",
15
- "metadata": {},
16
- "outputs": [],
17
- "source": [
18
- "import os\n",
19
- "import torch\n",
20
- "import se_extractor\n",
21
- "from api import BaseSpeakerTTS, ToneColorConverter"
22
- ]
23
- },
24
- {
25
- "cell_type": "markdown",
26
- "id": "15116b59",
27
- "metadata": {},
28
- "source": [
29
- "### Initialization"
30
- ]
31
- },
32
- {
33
- "cell_type": "code",
34
- "execution_count": null,
35
- "id": "aacad912",
36
- "metadata": {},
37
- "outputs": [],
38
- "source": [
39
- "ckpt_base = 'checkpoints/base_speakers/EN'\n",
40
- "ckpt_converter = 'checkpoints/converter'\n",
41
- "device = 'cuda:0'\n",
42
- "output_dir = 'outputs'\n",
43
- "\n",
44
- "base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)\n",
45
- "base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')\n",
46
- "\n",
47
- "tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)\n",
48
- "tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')\n",
49
- "\n",
50
- "os.makedirs(output_dir, exist_ok=True)"
51
- ]
52
- },
53
- {
54
- "cell_type": "markdown",
55
- "id": "7f67740c",
56
- "metadata": {},
57
- "source": [
58
- "### Obtain Tone Color Embedding"
59
- ]
60
- },
61
- {
62
- "cell_type": "markdown",
63
- "id": "f8add279",
64
- "metadata": {},
65
- "source": [
66
- "The `source_se` is the tone color embedding of the base speaker. \n",
67
- "It is an average of multiple sentences generated by the base speaker. We directly provide the result here but\n",
68
- "readers are welcome to extract `source_se` themselves."
69
- ]
70
- },
71
- {
72
- "cell_type": "code",
73
- "execution_count": null,
74
- "id": "63ff6273",
75
- "metadata": {},
76
- "outputs": [],
77
- "source": [
78
- "source_se = torch.load(f'{ckpt_base}/en_default_se.pth').to(device)"
79
- ]
80
- },
81
- {
82
- "cell_type": "markdown",
83
- "id": "4f71fcc3",
84
- "metadata": {},
85
- "source": [
86
- "The `reference_speaker` path below points to a short audio clip of the reference speaker whose voice we want to clone. We provide an example here. If you use your own reference speakers, please **make sure each speaker has a unique filename.** The `se_extractor` saves the `target_se` under the audio's filename and **will not automatically overwrite an existing one.**"
87
- ]
88
- },
89
- {
90
- "cell_type": "code",
91
- "execution_count": null,
92
- "id": "55105eae",
93
- "metadata": {},
94
- "outputs": [],
95
- "source": [
96
- "reference_speaker = 'resources/example_reference.mp3'\n",
97
- "target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, target_dir='processed', vad=True)"
98
- ]
99
- },
100
- {
101
- "cell_type": "markdown",
102
- "id": "a40284aa",
103
- "metadata": {},
104
- "source": [
105
- "### Inference"
106
- ]
107
- },
108
- {
109
- "cell_type": "code",
110
- "execution_count": null,
111
- "id": "73dc1259",
112
- "metadata": {},
113
- "outputs": [],
114
- "source": [
115
- "save_path = f'{output_dir}/output_en_default.wav'\n",
116
- "\n",
117
- "# Run the base speaker tts\n",
118
- "text = \"This audio is generated by OpenVoice.\"\n",
119
- "src_path = f'{output_dir}/tmp.wav'\n",
120
- "base_speaker_tts.tts(text, src_path, speaker='default', language='English', speed=1.0)\n",
121
- "\n",
122
- "# Run the tone color converter\n",
123
- "encode_message = \"@MyShell\"\n",
124
- "tone_color_converter.convert(\n",
125
- " audio_src_path=src_path, \n",
126
- " src_se=source_se, \n",
127
- " tgt_se=target_se, \n",
128
- " output_path=save_path,\n",
129
- " message=encode_message)"
130
- ]
131
- },
132
- {
133
- "cell_type": "markdown",
134
- "id": "6e3ea28a",
135
- "metadata": {},
136
- "source": [
137
- "**Try different styles and speeds.** The style is controlled by the `speaker` parameter of the `base_speaker_tts.tts` method. Available choices: friendly, cheerful, excited, sad, angry, terrified, shouting, whispering. Note that the tone color embedding needs to be updated as well. The speed is controlled by the `speed` parameter. Let's try whispering with speed 0.9."
138
- ]
139
- },
140
- {
141
- "cell_type": "code",
142
- "execution_count": null,
143
- "id": "fd022d38",
144
- "metadata": {},
145
- "outputs": [],
146
- "source": [
147
- "source_se = torch.load(f'{ckpt_base}/en_style_se.pth').to(device)\n",
148
- "save_path = f'{output_dir}/output_whispering.wav'\n",
149
- "\n",
150
- "# Run the base speaker tts\n",
151
- "text = \"This audio is generated by OpenVoice with a half-performance model.\"\n",
152
- "src_path = f'{output_dir}/tmp.wav'\n",
153
- "base_speaker_tts.tts(text, src_path, speaker='whispering', language='English', speed=0.9)\n",
154
- "\n",
155
- "# Run the tone color converter\n",
156
- "encode_message = \"@MyShell\"\n",
157
- "tone_color_converter.convert(\n",
158
- " audio_src_path=src_path, \n",
159
- " src_se=source_se, \n",
160
- " tgt_se=target_se, \n",
161
- " output_path=save_path,\n",
162
- " message=encode_message)"
163
- ]
164
- },
165
- {
166
- "cell_type": "markdown",
167
- "id": "5fcfc70b",
168
- "metadata": {},
169
- "source": [
170
- "**Try different languages.** OpenVoice can achieve multi-lingual voice cloning by simply replacing the base speaker. We provide an example with a Chinese base speaker here, and we encourage readers to try `demo_part2.ipynb` for a detailed demo."
171
- ]
172
- },
173
- {
174
- "cell_type": "code",
175
- "execution_count": null,
176
- "id": "a71d1387",
177
- "metadata": {},
178
- "outputs": [],
179
- "source": [
180
- "\n",
181
- "ckpt_base = 'checkpoints/base_speakers/ZH'\n",
182
- "base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)\n",
183
- "base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')\n",
184
- "\n",
185
- "source_se = torch.load(f'{ckpt_base}/zh_default_se.pth').to(device)\n",
186
- "save_path = f'{output_dir}/output_chinese.wav'\n",
187
- "\n",
188
- "# Run the base speaker tts\n",
189
- "text = \"今天天气真好,我们一起出去吃饭吧。\"\n",
190
- "src_path = f'{output_dir}/tmp.wav'\n",
191
- "base_speaker_tts.tts(text, src_path, speaker='default', language='Chinese', speed=1.0)\n",
192
- "\n",
193
- "# Run the tone color converter\n",
194
- "encode_message = \"@MyShell\"\n",
195
- "tone_color_converter.convert(\n",
196
- " audio_src_path=src_path, \n",
197
- " src_se=source_se, \n",
198
- " tgt_se=target_se, \n",
199
- " output_path=save_path,\n",
200
- " message=encode_message)"
201
- ]
202
- },
203
- {
204
- "cell_type": "markdown",
205
- "id": "8e513094",
206
- "metadata": {},
207
- "source": [
208
- "**Tech for good.** For people who will deploy OpenVoice for public use: we offer the option to add a watermark to avoid potential misuse. Please see the ToneColorConverter class. **MyShell reserves the ability to detect whether an audio clip was generated by OpenVoice**, whether or not the watermark is added."
209
- ]
210
- }
211
- ],
212
- "metadata": {
213
- "interpreter": {
214
- "hash": "9d70c38e1c0b038dbdffdaa4f8bfa1f6767c43760905c87a9fbe7800d18c6c35"
215
- },
216
- "kernelspec": {
217
- "display_name": "Python 3.9.18 ('openvoice')",
218
- "language": "python",
219
- "name": "python3"
220
- },
221
- "language_info": {
222
- "codemirror_mode": {
223
- "name": "ipython",
224
- "version": 3
225
- },
226
- "file_extension": ".py",
227
- "mimetype": "text/x-python",
228
- "name": "python",
229
- "nbconvert_exporter": "python",
230
- "pygments_lexer": "ipython3",
231
- "version": "3.9.18"
232
- }
233
- },
234
- "nbformat": 4,
235
- "nbformat_minor": 5
236
- }
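Note on the deleted notebook: its markdown cell warns that each reference speaker needs a unique filename because the extracted embedding is saved under the audio's filename and is not overwritten. A minimal sketch of a pre-flight check for that condition follows. The helper `find_filename_collisions` and the `speakers/...` paths are hypothetical and not part of OpenVoice, and it is an assumption here that the cache key is the filename without its extension.

```python
from collections import defaultdict
from pathlib import Path


def find_filename_collisions(paths):
    """Group reference audio paths that share the same filename stem.

    Hypothetical helper: assumes the saved embedding is keyed by the
    filename without its extension, which the notebook does not confirm.
    """
    by_stem = defaultdict(list)
    for p in paths:
        by_stem[Path(p).stem].append(str(p))
    # Keep only stems that more than one path maps to.
    return {stem: group for stem, group in by_stem.items() if len(group) > 1}


refs = [
    "resources/example_reference.mp3",       # path used in the notebook
    "speakers/alice/example_reference.mp3",  # hypothetical path
    "speakers/bob/voice.wav",                # hypothetical path
]
collisions = find_filename_collisions(refs)
print(collisions)
```

Any non-empty result means two speakers would map to the same saved embedding, so one of the files should be renamed before calling `se_extractor.get_se`.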
 
.ipynb_checkpoints/requirements-checkpoint.txt DELETED
@@ -1,14 +0,0 @@
1
- librosa==0.9.1
2
- faster-whisper==0.9.0
3
- pydub==0.25.1
4
- wavmark==0.0.2
5
- numpy==1.22.0
6
- eng_to_ipa==0.0.2
7
- inflect==7.0.0
8
- unidecode==1.3.7
9
- whisper-timestamped==1.14.2
10
- openai
11
- python-dotenv
12
- pypinyin
13
- jieba
14
- cn2an
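Note on the deleted requirements file: nine of its entries are pinned with `==` and five are left unpinned. A minimal sketch of how that split could be checked in Python follows. The function `split_requirements` is a hypothetical helper, and it only understands bare names and `==` pins, not the full pip requirement syntax.

```python
def split_requirements(lines):
    """Separate '==' pinned requirements from unpinned ones.

    Hypothetical helper: handles only 'name==version' and bare names.
    """
    pinned, unpinned = {}, []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip()] = version.strip()
        else:
            unpinned.append(line)
    return pinned, unpinned


# Contents of the deleted requirements-checkpoint.txt shown above.
requirements = """\
librosa==0.9.1
faster-whisper==0.9.0
pydub==0.25.1
wavmark==0.0.2
numpy==1.22.0
eng_to_ipa==0.0.2
inflect==7.0.0
unidecode==1.3.7
whisper-timestamped==1.14.2
openai
python-dotenv
pypinyin
jieba
cn2an
""".splitlines()

pinned, unpinned = split_requirements(requirements)
print(len(pinned), "pinned;", "unpinned:", unpinned)
```

The unpinned entries resolve to whatever version is current at install time, so they are the ones most likely to drift between environments.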