ybbwcwaps committed on
Commit d036110
1 Parent(s): c578da5

some torchvgg
FakeVD/Models/torchvggish/LICENSE ADDED
@@ -0,0 +1,250 @@
+ Copyright 2020 Harri Taylor. All rights reserved.
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+ From PyTorch:
+
+ Copyright (c) 2016- Facebook, Inc (Adam Paszke)
+ Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
+ Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
+ Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
+ Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
+ Copyright (c) 2011-2013 NYU (Clement Farabet)
+ Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
+ Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
+ Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
+
+ From Caffe2:
+
+ Copyright (c) 2016-present, Facebook Inc. All rights reserved.
+
+ All contributions by Facebook:
+ Copyright (c) 2016 Facebook Inc.
+
+ All contributions by Google:
+ Copyright (c) 2015 Google Inc.
+ All rights reserved.
+
+ All contributions by Yangqing Jia:
+ Copyright (c) 2015 Yangqing Jia
+ All rights reserved.
+
+ All contributions from Caffe:
+ Copyright (c) 2013, 2014, 2015, the respective contributors
+ All rights reserved.
+
+ From Tensorflow:
+ Copyright (c) 2019 The TensorFlow Authors. All rights reserved.
+
+ All other contributions:
+ Copyright (c) 2015, 2016 the respective contributors
+ All rights reserved.
+
+ Caffe2 uses a copyright model similar to Caffe: each contributor holds
+ copyright over their contributions to Caffe2. The project versioning records
+ all such contribution and copyright details. If a contributor wants to further
+ mark their specific copyright on a particular contribution, they should
+ indicate their copyright solely in the commit message of the change when it is
+ committed.
+
+ All rights reserved.
FakeVD/Models/torchvggish/README.md ADDED
@@ -0,0 +1,33 @@
+ **Looking for maintainers** - I no longer have the capacity to maintain this project. If you would like to take over maintenance, please get in touch. I will either forward to your fork, or add you as a maintainer for the project. Thanks.
+
+ ---
+
+
+ # VGGish
+ A `torch`-compatible port of [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset)<sup>[1]</sup>,
+ a feature embedding frontend for audio classification models. The weights are ported directly from the TensorFlow model, so embeddings created using `torchvggish` will be identical.
+
+
+ ## Usage
+
+ ```python
+ import torch
+
+ model = torch.hub.load('harritaylor/torchvggish', 'vggish')
+ model.eval()
+
+ # Download an example audio file
+ import urllib.request
+ url, filename = ("http://soundbible.com/grab.php?id=1698&type=wav", "bus_chatter.wav")
+ urllib.request.urlretrieve(url, filename)
+
+ model.forward(filename)
+ ```
+
+ <hr>
+ [1] S. Hershey et al., ‘CNN Architectures for Large-Scale Audio Classification’,\
+ in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.\
+ Available: https://arxiv.org/abs/1609.09430, https://ai.google/research/pubs/pub45611
+
+
FakeVD/Models/torchvggish/docs/_example_download_weights.ipynb ADDED
@@ -0,0 +1,251 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {
7
+ "pycharm": {
8
+ "is_executing": false
9
+ }
10
+ },
11
+ "outputs": [
12
+ {
13
+ "name": "stdout",
14
+ "text": [
15
+ "A audioset/README.md\r\nA audioset/mel_features.py\r\nA audioset/vggish_inference_demo.py\r\nA audioset/vggish_input.py\r\nA audioset/vggish_params.py\r\nA audioset/vggish_postprocess.py\r\nA audioset/vggish_slim.py\r\nA audioset/vggish_smoke_test.py\r\n",
16
+ "A audioset/vggish_train_demo.py\r\n",
17
+ "Checked out revision 9495.\r\n",
18
+ "Requirement already satisfied: numpy in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (1.16.3)\r\nRequirement already satisfied: scipy in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (1.2.1)\r\n",
19
+ "Requirement already satisfied: resampy in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (0.2.1)\r\nRequirement already satisfied: tensorflow in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (1.13.1)\r\nRequirement already satisfied: six in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (1.12.0)\r\nRequirement already satisfied: soundfile in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (0.10.2)\r\nRequirement already satisfied: numpy>=1.10 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from resampy) (1.16.3)\r\nRequirement already satisfied: scipy>=0.13 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from resampy) (1.2.1)\r\n",
20
+ "Requirement already satisfied: numba>=0.32 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from resampy) (0.43.1)\r\nRequirement already satisfied: gast>=0.2.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (0.2.2)\r\nRequirement already satisfied: termcolor>=1.1.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.1.0)\r\nRequirement already satisfied: tensorflow-estimator<1.14.0rc0,>=1.13.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.13.0)\r\nRequirement already satisfied: wheel>=0.26 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (0.33.1)\r\nRequirement already satisfied: grpcio>=1.8.6 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.20.1)\r\nRequirement already satisfied: astor>=0.6.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (0.7.1)\r\nRequirement already satisfied: absl-py>=0.1.6 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (0.7.1)\r\nRequirement already satisfied: protobuf>=3.6.1 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (3.7.1)\r\nRequirement already satisfied: keras-applications>=1.0.6 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.0.7)\r\nRequirement already satisfied: keras-preprocessing>=1.0.5 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.0.9)\r\nRequirement already satisfied: tensorboard<1.14.0,>=1.13.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.13.1)\r\n",
21
+ "Requirement already satisfied: cffi>=1.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from soundfile) (1.12.3)\r\nRequirement already satisfied: llvmlite>=0.28.0dev0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from numba>=0.32->resampy) (0.28.0)\r\nRequirement already satisfied: mock>=2.0.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow) (3.0.5)\r\nRequirement already satisfied: setuptools in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from protobuf>=3.6.1->tensorflow) (41.0.1)\r\nRequirement already satisfied: h5py in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from keras-applications>=1.0.6->tensorflow) (2.9.0)\r\nRequirement already satisfied: werkzeug>=0.11.15 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow) (0.15.2)\r\nRequirement already satisfied: markdown>=2.6.8 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow) (3.1)\r\nRequirement already satisfied: pycparser in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from cffi>=1.0->soundfile) (2.19)\r\n",
22
+ " % Total % Received % Xferd Average Speed Time Time Time Current\r\n Dload Upload Total Spent Left Speed\r\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
23
+ "\r 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0",
24
+ "\r 0 277M 0 3870 0 0 2314 0 34:56:43 0:00:01 34:56:42 2313",
25
+ "\r 1 277M 1 4096k 0 0 1413k 0 0:03:21 0:00:02 0:03:19 1413k",
26
+ "\r 4 277M 4 13.1M 0 0 4110k 0 0:01:09 0:00:03 0:01:06 4110k",
27
+ "\r 13 277M 13 37.2M 0 0 8940k 0 0:00:31 0:00:04 0:00:27 8939k",
28
+ "\r 25 277M 25 71.5M 0 0 13.5M 0 0:00:20 0:00:05 0:00:15 17.7M",
29
+ "\r 34 277M 34 96.8M 0 0 15.4M 0 0:00:17 0:00:06 0:00:11 21.0M",
30
+ "\r 43 277M 43 120M 0 0 15.8M 0 0:00:17 0:00:07 0:00:10 24.6M",
31
+ "\r 48 277M 48 136M 0 0 16.4M 0 0:00:16 0:00:08 0:00:08 24.4M",
32
+ "\r 60 277M 60 166M 0 0 17.9M 0 0:00:15 0:00:09 0:00:06 25.8M",
33
+ "\r 71 277M 71 197M 0 0 19.2M 0 0:00:14 0:00:10 0:00:04 25.1M",
34
+ "\r 76 277M 76 212M 0 0 18.8M 0 0:00:14 0:00:11 0:00:03 23.0M",
35
+ "\r 83 277M 83 232M 0 0 17.8M 0 0:00:15 0:00:12 0:00:03 20.8M",
36
+ "\r 86 277M 86 ",
37
+ " 240M 0 0 18.0M 0 0:00:15 0:00:13 0:00:02 20.8M",
38
+ "\r 95 277M 95 264M 0 0 18.2M 0 0:00:15 0:00:14 0:00:01 18.6M",
39
+ "\r100 277M 100 277M 0 0 18.6M 0 0:00:14 0:00:14 --:--:-- 17.3M\r\n",
40
+ " % Total % Received % Xferd Average Speed Time Time Time Current\r\n Dload Upload Total Spent Left Speed\r\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
41
+ "\r 1 73020 1 1284 0 0 3139 0 0:00:23 --:--:-- 0:00:23 3139\r100 73020 100 73020 0 0 163k 0 --:--:-- --:--:-- --:--:-- 163k\r\n"
42
+ ],
43
+ "output_type": "stream"
44
+ }
45
+ ],
46
+ "source": [
47
+ "\"\"\"\n",
48
+ "This notebook demonstrates how to replicate converting tensorflow\n",
49
+ "weights from tensorflow's vggish to torchvggish\n",
50
+ "\"\"\" \n",
51
+ "\n",
52
+ "# Download the audioset directory using subversion\n",
53
+ "# !apt-get -qq install subversion # uncomment if on linux\n",
54
+ "!svn checkout https://github.com/tensorflow/models/trunk/research/audioset\n",
55
+ "\n",
56
+ "# Download audioset requirements\n",
57
+ "!pip install numpy scipy\n",
58
+ "!pip install resampy tensorflow six soundfile\n",
59
+ "\n",
60
+ "# grab the VGGish model checkpoints & PCA params\n",
61
+ "!curl -O https://storage.googleapis.com/audioset/vggish_model.ckpt\n",
62
+ "!curl -O https://storage.googleapis.com/audioset/vggish_pca_params.npz"
63
+ ]
64
+ },
65
+ {
66
+ "cell_type": "code",
67
+ "execution_count": 2,
68
+ "metadata": {
69
+ "pycharm": {
70
+ "is_executing": false
71
+ }
72
+ },
73
+ "outputs": [
74
+ {
75
+ "name": "stdout",
76
+ "text": [
77
+ "\nWARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.\nFor more information, please see:\n * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n * https://github.com/tensorflow/addons\nIf you depend on functionality not listed there, please file an issue.\n\n\nTesting your install of VGGish\n\n",
78
+ "Log Mel Spectrogram example: [[-4.47297436 -4.29457354 -4.14940631 ... -3.9747003 -3.94774997\n -3.78687669]\n [-4.48589533 -4.28825497 -4.139964 ... -3.98368686 -3.94976505\n -3.7951698 ]\n [-4.46158065 -4.29329706 -4.14905953 ... -3.96442484 -3.94895483\n -3.78619839]\n ...\n [-4.46152626 -4.29365061 -4.14848608 ... -3.96638113 -3.95057575\n -3.78538167]\n [-4.46152595 -4.2936572 -4.14848104 ... -3.96640507 -3.95059567\n -3.78537143]\n [-4.46152565 -4.29366386 -4.14847603 ... -3.96642906 -3.95061564\n -3.78536116]]\nWARNING:tensorflow:From /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\nInstructions for updating:\nColocations handled automatically by placer.\n",
79
+ "WARNING:tensorflow:From /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\nInstructions for updating:\nUse keras.layers.flatten instead.\n",
80
+ "WARNING:tensorflow:From /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\nInstructions for updating:\nUse standard file APIs to check for files with this prefix.\n",
81
+ "INFO:tensorflow:Restoring parameters from vggish_model.ckpt\n",
82
+ "VGGish embedding: [0. 0. 0. 0. 0. 0.\n 0. 0.16137293 0. 0. 0. 0.\n 0. 0. 0. 0. 0. 0.80695796\n 0. 0. 0. 0. 0. 0.\n 0. 0.36792755 0.03582409 0. 0. 0.\n 0. 0.38027024 0.1375593 0.9174708 0.8065634 0.\n 0. 0. 0. 0.04036281 0.7076243 0.\n 0.497839 0.24081808 0.21565434 0.88492286 1.19568 0.6706197\n 0.20779458 0.01639861 0.17471863 0. 0. 0.25100806\n 0. 0. 0.14607918 0. 0.39887053 0.30542105\n 0.12896761 0. 0. 0. 0. 0.\n 0.5385133 0. 0. 0.04941072 0.42527416 0.18537284\n 0. 0. 0.14753515 0. 0. 0.69933873\n 0.45541188 0.05174822 0. 0.01992539 0. 0.\n 0.5181578 0.565576 0.6587975 0. 0. 0.41056332\n 0. 0. 0. 0.25765193 0.23232114 0.24026448\n 0. 0. 0. 0. 0. 0.26523757\n 0. 0.48460823 0. 0. 0.19325787 0.\n 0.20123348 0. 0.03368621 0. 0. 0.\n 0. 0.17836356 0.024749 0.06889972 0. 0.\n 0. 0.08246281 0. 0. 0. 0.\n 0. 0. ]\nPostprocessed VGGish embedding: [169 10 154 127 191 66 124 69 157 232 142 21 128 131 43 3 33 111\n 198 153 76 255 194 60 71 179 146 131 167 60 79 76 192 84 102 160\n 23 91 173 13 149 186 115 202 252 163 84 145 107 255 5 198 81 0\n 203 110 35 104 101 131 255 0 0 158 136 74 115 152 77 154 54 151\n 82 243 57 116 165 153 85 181 152 0 255 122 29 255 46 105 110 43\n 0 90 58 13 255 108 96 255 84 121 255 75 176 111 176 64 83 231\n 255 82 255 94 81 144 99 173 255 0 0 158 31 230 112 255 0 255\n 20 255]\n\nLooks Good To Me!\n\n"
83
+ ],
84
+ "output_type": "stream"
85
+ }
86
+ ],
87
+ "source": [
88
+ "# Test install\n",
89
+ "!mv audioset/* .\n",
90
+ "from vggish_smoke_test import *"
91
+ ]
92
+ },
93
+ {
94
+ "cell_type": "code",
95
+ "execution_count": 4,
96
+ "metadata": {
97
+ "pycharm": {
98
+ "is_executing": false
99
+ }
100
+ },
101
+ "outputs": [
102
+ {
103
+ "name": "stdout",
104
+ "text": [
105
+ "INFO:tensorflow:Restoring parameters from vggish_model.ckpt\n",
106
+ "vggish/conv1/weights:0\n\t(3, 3, 1, 64)\nvggish/conv1/biases:0\n\t(64,)\nvggish/conv2/weights:0\n\t(3, 3, 64, 128)\nvggish/conv2/biases:0\n\t(128,)\nvggish/conv3/conv3_1/weights:0\n\t(3, 3, 128, 256)\nvggish/conv3/conv3_1/biases:0\n\t(256,)\nvggish/conv3/conv3_2/weights:0\n\t(3, 3, 256, 256)\nvggish/conv3/conv3_2/biases:0\n\t(256,)\nvggish/conv4/conv4_1/weights:0\n\t(3, 3, 256, 512)\nvggish/conv4/conv4_1/biases:0\n\t(512,)\nvggish/conv4/conv4_2/weights:0\n\t(3, 3, 512, 512)\nvggish/conv4/conv4_2/biases:0\n\t(512,)\nvggish/fc1/fc1_1/weights:0\n\t(12288, 4096)\nvggish/fc1/fc1_1/biases:0\n\t(4096,)\nvggish/fc1/fc1_2/weights:0\n\t(4096, 4096)\nvggish/fc1/fc1_2/biases:0\n\t(4096,)\nvggish/fc2/weights:0\n\t(4096, 128)\nvggish/fc2/biases:0\n\t(128,)\nvalues written to vggish_dict\n"
107
+ ],
108
+ "output_type": "stream"
109
+ }
110
+ ],
111
+ "source": [
112
+ "import tensorflow as tf\n",
113
+ "import vggish_slim\n",
114
+ "\n",
115
+ "vggish_dict = {}\n",
116
+ "# load the model and get info \n",
117
+ "with tf.Graph().as_default(), tf.Session() as sess:\n",
118
+ " vggish_slim.define_vggish_slim(training=True)\n",
119
+ " vggish_slim.load_vggish_slim_checkpoint(sess,\"vggish_model.ckpt\")\n",
120
+ " \n",
121
+ " tvars = tf.trainable_variables()\n",
122
+ " tvars_vals = sess.run(tvars)\n",
123
+ "\n",
124
+ " for var, val in zip(tvars, tvars_vals):\n",
125
+ " print(\"%s\" % (var.name))\n",
126
+ " print(\"\\t\" + str(var.shape))\n",
127
+ " vggish_dict[var.name] = val\n",
128
+ " print(\"values written to vggish_dict\")"
129
+ ]
130
+ },
131
+ {
132
+ "cell_type": "code",
133
+ "execution_count": 14,
134
+ "metadata": {
135
+ "pycharm": {
136
+ "is_executing": false,
137
+ "name": "#%%\n"
138
+ }
139
+ },
140
+ "outputs": [],
141
+ "source": [
142
+ "# Define torch model for vggish\n",
143
+ "\n",
144
+ "import torch\n",
145
+ "import torch.nn as nn\n",
146
+ "import numpy as np\n",
147
+ "\n",
148
+ "# From vggish_slim:\n",
149
+ "# The VGG stack of alternating convolutions and max-pools.\n",
150
+ "# net = slim.conv2d(net, 64, scope='conv1')\n",
151
+ "# net = slim.max_pool2d(net, scope='pool1')\n",
152
+ "# net = slim.conv2d(net, 128, scope='conv2')\n",
153
+ "# net = slim.max_pool2d(net, scope='pool2')\n",
154
+ "# net = slim.repeat(net, 2, slim.conv2d, 256, scope='conv3')\n",
155
+ "# net = slim.max_pool2d(net, scope='pool3')\n",
156
+ "# net = slim.repeat(net, 2, slim.conv2d, 512, scope='conv4')\n",
157
+ "# net = slim.max_pool2d(net, scope='pool4')\n",
158
+ "# # Flatten before entering fully-connected layers\n",
159
+ "# net = slim.flatten(net)\n",
160
+ "# net = slim.repeat(net, 2, slim.fully_connected, 4096, scope='fc1')\n",
161
+ "# # The embedding layer.\n",
162
+ "# net = slim.fully_connected(net, params.EMBEDDING_SIZE, scope='fc2')\n",
163
+ "\n",
164
+ "vggish_list = list(vggish_dict.values())\n",
165
+ "def param_generator():\n",
166
+ " param = vggish_list.pop(0)\n",
167
+ " transposed = np.transpose(param)\n",
168
+ " to_torch = torch.from_numpy(transposed)\n",
169
+ " result = torch.nn.Parameter(to_torch)\n",
170
+ " yield result\n",
171
+ "\n",
172
+ "class VGGish(nn.Module):\n",
173
+ " def __init__(self):\n",
174
+ " super(VGGish, self).__init__()\n",
175
+ " self.features = nn.Sequential(\n",
176
+ " nn.Conv2d(1, 64, 3, 1, 1),\n",
177
+ " nn.ReLU(inplace=True),\n",
178
+ " nn.MaxPool2d(2, 2),\n",
179
+ " nn.Conv2d(64, 128, 3, 1, 1),\n",
180
+ " nn.ReLU(inplace=True),\n",
181
+ " nn.MaxPool2d(2, 2),\n",
182
+ " nn.Conv2d(128, 256, 3, 1, 1),\n",
183
+ " nn.ReLU(inplace=True),\n",
184
+ " nn.Conv2d(256, 256, 3, 1, 1),\n",
185
+ " nn.ReLU(inplace=True),\n",
186
+ " nn.MaxPool2d(2, 2),\n",
187
+ " nn.Conv2d(256, 512, 3, 1, 1),\n",
188
+ " nn.ReLU(inplace=True),\n",
189
+ " nn.Conv2d(512, 512, 3, 1, 1),\n",
190
+ " nn.ReLU(inplace=True),\n",
191
+ " nn.MaxPool2d(2, 2))\n",
192
+ " self.embeddings = nn.Sequential(\n",
193
+ " nn.Linear(512*24, 4096),\n",
194
+ " nn.ReLU(inplace=True),\n",
195
+ " nn.Linear(4096, 4096),\n",
196
+ " nn.ReLU(inplace=True),\n",
197
+ " nn.Linear(4096, 128),\n",
198
+ " nn.ReLU(inplace=True))\n",
199
+ " \n",
200
+ " # extract weights from `vggish_list`\n",
201
+ " for seq in (self.features, self.embeddings):\n",
202
+ " for layer in seq:\n",
203
+ " if type(layer).__name__ != \"MaxPool2d\" and type(layer).__name__ != \"ReLU\":\n",
204
+ " layer.weight = next(param_generator())\n",
205
+ " layer.bias = next(param_generator())\n",
206
+ " \n",
207
+ " def forward(self, x):\n",
208
+ " x = self.features(x)\n",
209
+ " x = x.view(x.size(0),-1)\n",
210
+ " x = self.embeddings(x)\n",
211
+ " return x\n",
212
+ "\n",
213
+ "net = VGGish()\n",
214
+ "net.eval()\n",
215
+ "\n",
216
+ "# Save weights to disk\n",
217
+ "torch.save(net.state_dict(), \"./vggish.pth\")"
218
+ ]
219
+ }
220
+ ],
221
+ "metadata": {
222
+ "kernelspec": {
223
+ "display_name": "Python 3",
224
+ "language": "python",
225
+ "name": "python3"
226
+ },
227
+ "language_info": {
228
+ "codemirror_mode": {
229
+ "name": "ipython",
230
+ "version": 3
231
+ },
232
+ "file_extension": ".py",
233
+ "mimetype": "text/x-python",
234
+ "name": "python",
235
+ "nbconvert_exporter": "python",
236
+ "pygments_lexer": "ipython3",
237
+ "version": "3.7.1"
238
+ },
239
+ "pycharm": {
240
+ "stem_cell": {
241
+ "cell_type": "raw",
242
+ "source": [],
243
+ "metadata": {
244
+ "collapsed": false
245
+ }
246
+ }
247
+ }
248
+ },
249
+ "nbformat": 4,
250
+ "nbformat_minor": 1
251
+ }
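The conversion notebook above ports TF-slim weights to PyTorch with a bare `np.transpose`, which reverses every axis of the array. A minimal numpy-only sketch (with hypothetical shapes, not the notebook's actual checkpoint data) of what that reversal does to a slim-style conv kernel:

```python
import numpy as np

# TF slim stores a conv kernel as (kh, kw, in_channels, out_channels);
# PyTorch's nn.Conv2d expects (out_channels, in_channels, kh, kw).
w_tf = np.zeros((3, 3, 1, 64))

# np.transpose with no axes argument reverses the axis order:
# (kh, kw, in, out) -> (out, in, kw, kh).
w_torch = np.transpose(w_tf)

print(w_torch.shape)  # (64, 1, 3, 3)
```

Note that reversing all axes also swaps the two spatial axes; the shapes line up either way for the square 3x3 kernels used here.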
FakeVD/Models/torchvggish/hubconf.py ADDED
@@ -0,0 +1,15 @@
+ dependencies = ['torch', 'numpy', 'resampy', 'soundfile']
+
+ from torchvggish.vggish import VGGish
+
+ model_urls = {
+ 'vggish': 'https://github.com/harritaylor/torchvggish/'
+ 'releases/download/v0.1/vggish-10086976.pth',
+ 'pca': 'https://github.com/harritaylor/torchvggish/'
+ 'releases/download/v0.1/vggish_pca_params-970ea276.pth'
+ }
+
+
+ def vggish(**kwargs):
+ model = VGGish(urls=model_urls, **kwargs)
+ return model
FakeVD/Models/torchvggish/torchvggish/mel_features.py ADDED
@@ -0,0 +1,223 @@
+ # Copyright 2017 The TensorFlow Authors All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ # ==============================================================================
+
+ """Defines routines to compute mel spectrogram features from audio waveform."""
+
+ import numpy as np
+
+
+ def frame(data, window_length, hop_length):
+ """Convert array into a sequence of successive possibly overlapping frames.
+
+ An n-dimensional array of shape (num_samples, ...) is converted into an
+ (n+1)-D array of shape (num_frames, window_length, ...), where each frame
+ starts hop_length points after the preceding one.
+
+ This is accomplished using stride_tricks, so the original data is not
+ copied. However, there is no zero-padding, so any incomplete frames at the
+ end are not included.
+
+ Args:
+ data: np.array of dimension N >= 1.
+ window_length: Number of samples in each frame.
+ hop_length: Advance (in samples) between each window.
+
+ Returns:
+ (N+1)-D np.array with as many rows as there are complete frames that can be
+ extracted.
+ """
+ num_samples = data.shape[0]
+ num_frames = 1 + int(np.floor((num_samples - window_length) / hop_length))
+ shape = (num_frames, window_length) + data.shape[1:]
+ strides = (data.strides[0] * hop_length,) + data.strides
+ return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides)
+
+
+ def periodic_hann(window_length):
+ """Calculate a "periodic" Hann window.
+
+ The classic Hann window is defined as a raised cosine that starts and
+ ends on zero, and where every value appears twice, except the middle
+ point for an odd-length window. Matlab calls this a "symmetric" window
+ and np.hanning() returns it. However, for Fourier analysis, this
+ actually represents just over one cycle of a period N-1 cosine, and
+ thus is not compactly expressed on a length-N Fourier basis. Instead,
+ it's better to use a raised cosine that ends just before the final
+ zero value - i.e. a complete cycle of a period-N cosine. Matlab
+ calls this a "periodic" window. This routine calculates it.
+
+ Args:
+ window_length: The number of points in the returned window.
+
+ Returns:
+ A 1D np.array containing the periodic hann window.
+ """
+ return 0.5 - (0.5 * np.cos(2 * np.pi / window_length *
+ np.arange(window_length)))
+
+
+ def stft_magnitude(signal, fft_length,
+ hop_length=None,
+ window_length=None):
+ """Calculate the short-time Fourier transform magnitude.
+
+ Args:
+ signal: 1D np.array of the input time-domain signal.
+ fft_length: Size of the FFT to apply.
+ hop_length: Advance (in samples) between each frame passed to FFT.
+ window_length: Length of each block of samples to pass to FFT.
+
+ Returns:
+ 2D np.array where each row contains the magnitudes of the fft_length/2+1
+ unique values of the FFT for the corresponding frame of input samples.
+ """
+ frames = frame(signal, window_length, hop_length)
+ # Apply frame window to each frame. We use a periodic Hann (cosine of period
+ # window_length) instead of the symmetric Hann of np.hanning (period
+ # window_length-1).
+ window = periodic_hann(window_length)
+ windowed_frames = frames * window
+ return np.abs(np.fft.rfft(windowed_frames, int(fft_length)))
+
+
+ # Mel spectrum constants and functions.
+ _MEL_BREAK_FREQUENCY_HERTZ = 700.0
+ _MEL_HIGH_FREQUENCY_Q = 1127.0
+
+
+ def hertz_to_mel(frequencies_hertz):
+ """Convert frequencies to mel scale using HTK formula.
+
+ Args:
+ frequencies_hertz: Scalar or np.array of frequencies in hertz.
+
+ Returns:
+ Object of same size as frequencies_hertz containing corresponding values
+ on the mel scale.
+ """
+ return _MEL_HIGH_FREQUENCY_Q * np.log(
+ 1.0 + (frequencies_hertz / _MEL_BREAK_FREQUENCY_HERTZ))
+
+
+ def spectrogram_to_mel_matrix(num_mel_bins=20,
+ num_spectrogram_bins=129,
+ audio_sample_rate=8000,
+ lower_edge_hertz=125.0,
+ upper_edge_hertz=3800.0):
+ """Return a matrix that can post-multiply spectrogram rows to make mel.
+
+ Returns a np.array matrix A that can be used to post-multiply a matrix S of
+ spectrogram values (STFT magnitudes) arranged as frames x bins to generate a
+ "mel spectrogram" M of frames x num_mel_bins. M = S A.
+
+ The classic HTK algorithm exploits the complementarity of adjacent mel bands
+ to multiply each FFT bin by only one mel weight, then add it, with positive
+ and negative signs, to the two adjacent mel bands to which that bin
+ contributes. Here, by expressing this operation as a matrix multiply, we go
+ from num_fft multiplies per frame (plus around 2*num_fft adds) to around
+ num_fft^2 multiplies and adds. However, because these are all presumably
+ accomplished in a single call to np.dot(), it's not clear which approach is
+ faster in Python. The matrix multiplication has the attraction of being more
+ general and flexible, and much easier to read.
+
+ Args:
+ num_mel_bins: How many bands in the resulting mel spectrum. This is
+ the number of columns in the output matrix.
+ num_spectrogram_bins: How many bins there are in the source spectrogram
+ data, which is understood to be fft_size/2 + 1, i.e. the spectrogram
+ only contains the nonredundant FFT bins.
+ audio_sample_rate: Samples per second of the audio at the input to the
+ spectrogram. We need this to figure out the actual frequencies for
+ each spectrogram bin, which dictates how they are mapped into mel.
+ lower_edge_hertz: Lower bound on the frequencies to be included in the mel
+ spectrum. This corresponds to the lower edge of the lowest triangular
+ band.
+ upper_edge_hertz: The desired top edge of the highest frequency band.
+
+ Returns:
+ An np.array with shape (num_spectrogram_bins, num_mel_bins).
+
+ Raises:
+ ValueError: if frequency edges are incorrectly ordered or out of range.
+ """
+ nyquist_hertz = audio_sample_rate / 2.
+ if lower_edge_hertz < 0.0:
+ raise ValueError("lower_edge_hertz %.1f must be >= 0" % lower_edge_hertz)
+ if lower_edge_hertz >= upper_edge_hertz:
+ raise ValueError("lower_edge_hertz %.1f >= upper_edge_hertz %.1f" %
+ (lower_edge_hertz, upper_edge_hertz))
+ if upper_edge_hertz > nyquist_hertz:
+ raise ValueError("upper_edge_hertz %.1f is greater than Nyquist %.1f" %
+ (upper_edge_hertz, nyquist_hertz))
+ spectrogram_bins_hertz = np.linspace(0.0, nyquist_hertz, num_spectrogram_bins)
+ spectrogram_bins_mel = hertz_to_mel(spectrogram_bins_hertz)
+ # The i'th mel band (starting from i=1) has center frequency
+ # band_edges_mel[i], lower edge band_edges_mel[i-1], and higher edge
+ # band_edges_mel[i+1]. Thus, we need num_mel_bins + 2 values in
+ # the band_edges_mel arrays.
+ band_edges_mel = np.linspace(hertz_to_mel(lower_edge_hertz),
+ hertz_to_mel(upper_edge_hertz), num_mel_bins + 2)
+ # Matrix to post-multiply feature arrays whose rows are num_spectrogram_bins
+ # of spectrogram values.
+ mel_weights_matrix = np.empty((num_spectrogram_bins, num_mel_bins))
+ for i in range(num_mel_bins):
+ lower_edge_mel, center_mel, upper_edge_mel = band_edges_mel[i:i + 3]
+ # Calculate lower and upper slopes for every spectrogram bin.
+ # Line segments are linear in the *mel* domain, not hertz.
+ lower_slope = ((spectrogram_bins_mel - lower_edge_mel) /
+ (center_mel - lower_edge_mel))
+ upper_slope = ((upper_edge_mel - spectrogram_bins_mel) /
+ (upper_edge_mel - center_mel))
+ # .. then intersect them with each other and zero.
+ mel_weights_matrix[:, i] = np.maximum(0.0, np.minimum(lower_slope,
+ upper_slope))
+ # HTK excludes the spectrogram DC bin; make sure it always gets a zero
+ # coefficient.
+ mel_weights_matrix[0, :] = 0.0
+ return mel_weights_matrix
+
+
+ def log_mel_spectrogram(data,
+ audio_sample_rate=8000,
+ log_offset=0.0,
+ window_length_secs=0.025,
+ hop_length_secs=0.010,
+ **kwargs):
+ """Convert waveform to a log magnitude mel-frequency spectrogram.
+
+ Args:
+ data: 1D np.array of waveform data.
+ audio_sample_rate: The sampling rate of data.
+ log_offset: Add this to values when taking log to avoid -Infs.
+ window_length_secs: Duration of each window to analyze.
+ hop_length_secs: Advance between successive analysis windows.
+ **kwargs: Additional arguments to pass to spectrogram_to_mel_matrix.
+
+ Returns:
+ 2D np.array of (num_frames, num_mel_bins) consisting of log mel filterbank
+ magnitudes for successive frames.
+ """
+ window_length_samples = int(round(audio_sample_rate * window_length_secs))
+ hop_length_samples = int(round(audio_sample_rate * hop_length_secs))
+ fft_length = 2 ** int(np.ceil(np.log(window_length_samples) / np.log(2.0)))
+ spectrogram = stft_magnitude(
+ data,
+ fft_length=fft_length,
+ hop_length=hop_length_samples,
+ window_length=window_length_samples)
+ mel_spectrogram = np.dot(spectrogram, spectrogram_to_mel_matrix(
+ num_spectrogram_bins=spectrogram.shape[1],
+ audio_sample_rate=audio_sample_rate, **kwargs))
+ return np.log(mel_spectrogram + log_offset)
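`frame()` in mel_features.py drops any trailing samples that do not fill a complete window. A self-contained sketch (the function copied verbatim, applied to a toy signal) illustrating that behaviour:

```python
import numpy as np

def frame(data, window_length, hop_length):
    # Strided view: no copy, no zero-padding; incomplete tail frames are dropped.
    num_samples = data.shape[0]
    num_frames = 1 + int(np.floor((num_samples - window_length) / hop_length))
    shape = (num_frames, window_length) + data.shape[1:]
    strides = (data.strides[0] * hop_length,) + data.strides
    return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides)

signal = np.arange(10.0)
frames = frame(signal, window_length=4, hop_length=2)
print(frames.shape)  # (4, 4): frames start at samples 0, 2, 4, 6
print(frames[1])     # [2. 3. 4. 5.]
```

With an 11-sample signal the same call would still return 4 frames, silently discarding the last sample.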
FakeVD/Models/torchvggish/torchvggish/vggish.py ADDED
@@ -0,0 +1,189 @@
+ import numpy as np
+ import torch
+ import torch.nn as nn
+ from torch import hub
+
+ from . import vggish_input, vggish_params
+
+
+ class VGG(nn.Module):
+ def __init__(self, features):
+ super(VGG, self).__init__()
+ self.features = features
+ self.embeddings = nn.Sequential(
+ nn.Linear(512 * 4 * 6, 4096),
+ nn.ReLU(True),
+ nn.Linear(4096, 4096),
+ nn.ReLU(True),
+ nn.Linear(4096, 128),
+ nn.ReLU(True))
+
+ def forward(self, x):
+ x = self.features(x)
+
+ # Transpose the output from features to
+ # remain compatible with vggish embeddings
+ x = torch.transpose(x, 1, 3)
+ x = torch.transpose(x, 1, 2)
+ x = x.contiguous()
+ x = x.view(x.size(0), -1)
+
+ return self.embeddings(x)
+
+
+ class Postprocessor(nn.Module):
+ """Post-processes VGGish embeddings. Returns a torch.Tensor instead of a
+ numpy array in order to preserve the gradient.
+
+ "The initial release of AudioSet included 128-D VGGish embeddings for each
+ segment of AudioSet. These released embeddings were produced by applying
+ a PCA transformation (technically, a whitening transform is included as well)
+ and 8-bit quantization to the raw embedding output from VGGish, in order to
+ stay compatible with the YouTube-8M project which provides visual embeddings
+ in the same format for a large set of YouTube videos. This class implements
+ the same PCA (with whitening) and quantization transformations."
+ """
+
+ def __init__(self):
+ """Constructs a postprocessor."""
+ super(Postprocessor, self).__init__()
+ # Create empty matrix, for user's state_dict to load
+ self.pca_eigen_vectors = torch.empty(
+ (vggish_params.EMBEDDING_SIZE, vggish_params.EMBEDDING_SIZE,),
+ dtype=torch.float,
+ )
+ self.pca_means = torch.empty(
+ (vggish_params.EMBEDDING_SIZE, 1), dtype=torch.float
+ )
+
+ self.pca_eigen_vectors = nn.Parameter(self.pca_eigen_vectors, requires_grad=False)
+ self.pca_means = nn.Parameter(self.pca_means, requires_grad=False)
+
+ def postprocess(self, embeddings_batch):
+ """Applies tensor postprocessing to a batch of embeddings.
+
+ Args:
+ embeddings_batch: A tensor of shape [batch_size, embedding_size]
+ containing output from the embedding layer of VGGish.
+
+ Returns:
+ A tensor of the same shape as the input, containing the PCA-transformed,
+ quantized, and clipped version of the input.
+ """
+ assert len(embeddings_batch.shape) == 2, "Expected 2-d batch, got %r" % (
+ embeddings_batch.shape,
+ )
+ assert (
+ embeddings_batch.shape[1] == vggish_params.EMBEDDING_SIZE
+ ), "Bad batch shape: %r" % (embeddings_batch.shape,)
+
+ # Apply PCA.
+ # - Embeddings come in as [batch_size, embedding_size].
+ # - Transpose to [embedding_size, batch_size].
+ # - Subtract pca_means column vector from each column.
+ # - Premultiply by PCA matrix of shape [output_dims, input_dims]
+ # where both are equal to embedding_size in our case.
+ # - Transpose result back to [batch_size, embedding_size].
+ pca_applied = torch.mm(self.pca_eigen_vectors, (embeddings_batch.t() - self.pca_means)).t()
+
+ # Quantize by:
+ # - clipping to [min, max] range
+ clipped_embeddings = torch.clamp(
+ pca_applied, vggish_params.QUANTIZE_MIN_VAL, vggish_params.QUANTIZE_MAX_VAL
+ )
+ # - convert to 8-bit in range [0.0, 255.0]
+ quantized_embeddings = torch.round(
+ (clipped_embeddings - vggish_params.QUANTIZE_MIN_VAL)
+ * (
+ 255.0
+ / (vggish_params.QUANTIZE_MAX_VAL - vggish_params.QUANTIZE_MIN_VAL)
+ )
+ )
+ return torch.squeeze(quantized_embeddings)
+
+ def forward(self, x):
+ return self.postprocess(x)
+
+
+ def make_layers():
+ layers = []
+ in_channels = 1
+ for v in [64, "M", 128, "M", 256, 256, "M", 512, 512, "M"]:
+ if v == "M":
+ layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
+ else:
+ conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
+ layers += [conv2d, nn.ReLU(inplace=True)]
+ in_channels = v
+ return nn.Sequential(*layers)
+
+
+ def _vgg():
+ return VGG(make_layers())
+
+
+ # def _spectrogram():
+ # config = dict(
+ # sr=16000,
+ # n_fft=400,
+ # n_mels=64,
+ # hop_length=160,
+ # window="hann",
+ # center=False,
+ # pad_mode="reflect",
+ # htk=True,
+ # fmin=125,
+ # fmax=7500,
+ # output_format='Magnitude',
+ # # device=device,
+ # )
+ # return Spectrogram.MelSpectrogram(**config)
+
+
+ class VGGish(VGG):
+ def __init__(self, urls, device=None, pretrained=True, preprocess=True, postprocess=True, progress=True):
+ super().__init__(make_layers())
+ if pretrained:
+ state_dict = hub.load_state_dict_from_url(urls['vggish'], progress=progress)
+ super().load_state_dict(state_dict)
+
+ if device is None:
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+ self.device = device
+ self.preprocess = preprocess
+ self.postprocess = postprocess
+ if self.postprocess:
+ self.pproc = Postprocessor()
+ if pretrained:
+ state_dict = hub.load_state_dict_from_url(urls['pca'], progress=progress)
+ # TODO: Convert the state_dict to torch
+ state_dict[vggish_params.PCA_EIGEN_VECTORS_NAME] = torch.as_tensor(
+ state_dict[vggish_params.PCA_EIGEN_VECTORS_NAME], dtype=torch.float
+ )
+ state_dict[vggish_params.PCA_MEANS_NAME] = torch.as_tensor(
+ state_dict[vggish_params.PCA_MEANS_NAME].reshape(-1, 1), dtype=torch.float
+ )
+
+ self.pproc.load_state_dict(state_dict)
+ self.to(self.device)
+
+ def forward(self, x, fs=None):
+ if self.preprocess:
+ x = self._preprocess(x, fs)
+ x = x.to(self.device)
+ x = VGG.forward(self, x)
+ if self.postprocess:
+ x = self._postprocess(x)
+ return x
+
+ def _preprocess(self, x, fs):
+ if isinstance(x, np.ndarray):
+ x = vggish_input.waveform_to_examples(x, fs)
+ elif isinstance(x, str):
+ x = vggish_input.wavfile_to_examples(x)
+ else:
+ raise AttributeError
+ return x
+
+ def _postprocess(self, x):
+ return self.pproc(x)
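The 8-bit quantization step in `Postprocessor.postprocess()` is plain clamp-and-rescale arithmetic. A numpy-only sketch of the same mapping, using the `QUANTIZE_*` constants from vggish_params.py:

```python
import numpy as np

QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL = -2.0, 2.0

def quantize(embeddings):
    # Clip to the quantization range, then map it linearly onto [0.0, 255.0].
    clipped = np.clip(embeddings, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
    scale = 255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL)
    return np.round((clipped - QUANTIZE_MIN_VAL) * scale)

q = quantize(np.array([-3.0, -2.0, 0.5, 2.0, 5.0]))
print(q)  # [  0.   0. 159. 255. 255.]
```

Values outside [-2, 2] saturate at 0 or 255, so the released AudioSet embeddings lose any information beyond that range.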
FakeVD/Models/torchvggish/torchvggish/vggish_input.py ADDED
@@ -0,0 +1,98 @@
+ # Copyright 2017 The TensorFlow Authors All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ # ==============================================================================
+
+ """Compute input examples for VGGish from audio waveform."""
+
+ # Modification: Return torch tensors rather than numpy arrays
+ import torch
+
+ import numpy as np
+ import resampy
+
+ from . import mel_features
+ from . import vggish_params
+
+ import soundfile as sf
+
+
+ def waveform_to_examples(data, sample_rate, return_tensor=True):
+ """Converts audio waveform into an array of examples for VGGish.
+
+ Args:
+ data: np.array of either one dimension (mono) or two dimensions
+ (multi-channel, with the outer dimension representing channels).
+ Each sample is generally expected to lie in the range [-1.0, +1.0],
+ although this is not required.
+ sample_rate: Sample rate of data.
+ return_tensor: Return data as a PyTorch tensor ready for VGGish
+
+ Returns:
+ 3-D np.array of shape [num_examples, num_frames, num_bands] which represents
+ a sequence of examples, each of which contains a patch of log mel
+ spectrogram, covering num_frames frames of audio and num_bands mel frequency
+ bands, where the frame length is vggish_params.STFT_HOP_LENGTH_SECONDS.
+
+ """
+ # Convert to mono.
+ if len(data.shape) > 1:
+ data = np.mean(data, axis=1)
+ # Resample to the rate assumed by VGGish.
+ if sample_rate != vggish_params.SAMPLE_RATE:
+ data = resampy.resample(data, sample_rate, vggish_params.SAMPLE_RATE)
+
+ # Compute log mel spectrogram features.
+ log_mel = mel_features.log_mel_spectrogram(
+ data,
+ audio_sample_rate=vggish_params.SAMPLE_RATE,
+ log_offset=vggish_params.LOG_OFFSET,
+ window_length_secs=vggish_params.STFT_WINDOW_LENGTH_SECONDS,
+ hop_length_secs=vggish_params.STFT_HOP_LENGTH_SECONDS,
+ num_mel_bins=vggish_params.NUM_MEL_BINS,
+ lower_edge_hertz=vggish_params.MEL_MIN_HZ,
+ upper_edge_hertz=vggish_params.MEL_MAX_HZ)
+
+ # Frame features into examples.
+ features_sample_rate = 1.0 / vggish_params.STFT_HOP_LENGTH_SECONDS
+ example_window_length = int(round(
+ vggish_params.EXAMPLE_WINDOW_SECONDS * features_sample_rate))
+ example_hop_length = int(round(
+ vggish_params.EXAMPLE_HOP_SECONDS * features_sample_rate))
+ log_mel_examples = mel_features.frame(
+ log_mel,
+ window_length=example_window_length,
+ hop_length=example_hop_length)
+
+ if return_tensor:
+ log_mel_examples = torch.tensor(
+ log_mel_examples, requires_grad=True)[:, None, :, :].float()
+
+ return log_mel_examples
+
+
+ def wavfile_to_examples(wav_file, return_tensor=True):
+ """Convenience wrapper around waveform_to_examples() for a common WAV format.
+
+ Args:
+ wav_file: String path to a file, or a file-like object. The file
+ is assumed to contain WAV audio data with signed 16-bit PCM samples.
+ return_tensor: Return data as a PyTorch tensor ready for VGGish
+
+ Returns:
+ See waveform_to_examples.
+ """
+ wav_data, sr = sf.read(wav_file, dtype='int16')
+ assert wav_data.dtype == np.int16, 'Bad sample type: %r' % wav_data.dtype
+ samples = wav_data / 32768.0 # Convert to [-1.0, +1.0]
+ return waveform_to_examples(samples, sr, return_tensor)
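`waveform_to_examples()` frames the log-mel spectrogram into non-overlapping 0.96 s examples. A short sketch of the arithmetic behind the example count, using the constants from vggish_params.py (frame counts for real audio depend on the STFT windowing and are approximate):

```python
# Constants from vggish_params.py.
STFT_HOP_LENGTH_SECONDS = 0.010
EXAMPLE_WINDOW_SECONDS = 0.96
EXAMPLE_HOP_SECONDS = 0.96

features_sample_rate = 1.0 / STFT_HOP_LENGTH_SECONDS  # 100 log-mel frames per second
example_window_length = int(round(EXAMPLE_WINDOW_SECONDS * features_sample_rate))  # 96 frames
example_hop_length = int(round(EXAMPLE_HOP_SECONDS * features_sample_rate))        # 96 frames

def num_examples(num_feature_frames):
    # Same "complete frames only" rule as mel_features.frame().
    return 1 + (num_feature_frames - example_window_length) // example_hop_length

print(num_examples(1000))  # 10: roughly 10 s of audio yields 10 examples
```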
FakeVD/Models/torchvggish/torchvggish/vggish_params.py ADDED
@@ -0,0 +1,53 @@
+ # Copyright 2017 The TensorFlow Authors All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ # ==============================================================================
+
+ """Global parameters for the VGGish model.
+
+ See vggish_slim.py for more information.
+ """
+
+ # Architectural constants.
+ NUM_FRAMES = 96 # Frames in input mel-spectrogram patch.
+ NUM_BANDS = 64 # Frequency bands in input mel-spectrogram patch.
+ EMBEDDING_SIZE = 128 # Size of embedding layer.
+
+ # Hyperparameters used in feature and example generation.
+ SAMPLE_RATE = 16000
+ STFT_WINDOW_LENGTH_SECONDS = 0.025
+ STFT_HOP_LENGTH_SECONDS = 0.010
+ NUM_MEL_BINS = NUM_BANDS
+ MEL_MIN_HZ = 125
+ MEL_MAX_HZ = 7500
+ LOG_OFFSET = 0.01 # Offset used for stabilized log of input mel-spectrogram.
+ EXAMPLE_WINDOW_SECONDS = 0.96 # Each example contains 96 10ms frames
+ EXAMPLE_HOP_SECONDS = 0.96 # with zero overlap.
+
+ # Parameters used for embedding postprocessing.
+ PCA_EIGEN_VECTORS_NAME = 'pca_eigen_vectors'
+ PCA_MEANS_NAME = 'pca_means'
+ QUANTIZE_MIN_VAL = -2.0
+ QUANTIZE_MAX_VAL = +2.0
+
+ # Hyperparameters used in training.
+ INIT_STDDEV = 0.01 # Standard deviation used to initialize weights.
+ LEARNING_RATE = 1e-4 # Learning rate for the Adam optimizer.
+ ADAM_EPSILON = 1e-8 # Epsilon for the Adam optimizer.
+
+ # Names of ops, tensors, and features.
+ INPUT_OP_NAME = 'vggish/input_features'
+ INPUT_TENSOR_NAME = INPUT_OP_NAME + ':0'
+ OUTPUT_OP_NAME = 'vggish/embedding'
+ OUTPUT_TENSOR_NAME = OUTPUT_OP_NAME + ':0'
+ AUDIO_EMBEDDING_FEATURE_NAME = 'audio_embedding'
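The MEL_MIN_HZ/MEL_MAX_HZ bounds above, combined with the HTK formula from mel_features.py, define the 64 triangular filter bands. A short sketch of the band-edge computation (the formula is copied from `mel_features.hertz_to_mel()`):

```python
import numpy as np

_MEL_BREAK_FREQUENCY_HERTZ = 700.0
_MEL_HIGH_FREQUENCY_Q = 1127.0

def hertz_to_mel(frequencies_hertz):
    # HTK mel formula, as in mel_features.hertz_to_mel().
    return _MEL_HIGH_FREQUENCY_Q * np.log(
        1.0 + frequencies_hertz / _MEL_BREAK_FREQUENCY_HERTZ)

# 64 mel bins between 125 Hz and 7500 Hz need 64 + 2 band edges,
# since each band shares its edges with its neighbours.
edges = np.linspace(hertz_to_mel(125.0), hertz_to_mel(7500.0), 64 + 2)
print(edges.shape)  # (66,)
```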