Upload 10 files
- 000003.ogg +0 -0
- 000032.ogg +0 -0
- 000038.ogg +0 -0
- 000050.ogg +0 -0
- 000103.ogg +0 -0
- README.md +13 -0
- app.py +88 -0
- article.md +44 -0
- gitattributes +27 -0
- requirements.txt +5 -0
000003.ogg
ADDED
Binary file (394 kB)
000032.ogg
ADDED
Binary file (380 kB)
000038.ogg
ADDED
Binary file (416 kB)
000050.ogg
ADDED
Binary file (368 kB)
000103.ogg
ADDED
Binary file (435 kB)
README.md
ADDED
@@ -0,0 +1,13 @@
---
title: Audioclassification
emoji: 💻
colorFrom: gray
colorTo: indigo
sdk: gradio
sdk_version: 2.9.4
app_file: app.py
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
app.py
ADDED
@@ -0,0 +1,88 @@
import gradio
import torchaudio
from fastai.vision.all import *
from fastai.learner import load_learner
from torchvision.utils import save_image
from huggingface_hub import hf_hub_download


model = load_learner(
    hf_hub_download("kurianbenoy/music_genre_classification_baseline", "model.pkl")
)


EXAMPLES_PATH = Path("./examples")
labels = model.dls.vocab

interface_options = {
    "title": "Music Genre Classification",
    "description": " ",
    "interpretation": "default",
    "layout": "horizontal",
    # Audio from validation file
    "examples": ["000003.ogg", "000032.ogg", "000038.ogg", "000050.ogg", "000103.ogg"],
    "allow_flagging": "never"
}

## Code from Dien Hoa Truong inference notebook: https://www.kaggle.com/code/dienhoa/inference-submission-music-genre
N_FFT = 2048
HOP_LEN = 1024


def create_spectrogram(filename):
    audio, sr = torchaudio.load(filename)
    specgram = torchaudio.transforms.MelSpectrogram(
        sample_rate=sr,
        n_fft=N_FFT,
        win_length=N_FFT,
        hop_length=HOP_LEN,
        center=True,
        pad_mode="reflect",
        power=2.0,
        norm="slaney",
        onesided=True,
        n_mels=224,
        mel_scale="htk",
    )(audio).mean(axis=0)
    specgram = torchaudio.transforms.AmplitudeToDB()(specgram)
    specgram = specgram - specgram.min()
    specgram = specgram / specgram.max()

    return specgram


def create_image(filename):
    specgram = create_spectrogram(filename)
    dest = Path("temp.png")
    save_image(specgram, "temp.png")


# Code from: https://huggingface.co/spaces/suvash/food-101-resnet50
def predict(img):
    img = PILImage.create(img)
    _pred, _pred_w_idx, probs = model.predict(img)
    # gradio doesn't support tensors, so converting to float
    labels_probs = {labels[i]: float(probs[i]) for i, _ in enumerate(labels)}
    return labels_probs


def end2endpipeline(filename):
    create_image(filename)
    return predict("temp.png")


demo = gradio.Interface(
    fn=end2endpipeline,
    inputs=gradio.inputs.Audio(source="upload", type="filepath"),
    outputs=gradio.outputs.Label(num_top_classes=5),
    **interface_options,
)

launch_options = {
    "enable_queue": True,
    "share": False,
    # thanks Alex for pointing this option to cache examples
    "cache_examples": True,
}

demo.launch(**launch_options)
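The shape of the tensor that `create_spectrogram` produces can be worked out by hand. The sketch below is illustrative and not part of the commit. It assumes the usual framing rule for a centered STFT (`center=True`), which gives `1 + num_samples // hop_length` frames, and it uses a hypothetical 30-second clip at 22,050 Hz, since the commit states neither the clip length nor the sample rate. The helpers `spectrogram_shape` and `min_max_scale` are names invented here; the second mirrors the two normalization steps at the end of `create_spectrogram` using plain lists instead of tensors.

```python
# Values taken from app.py.
N_FFT = 2048
HOP_LEN = 1024
N_MELS = 224


def spectrogram_shape(num_samples, hop_length=HOP_LEN, n_mels=N_MELS):
    """Return (n_mels, n_frames) for a centered STFT.

    With center=True the signal is padded by n_fft // 2 on both sides,
    so the frame count is 1 + num_samples // hop_length.
    """
    n_frames = 1 + num_samples // hop_length
    return n_mels, n_frames


def min_max_scale(values):
    """Shift to a zero minimum, then divide by the new maximum.

    Like the original code, this divides by zero on a constant input.
    """
    lowest = min(values)
    shifted = [v - lowest for v in values]
    highest = max(shifted)
    return [v / highest for v in shifted]


# Hypothetical clip: 30 seconds at 22,050 Hz (an assumption, not from the commit).
num_samples = 30 * 22050
shape = spectrogram_shape(num_samples)
print(shape)  # (224, 646)

# Illustrative decibel values; scaling maps them into [0, 1].
scaled = min_max_scale([-80.0, -40.0, 0.0])
print(scaled)  # [0.0, 0.5, 1.0]
```

Under these assumptions the saved image would be 224 pixels tall and 646 pixels wide; a clip with a different length or sample rate changes only the width.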
article.md
ADDED
@@ -0,0 +1,44 @@
> Note: The provided examples don't work in Safari. If you are on a Mac, please try a different browser.

During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, with a bit of creativity, a simple computer vision model can be used to build a state-of-the-art audio classifier: the audio is converted to images, and the same image classification model is trained on them. I was curious how I could train a music classifier, as I had never worked on audio data problems before.


You can find how I trained this music genre classifier using fast.ai in [this blog post](https://kurianbenoy.com/posts/2022/2022-05-01-audiocnndemo.html).

## Dataset

1. [The competition data](https://www.kaggle.com/competitions/kaggle-pog-series-s01e02/data)
2. [Image data generated by converting the audio to mel spectrograms](https://www.kaggle.com/datasets/dienhoa/music-genre-spectrogram-pogchamps)


## Training

Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.

| epoch | train_loss | valid_loss | error_rate | time  |
|-------|------------|------------|------------|-------|
| 0     | 2.312176   | 1.843815   | 0.558654   | 02:07 |
| 1     | 2.102361   | 1.719162   | 0.539061   | 02:08 |
| 2     | 1.867139   | 1.623988   | 0.527003   | 02:08 |
| 3     | 1.710557   | 1.527913   | 0.507661   | 02:07 |
| 4     | 1.629478   | 1.456836   | 0.479779   | 02:05 |
| 5     | 1.519305   | 1.433036   | 0.474253   | 02:05 |
| 6     | 1.457465   | 1.379757   | 0.464456   | 02:05 |
| 7     | 1.396283   | 1.369344   | 0.457925   | 02:05 |
| 8     | 1.359388   | 1.367973   | 0.453655   | 02:05 |
| 9     | 1.364363   | 1.368887   | 0.456167   | 02:04 |
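
The `error_rate` column can be easier to read as accuracy. As a small sketch, with the values copied from the table above and accuracy taken as `1 - error_rate` (which is how fastai defines `error_rate`), the best epoch can be picked out like this. The variable names are illustrative only.

```python
# (epoch, valid_loss, error_rate) copied from the training table.
history = [
    (0, 1.843815, 0.558654),
    (1, 1.719162, 0.539061),
    (2, 1.623988, 0.527003),
    (3, 1.527913, 0.507661),
    (4, 1.456836, 0.479779),
    (5, 1.433036, 0.474253),
    (6, 1.379757, 0.464456),
    (7, 1.369344, 0.457925),
    (8, 1.367973, 0.453655),
    (9, 1.368887, 0.456167),
]

# The epoch with the lowest error rate has the highest accuracy.
best_epoch, best_loss, best_error = min(history, key=lambda row: row[2])
best_accuracy = 1.0 - best_error

print(f"best epoch: {best_epoch}, accuracy: {best_accuracy:.1%}")
# best epoch: 8, accuracy: 54.6%
```

Both validation loss and error rate bottom out at epoch 8 and rise slightly at epoch 9, which suggests the model had roughly converged by the end of the run.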


## Examples

The example audio clips provided in the demo come from the validation split of the Kaggle competition data, which was not used during training.

## Credits

Thanks to [Dien Hoa Truong](https://twitter.com/DienhoaT) for providing the [inference code](https://www.kaggle.com/code/dienhoa/inference-submission-music-genre) used to build the end-to-end pipeline, which takes the audio, converts it to a mel spectrogram, and then makes the prediction.

Thanks to [@suvash](https://twitter.com/suvash) for helping me get started with Hugging Face
Spaces and for his [excellent Space](https://huggingface.co/spaces/suvash/food-101-resnet50), which was a reference for this work.

Thanks to [@strickvl](https://twitter.com/strickvl) for reporting the issue in the Safari browser
and for trying this Space out.
gitattributes
ADDED
@@ -0,0 +1,27 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
requirements.txt
ADDED
@@ -0,0 +1,5 @@
fastai==2.6.0
gradio==2.9.4
torchaudio
torchvision
huggingface_hub