asutosh09 committed on
Commit 53ddea5 · verified · 1 Parent(s): 378a95c

Upload 10 files

Files changed (10)
  1. 000003.ogg +0 -0
  2. 000032.ogg +0 -0
  3. 000038.ogg +0 -0
  4. 000050.ogg +0 -0
  5. 000103.ogg +0 -0
  6. README.md +13 -0
  7. app.py +88 -0
  8. article.md +44 -0
  9. gitattributes +27 -0
  10. requirements.txt +5 -0
000003.ogg ADDED
Binary file (394 kB).
 
000032.ogg ADDED
Binary file (380 kB).
 
000038.ogg ADDED
Binary file (416 kB).
 
000050.ogg ADDED
Binary file (368 kB).
 
000103.ogg ADDED
Binary file (435 kB).
 
README.md ADDED
@@ -0,0 +1,13 @@
+ ---
+ title: Audioclassification
+ emoji: 💻
+ colorFrom: gray
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: 2.9.4
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
app.py ADDED
@@ -0,0 +1,88 @@
+ import gradio
+ import torchaudio
+ from fastai.vision.all import *
+ from fastai.learner import load_learner
+ from torchvision.utils import save_image
+ from huggingface_hub import hf_hub_download
+
+
+ model = load_learner(
+     hf_hub_download("kurianbenoy/music_genre_classification_baseline", "model.pkl")
+ )
+
+
+ EXAMPLES_PATH = Path("./examples")
+ labels = model.dls.vocab
+
+ interface_options = {
+     "title": "Music Genre Classification",
+     "description": " ",
+     "interpretation": "default",
+     "layout": "horizontal",
+     # Audio from validation file
+     "examples": ["000003.ogg", "000032.ogg", "000038.ogg", "000050.ogg", "000103.ogg"],
+     "allow_flagging": "never"
+ }
+
+ ## Code from Dien Hoa Truong inference notebook: https://www.kaggle.com/code/dienhoa/inference-submission-music-genre
+ N_FFT = 2048
+ HOP_LEN = 1024
+
+
+ def create_spectrogram(filename):
+     audio, sr = torchaudio.load(filename)
+     specgram = torchaudio.transforms.MelSpectrogram(
+         sample_rate=sr,
+         n_fft=N_FFT,
+         win_length=N_FFT,
+         hop_length=HOP_LEN,
+         center=True,
+         pad_mode="reflect",
+         power=2.0,
+         norm="slaney",
+         onesided=True,
+         n_mels=224,
+         mel_scale="htk",
+     )(audio).mean(axis=0)
+     specgram = torchaudio.transforms.AmplitudeToDB()(specgram)
+     specgram = specgram - specgram.min()
+     specgram = specgram / specgram.max()
+
+     return specgram
+
+
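Review note: the last two steps of `create_spectrogram` rescale the dB spectrogram to the range [0, 1] by subtracting the minimum and then dividing by the new maximum. A minimal sketch of that min-max step follows, written on plain Python lists rather than the torch tensors used in app.py. The helper name `min_max_normalize` and the zero-range guard (for constant input such as digital silence, where the division would otherwise be zero by zero) are assumptions added for illustration and are not part of the committed code.

```python
def min_max_normalize(values, eps=1e-12):
    """Rescale a 2-D list of numbers to [0, 1] (hypothetical helper)."""
    flat = [v for row in values for v in row]
    lo = min(flat)
    span = max(flat) - lo
    if span < eps:
        # Assumption: map constant input to zeros instead of dividing by zero.
        return [[0.0 for _ in row] for row in values]
    # Same arithmetic as the two lines in create_spectrogram:
    # subtract the minimum, then divide by the (shifted) maximum.
    return [[(v - lo) / span for v in row] for row in values]


db_spectrogram = [[-80.0, -40.0], [-20.0, 0.0]]  # made-up dB values
normalized = min_max_normalize(db_spectrogram)
print(normalized)  # [[0.0, 0.5], [0.75, 1.0]]
```

Whether the deployed app needs such a guard depends on the inputs it receives; the sketch only shows where the edge case would arise.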
+ def create_image(filename):
+     specgram = create_spectrogram(filename)
+     dest = Path("temp.png")
+     save_image(specgram, "temp.png")
+
+
+ # Code from: https://huggingface.co/spaces/suvash/food-101-resnet50
+ def predict(img):
+     img = PILImage.create(img)
+     _pred, _pred_w_idx, probs = model.predict(img)
+     # gradio doesn't support tensors, so converting to float
+     labels_probs = {labels[i]: float(probs[i]) for i, _ in enumerate(labels)}
+     return labels_probs
+
+
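Review note: `predict` returns a dict that maps every label to a plain float, and the `Label` output further down is configured with `num_top_classes=5`. The sketch below illustrates that mapping and a top-k selection in pure Python. The genre names and probabilities are invented for the example (the real vocabulary comes from `model.dls.vocab`), `to_label_probs` and `top_k` are hypothetical helpers, and in the actual app the ranking is left to Gradio.

```python
def to_label_probs(labels, probs):
    """Pair each label with its probability as a plain float."""
    return {label: float(p) for label, p in zip(labels, probs)}


def top_k(label_probs, k=5):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(label_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up labels and probabilities, for illustration only.
example_labels = ["Rock", "Pop", "Folk", "Jazz", "Electronic", "Hip-Hop"]
example_probs = [0.30, 0.05, 0.12, 0.25, 0.20, 0.08]

ranked = top_k(to_label_probs(example_labels, example_probs), k=5)
print([label for label, _ in ranked])
# ['Rock', 'Jazz', 'Electronic', 'Folk', 'Hip-Hop']
```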
+ def end2endpipeline(filename):
+     create_image(filename)
+     return predict("temp.png")
+
+
+ demo = gradio.Interface(
+     fn=end2endpipeline,
+     inputs=gradio.inputs.Audio(source="upload", type="filepath"),
+     outputs=gradio.outputs.Label(num_top_classes=5),
+     **interface_options,
+ )
+
+ launch_options = {
+     "enable_queue": True,
+     "share": False,
+     # thanks Alex for pointing this option to cache examples
+     "cache_examples": True,
+ }
+
+ demo.launch(**launch_options)
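Review note: with `center=True`, a short-time Fourier transform over `n` samples produces `1 + n // hop_length` frames, so the image that `create_spectrogram` in app.py hands to the classifier has `n_mels` rows and that many columns (the channel axis is averaged away by `.mean(axis=0)`). The sketch below works through the arithmetic for the `HOP_LEN` and `n_mels` values in app.py. The 30-second clip length and 22,050 Hz sample rate are assumptions chosen for illustration; the commit does not state either, and `spectrogram_shape` is a hypothetical helper.

```python
HOP_LEN = 1024  # hop length used in app.py
N_MELS = 224    # n_mels passed to MelSpectrogram in app.py


def spectrogram_shape(n_samples, hop_length=HOP_LEN, n_mels=N_MELS):
    """Shape (mel bins, frames) of a centered mel spectrogram."""
    # Centered framing pads the signal so there is one frame per hop,
    # plus one for the frame centered on the first sample.
    n_frames = 1 + n_samples // hop_length
    return (n_mels, n_frames)


sample_rate = 22_050  # assumption, not stated in the commit
duration_s = 30       # assumption, not stated in the commit
print(spectrogram_shape(sample_rate * duration_s))  # (224, 646)
```

Longer or shorter uploads change only the width of the image; the height stays at 224 mel bins.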
article.md ADDED
@@ -0,0 +1,44 @@
+ > Note: The provided examples don't work in Safari, which matters if you are on a Mac. Please try a different browser.
+
+ During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, with a bit of creativity, a simple computer vision model can be used to build a state-of-the-art audio classifier with the same image classification approach. I was curious how I could train a music classifier, as I had never worked on audio data problems before.
+
+
+ [This blog post describes how I trained this music genre classifier using fast.ai](https://kurianbenoy.com/posts/2022/2022-05-01-audiocnndemo.html).
+
+ ## Dataset
+
+ 1. [The competition data](https://www.kaggle.com/competitions/kaggle-pog-series-s01e02/data)
+ 2. [Image data generated by converting the audio to mel spectrogram images](https://www.kaggle.com/datasets/dienhoa/music-genre-spectrogram-pogchamps)
+
+
+ ## Training
+
+ Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.
+
+ | epoch | train_loss | valid_loss | error_rate | time  |
+ |-------|------------|------------|------------|-------|
+ | 0     | 2.312176   | 1.843815   | 0.558654   | 02:07 |
+ | 1     | 2.102361   | 1.719162   | 0.539061   | 02:08 |
+ | 2     | 1.867139   | 1.623988   | 0.527003   | 02:08 |
+ | 3     | 1.710557   | 1.527913   | 0.507661   | 02:07 |
+ | 4     | 1.629478   | 1.456836   | 0.479779   | 02:05 |
+ | 5     | 1.519305   | 1.433036   | 0.474253   | 02:05 |
+ | 6     | 1.457465   | 1.379757   | 0.464456   | 02:05 |
+ | 7     | 1.396283   | 1.369344   | 0.457925   | 02:05 |
+ | 8     | 1.359388   | 1.367973   | 0.453655   | 02:05 |
+ | 9     | 1.364363   | 1.368887   | 0.456167   | 02:04 |
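Review note: fastai's `error_rate` is one minus accuracy, so the table above can be read as validation accuracy per epoch. The short sketch below, using only the `error_rate` column copied from the table, finds the best epoch and converts its error rate to accuracy. The variable names are illustrative and not part of the commit.

```python
# error_rate column from the training table, epochs 0 through 9.
error_rates = [
    0.558654, 0.539061, 0.527003, 0.507661, 0.479779,
    0.474253, 0.464456, 0.457925, 0.453655, 0.456167,
]

# Epoch with the lowest validation error rate.
best_epoch = min(range(len(error_rates)), key=error_rates.__getitem__)
best_accuracy = 1.0 - error_rates[best_epoch]
final_accuracy = 1.0 - error_rates[-1]

print(best_epoch, round(best_accuracy, 6))  # 8 0.546345
print(round(final_accuracy, 6))             # 0.543833
```

By this reading, the model classifies a little over half of the validation clips correctly, and epoch 8 is marginally better than the final epoch.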
+
+
+ ## Examples
+
+ The example audio clips provided in the demo come from the validation split of the Kaggle competition data, which was not used during training.
+
+ ## Credits
+
+ Thanks to [Dien Hoa Truong](https://twitter.com/DienhoaT) for providing the [inference code](https://www.kaggle.com/code/dienhoa/inference-submission-music-genre) used to build the end-to-end pipeline, from loading audio to converting it to mel spectrograms and then making predictions.
+
+ Thanks to [@suvash](https://twitter.com/suvash) for helping me get started with Hugging Face
+ Spaces and for his [excellent Space](https://huggingface.co/spaces/suvash/food-101-resnet50), which was a reference for this work.
+
+ Thanks to [@strickvl](https://twitter.com/strickvl) for reporting the issue in the Safari browser
+ and for trying this Space out.
gitattributes ADDED
@@ -0,0 +1,27 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
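Review note: none of the patterns above covers the `.ogg` extension of the five example clips in this commit. A rough way to check which file names a pattern list would route to LFS is sketched below with Python's `fnmatch`. This is only an approximation: `fnmatch` globbing is not identical to gitattributes matching, the path-based `saved_model/**/*` rule is left out for that reason, only a subset of the patterns is copied, and `matches_lfs_pattern` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# A subset of the basename patterns from the gitattributes file above.
# The path-based "saved_model/**/*" rule is omitted because fnmatch
# does not model gitattributes path matching.
LFS_PATTERNS = ["*.bin", "*.h5", "*.onnx", "*.pt", "*.pth", "*.zip", "*tfevents*"]


def matches_lfs_pattern(filename, patterns=LFS_PATTERNS):
    """Return True if the file name matches any listed pattern (approximate)."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


print(matches_lfs_pattern("000003.ogg"))                # False
print(matches_lfs_pattern("weights.pt"))                # True
print(matches_lfs_pattern("events.out.tfevents.1234"))  # True
```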
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ fastai==2.6.0
+ gradio==2.9.4
+ torchaudio
+ torchvision
+ huggingface_hub