Spaces: asutosh09/genreclassification

asutosh09 committed on Apr 5, 2024

Commit 53ddea5 (verified) · 1 Parent(s): 378a95c

Upload 10 files

Files changed (10)

000003.ogg +0 -0
000032.ogg +0 -0
000038.ogg +0 -0
000050.ogg +0 -0
000103.ogg +0 -0
README.md +13 -0
app.py +88 -0
article.md +44 -0
.gitattributes +27 -0
requirements.txt +5 -0

000003.ogg ADDED

Binary file (394 kB).

000032.ogg ADDED

Binary file (380 kB).

000038.ogg ADDED

Binary file (416 kB).

000050.ogg ADDED

Binary file (368 kB).

000103.ogg ADDED

Binary file (435 kB).

README.md ADDED

	@@ -0,0 +1,13 @@

+---
+title: Audioclassification
+emoji: 💻
+colorFrom: gray
+colorTo: indigo
+sdk: gradio
+sdk_version: 2.9.4
+app_file: app.py
+pinned: false
+license: mit
+---
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference

app.py ADDED

	@@ -0,0 +1,88 @@

+import gradio
+import torchaudio
+from fastai.vision.all import *
+from fastai.learner import load_learner
+from torchvision.utils import save_image
+from huggingface_hub import hf_hub_download
+model = load_learner(
+    hf_hub_download("kurianbenoy/music_genre_classification_baseline", "model.pkl")
+)
+EXAMPLES_PATH = Path("./examples")
+labels = model.dls.vocab
+interface_options = {
+    "title": "Music Genre Classification",
+    "description": " ",
+    "interpretation": "default",
+    "layout": "horizontal",
+    # Audio from validation file
+    "examples": ["000003.ogg", "000032.ogg", "000038.ogg", "000050.ogg", "000103.ogg"],
+    "allow_flagging": "never"
+}
+# Code from Dien Hoa Truong's inference notebook: https://www.kaggle.com/code/dienhoa/inference-submission-music-genre
+N_FFT = 2048
+HOP_LEN = 1024
+def create_spectrogram(filename):
+    audio, sr = torchaudio.load(filename)
+    specgram = torchaudio.transforms.MelSpectrogram(
+        sample_rate=sr,
+        n_fft=N_FFT,
+        win_length=N_FFT,
+        hop_length=HOP_LEN,
+        center=True,
+        pad_mode="reflect",
+        power=2.0,
+        norm="slaney",
+        onesided=True,
+        n_mels=224,
+        mel_scale="htk",
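+    )(audio).mean(axis=0)  # average over channels to get a mono mel spectrogram
+    # Convert power to decibels, then min-max scale to [0, 1] for saving as an image
+    specgram = torchaudio.transforms.AmplitudeToDB()(specgram)
+    specgram = specgram - specgram.min()
+    specgram = specgram / specgram.max()
+    return specgram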
+def create_image(filename):
+    specgram = create_spectrogram(filename)
+    dest = Path("temp.png")
+    save_image(specgram, dest)
+# Code from: https://huggingface.co/spaces/suvash/food-101-resnet50
+def predict(img):
+    img = PILImage.create(img)
+    _pred, _pred_w_idx, probs = model.predict(img)
+    # gradio doesn't support tensors, so converting to float
+    labels_probs = {labels[i]: float(probs[i]) for i, _ in enumerate(labels)}
+    return labels_probs
+def end2endpipeline(filename):
+    create_image(filename)
+    return predict("temp.png")
+demo = gradio.Interface(
+    fn=end2endpipeline,
+    inputs=gradio.inputs.Audio(source="upload", type="filepath"),
+    outputs=gradio.outputs.Label(num_top_classes=5),
+    **interface_options,
+)
+launch_options = {
+    "enable_queue": True,
+    "share": False,
+    # thanks Alex for pointing out this option to cache examples
+    "cache_examples": True,
+}
+demo.launch(**launch_options)
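
The `create_spectrogram` function in `app.py` above turns a clip into a 2-D array and rescales it to the range 0 to 1. The following is a minimal pure-Python sketch of the arithmetic involved, not the author's code. It assumes the usual framing rule for a centered STFT (frame count = 1 + samples // hop length). The helper names `mel_spectrogram_shape` and `min_max_normalize`, the 22,050 Hz sample rate, and the 30-second clip length are all hypothetical and chosen only for illustration. Unlike `app.py`, the sketch guards against a constant input, which would otherwise divide zero by zero.

```python
HOP_LEN = 1024   # hop length used in app.py
N_MELS = 224     # number of mel bands used in app.py


def mel_spectrogram_shape(num_samples, hop_length=HOP_LEN, n_mels=N_MELS):
    """Return (n_mels, n_frames) for a mono clip with a centered STFT.

    With center=True the signal is padded on both sides, so the frame
    count is 1 + num_samples // hop_length.
    """
    return (n_mels, 1 + num_samples // hop_length)


def min_max_normalize(values, eps=1e-8):
    """Scale a flat list of values to [0, 1].

    Hypothetical guard: a (near-)constant input returns all zeros
    instead of dividing by zero.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span < eps:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Assumed example: a 30-second clip sampled at 22,050 Hz.
shape = mel_spectrogram_shape(30 * 22050)
scaled = min_max_normalize([-80.0, -40.0, 0.0])
```

Under these assumptions the image has 224 rows and 646 columns, so its height matches the `n_mels=224` setting while its width depends on the clip length.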

article.md ADDED

	@@ -0,0 +1,44 @@

+> Note: The provided examples don't work in Safari, which may affect people accessing this Space on a Mac. Please try a different browser.
+During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, with a bit of creativity, a simple computer vision model can be used to build a state-of-the-art audio classifier using the same image classification model. I was curious how I could train a music classifier, as I had never worked on audio data problems before.
+[You can find how I trained this music genre classifier using fast.ai in this blog post](https://kurianbenoy.com/posts/2022/2022-05-01-audiocnndemo.html).
+## Dataset
+1. [The competition data](https://www.kaggle.com/competitions/kaggle-pog-series-s01e02/data)
+2. [Image data generated by converting the audio to mel spectrograms](https://www.kaggle.com/datasets/dienhoa/music-genre-spectrogram-pogchamps)
+## Training
+Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.
+| epoch	| train_loss	| valid_loss	| error_rate	| time  |
+|-------|---------------|---------------|---------------|-------|
+|0  |	2.312176 |	1.843815 |	0.558654 |	02:07 |
+|1  |	2.102361 |	1.719162 |	0.539061 |	02:08 |
+|2  |	1.867139 |	1.623988 |	0.527003 |	02:08 |
+|3  |	1.710557 |	1.527913 |	0.507661 |	02:07 |
+|4  |	1.629478 |	1.456836 |	0.479779 |	02:05 |
+|5  |	1.519305 |	1.433036 |	0.474253 |	02:05 |
+|6  |	1.457465 |	1.379757 |	0.464456 |	02:05 |
+|7  |	1.396283 |	1.369344 |	0.457925 |	02:05 |
+|8  |	1.359388 |	1.367973 |	0.453655 |	02:05 |
+|9  |	1.364363 |	1.368887 |	0.456167 |	02:04 |
+## Examples
+The example clips provided in the demo come from the validation split of the Kaggle competition data, which was not used during training.
+## Credits
+Thanks to [Dien Hoa Truong](https://twitter.com/DienhoaT) for providing the [inference code](https://www.kaggle.com/code/dienhoa/inference-submission-music-genre) used to build the end-to-end pipeline, from loading audio and converting it to mel spectrograms through to prediction.
+Thanks to [@suvash](https://twitter.com/suvash) for helping me get started with Hugging Face
+Spaces and for his [excellent space](https://huggingface.co/spaces/suvash/food-101-resnet50), which was a reference for this work.
+Thanks to [@strickvl](https://twitter.com/strickvl) for reporting the issue in the Safari browser
+and trying this space out.
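
The training table in `article.md` above reports `error_rate` per epoch. As a small worked example, the sketch below copies those values from the table and converts the lowest one to accuracy, relying on the fact that fastai's `error_rate` is one minus accuracy. The variable names are illustrative only.

```python
# error_rate column from the training table in article.md, epochs 0-9
error_rates = [
    0.558654, 0.539061, 0.527003, 0.507661, 0.479779,
    0.474253, 0.464456, 0.457925, 0.453655, 0.456167,
]

# Epoch with the lowest validation error
best_epoch = min(range(len(error_rates)), key=error_rates.__getitem__)

# error_rate = 1 - accuracy, so accuracy = 1 - error_rate
best_accuracy = 1.0 - error_rates[best_epoch]

print(f"best epoch: {best_epoch}, accuracy: {best_accuracy:.4f}")
```

The lowest error is at epoch 8, roughly 54.6% validation accuracy. The final epoch is slightly worse, which suggests the run had plateaued by then.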

.gitattributes ADDED

	@@ -0,0 +1,27 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zstandard filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

requirements.txt ADDED

	@@ -0,0 +1,5 @@

+fastai==2.6.0
+gradio==2.9.4
+torchaudio
+torchvision
+huggingface_hub