Note: The examples provided don't work on Safari, so if you are on a Mac, please try a different browser.

During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, with a bit of creativity, a simple computer vision model can become a state-of-the-art audio classifier: convert the audio to images and train the same image classification model on them. I was curious how I could train a music classifier, as I had never worked on audio data problems before.

You can find how I trained this music genre classifier using fast.ai in this blog post.

Dataset

  1. The Kaggle competition data
  2. Image data generated by converting the audio to mel spectrograms saved as images
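To make the second item concrete, here is a minimal NumPy-only sketch of turning a waveform into a mel spectrogram image array. It is not the code used for this Space (the blog post covers that); the function names, the parameters (`n_fft`, `hop`, `n_mels`) and the synthetic 440 Hz tone are all assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points evenly spaced on the mel scale give each filter's
    # left edge, centre and right edge.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    edges = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrogram_db(signal, sr, n_fft=1024, hop=256, n_mels=64):
    """Log-scaled mel spectrogram, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return 10.0 * np.log10(np.maximum(mel_power, 1e-10))


def to_uint8_image(spec_db):
    """Rescale to 0..255 and flip so low frequencies sit at the bottom."""
    lo, hi = spec_db.min(), spec_db.max()
    scaled = (spec_db - lo) / max(hi - lo, 1e-12)
    return (scaled[::-1] * 255).astype(np.uint8)


# A synthetic one-second 440 Hz tone stands in for a real audio clip.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
image = to_uint8_image(mel_spectrogram_db(tone, sr))
print(image.shape, image.dtype)
```

The resulting array can be written to disk with any imaging library and then fed to an ordinary image classifier. In practice an audio library would normally handle loading and the mel transform; the hand-rolled version here only shows what the transform does.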

Training

Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.
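A training run of this kind could look roughly like the sketch below. It is not the author's notebook: the one-folder-per-genre layout, the 20% validation split, the seed, the 224-pixel resize and the choice of `fit_one_cycle` are all assumptions made for illustration, and `train_genre_classifier` is a hypothetical helper name.

```python
from pathlib import Path

EPOCHS = 10  # matches the 10 epochs reported above


def train_genre_classifier(image_dir: Path, epochs: int = EPOCHS):
    """Train a ResNet50 image classifier on spectrogram images.

    Assumes (hypothetically) one sub-folder per genre under `image_dir`.
    """
    # Imported lazily so this module still loads where fastai is not installed.
    from fastai.vision.all import (
        ImageDataLoaders,
        Resize,
        error_rate,
        resnet50,
        vision_learner,
    )

    dls = ImageDataLoaders.from_folder(
        image_dir,
        valid_pct=0.2,  # assumed split: hold out 20% of images for validation
        seed=42,
        item_tfms=Resize(224),
    )
    learn = vision_learner(dls, resnet50, metrics=error_rate)
    learn.fit_one_cycle(epochs)  # assumed schedule; the original may differ
    return learn
```

Calling `train_genre_classifier(Path("spectrograms"))` with a real image folder would print a per-epoch table of `train_loss`, `valid_loss` and `error_rate`.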

| epoch | train_loss | valid_loss | error_rate | time  |
|-------|------------|------------|------------|-------|
| 0     | 2.312176   | 1.843815   | 0.558654   | 02:07 |
| 1     | 2.102361   | 1.719162   | 0.539061   | 02:08 |
| 2     | 1.867139   | 1.623988   | 0.527003   | 02:08 |
| 3     | 1.710557   | 1.527913   | 0.507661   | 02:07 |
| 4     | 1.629478   | 1.456836   | 0.479779   | 02:05 |
| 5     | 1.519305   | 1.433036   | 0.474253   | 02:05 |
| 6     | 1.457465   | 1.379757   | 0.464456   | 02:05 |
| 7     | 1.396283   | 1.369344   | 0.457925   | 02:05 |
| 8     | 1.359388   | 1.367973   | 0.453655   | 02:05 |
| 9     | 1.364363   | 1.368887   | 0.456167   | 02:04 |
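
As a quick way to read the table: fastai's `error_rate` is the fraction of validation items misclassified, so accuracy is `1 - error_rate`. The short snippet below, with the numbers copied from the table, picks out the best epoch; the variable names are only illustrative.

```python
# (epoch, train_loss, valid_loss, error_rate) copied from the table above.
LOG = [
    (0, 2.312176, 1.843815, 0.558654),
    (1, 2.102361, 1.719162, 0.539061),
    (2, 1.867139, 1.623988, 0.527003),
    (3, 1.710557, 1.527913, 0.507661),
    (4, 1.629478, 1.456836, 0.479779),
    (5, 1.519305, 1.433036, 0.474253),
    (6, 1.457465, 1.379757, 0.464456),
    (7, 1.396283, 1.369344, 0.457925),
    (8, 1.359388, 1.367973, 0.453655),
    (9, 1.364363, 1.368887, 0.456167),
]

best = min(LOG, key=lambda row: row[3])  # row with the lowest error rate
final = LOG[-1]
best_epoch = best[0]
best_accuracy = 1.0 - best[3]
final_accuracy = 1.0 - final[3]

print(f"best epoch: {best_epoch}, accuracy {best_accuracy:.1%}")
print(f"final epoch: {final[0]}, accuracy {final_accuracy:.1%}")
```

Validation loss and error rate both flatten out over the last three epochs, ending at roughly 54% accuracy.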

Examples

The example images provided in the demo come from the validation split of the Kaggle competition data, which was not used during training.

Credits

Thanks to Dien Hoa Truong for providing the inference code for an end-to-end pipeline that takes audio, converts it to mel spectrograms, and then makes predictions.

Thanks to @suvash for helping me get started with Hugging Face Spaces and for his excellent Space, which was a reference for this work.

Thanks to @strickvl for reporting the issue in the Safari browser and for trying this Space out.