article.md · kurianbenoy/audioclassification at dea9c3e8a0b6f86a6cd9e69aa39aeb4b5a104d51

Note: The examples provided don't work in Safari, in case you are trying to access this on a Mac. Please try a different browser.

During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, by being a bit creative, we can use a simple image classification model to build a state-of-the-art audio classifier. I was curious how I could train a music classifier, as I had never worked on audio data problems before.
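The trick is to turn each audio clip into an image first. The article doesn't show that conversion step, so here is a rough NumPy sketch of one common choice, a log-scaled mel spectrogram. This is not the author's or Dien Hoa Truong's code. The function names, the HTK-style mel formula and the defaults (`n_fft=2048`, `hop_length=512`, `n_mels=128`, `top_db=80`) are assumptions for illustration, and a library such as librosa is the usual choice in practice (its defaults differ, so values won't match exactly). The resulting 2-D array can be saved as an image and fed to a vision model.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft//2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 points evenly spaced on the mel scale, converted back to Hz
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_spectrogram_db(y, sr, n_fft=2048, hop_length=512, n_mels=128, top_db=80.0):
    """Return an (n_mels, n_frames) log-mel spectrogram in dB (no centre padding)."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames
    frames = np.stack(
        [y[t * hop_length : t * hop_length + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T      # (n_mels, n_frames)
    db = 10.0 * np.log10(np.maximum(mel, 1e-10))           # power -> decibels
    return np.maximum(db, db.max() - top_db)               # clip the dynamic range


if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr
    y = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic tone standing in for a music clip
    spec = mel_spectrogram_db(y, sr)
    print(spec.shape)  # (128, 40)
```

Each column is one time frame and each row one mel band, so a genre's rhythm and timbre show up as visual texture that an image model can learn from.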

You can read about how I trained this music genre classifier using fast.ai in this blog post.

Dataset

Training

Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.

epoch	train_loss	valid_loss	error_rate	time
0	2.312176	1.843815	0.558654	02:07
1	2.102361	1.719162	0.539061	02:08
2	1.867139	1.623988	0.527003	02:08
3	1.710557	1.527913	0.507661	02:07
4	1.629478	1.456836	0.479779	02:05
5	1.519305	1.433036	0.474253	02:05
6	1.457465	1.379757	0.464456	02:05
7	1.396283	1.369344	0.457925	02:05
8	1.359388	1.367973	0.453655	02:05
9	1.364363	1.368887	0.456167	02:04
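fastai's `error_rate` metric is `1 - accuracy`, so the table can be read as accuracy directly. The small script below is just arithmetic on the numbers above (it is not part of the original training code, and the `LOG` string is simply the table with the time column dropped): it finds the epoch with the lowest error rate and converts it to accuracy.

```python
# Training log copied from the table above: epoch, train_loss, valid_loss, error_rate
LOG = """\
0 2.312176 1.843815 0.558654
1 2.102361 1.719162 0.539061
2 1.867139 1.623988 0.527003
3 1.710557 1.527913 0.507661
4 1.629478 1.456836 0.479779
5 1.519305 1.433036 0.474253
6 1.457465 1.379757 0.464456
7 1.396283 1.369344 0.457925
8 1.359388 1.367973 0.453655
9 1.364363 1.368887 0.456167"""

epochs = [
    (int(e), float(tl), float(vl), float(err))
    for e, tl, vl, err in (line.split() for line in LOG.splitlines())
]

best = min(epochs, key=lambda r: r[3])  # epoch with the lowest error_rate
final = epochs[-1]

print(f"best epoch: {best[0]}, error_rate={best[3]:.4f}, accuracy={1 - best[3]:.2%}")
print(f"final epoch: {final[0]}, accuracy={1 - final[3]:.2%}")
```

On this log the lowest error rate is at epoch 8 (about 54.63% accuracy), and the final epoch ends slightly worse at about 54.38%.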

Examples

The example images in the demo are taken from the validation split of the Kaggle competition data, which was not used during training.

Credits

Thanks to Dien Hoa Truong for providing the inference code for the end-to-end pipeline: taking the input audio, converting it to mel spectrograms, and then running the prediction.

Thanks to @suvash for helping me get started with Hugging Face Spaces and for his excellent Space, which served as a reference for this work.

Thanks to @strickvl for trying this Space out and reporting the issue in Safari.