Note: The examples provided don't work on Safari, in case you are trying to access this on a Mac. Please try a different browser.
During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, with a bit of creativity, a simple computer vision model can be used to build a state-of-the-art audio classifier with the same image classification model. I was curious about how I could train a music classifier, as I had never worked on audio data problems before.
You can find how I trained this music genre classifier using fast.ai in this blog post.
Dataset
Training
Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.
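As a rough illustration of the setup described above, here is a minimal sketch of fine-tuning a ResNet50 vision learner with fast.ai. It is not the exact training code (that is in the blog post). The function name `train_genre_classifier`, the folder layout (one sub-folder of spectrogram images per genre), the 80/20 random split, the seed, the 224-pixel resize, and the use of `fine_tune` are all assumptions made for the example.

```python
from pathlib import Path


def train_genre_classifier(spectrogram_dir: Path, epochs: int = 10):
    """Fine-tune a pretrained ResNet50 on a folder of spectrogram images.

    Assumed (hypothetical) layout: spectrogram_dir/<genre_name>/<clip>.png
    """
    # Imported lazily so this file can be loaded without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders,
        Resize,
        error_rate,
        resnet50,
        vision_learner,
    )

    # Labels are taken from the parent folder name of each image.
    dls = ImageDataLoaders.from_folder(
        spectrogram_dir,
        valid_pct=0.2,          # assumption: random 80/20 train/validation split
        seed=42,                # assumption: fixed seed for a repeatable split
        item_tfms=Resize(224),  # assumption: resize every spectrogram to 224x224
    )

    # Same metric as the error_rate column reported for the real run.
    learn = vision_learner(dls, resnet50, metrics=error_rate)

    # Assumption: the original run may have used a different fit method.
    learn.fine_tune(epochs)
    return learn
```

Calling `train_genre_classifier(Path("spectrograms"))` would train for 10 epochs by default, where `spectrograms` is a placeholder path. The per-epoch metrics from the actual run are: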
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 2.312176 | 1.843815 | 0.558654 | 02:07 |
1 | 2.102361 | 1.719162 | 0.539061 | 02:08 |
2 | 1.867139 | 1.623988 | 0.527003 | 02:08 |
3 | 1.710557 | 1.527913 | 0.507661 | 02:07 |
4 | 1.629478 | 1.456836 | 0.479779 | 02:05 |
5 | 1.519305 | 1.433036 | 0.474253 | 02:05 |
6 | 1.457465 | 1.379757 | 0.464456 | 02:05 |
7 | 1.396283 | 1.369344 | 0.457925 | 02:05 |
8 | 1.359388 | 1.367973 | 0.453655 | 02:05 |
9 | 1.364363 | 1.368887 | 0.456167 | 02:04 |
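To read the table as accuracy rather than error rate, accuracy is simply `1 - error_rate`. The short sketch below copies the numbers from the table and reports the best and final epochs. The names `HISTORY` and `summarize` are made up for this example.

```python
# (epoch, train_loss, valid_loss, error_rate), copied from the training table.
HISTORY = [
    (0, 2.312176, 1.843815, 0.558654),
    (1, 2.102361, 1.719162, 0.539061),
    (2, 1.867139, 1.623988, 0.527003),
    (3, 1.710557, 1.527913, 0.507661),
    (4, 1.629478, 1.456836, 0.479779),
    (5, 1.519305, 1.433036, 0.474253),
    (6, 1.457465, 1.379757, 0.464456),
    (7, 1.396283, 1.369344, 0.457925),
    (8, 1.359388, 1.367973, 0.453655),
    (9, 1.364363, 1.368887, 0.456167),
]


def summarize(history):
    """Return the epoch with the lowest error rate and the matching accuracies."""
    best = min(history, key=lambda row: row[3])  # lowest error_rate
    final = history[-1]                          # last epoch of the run
    return {
        "best_epoch": best[0],
        "best_accuracy": 1.0 - best[3],
        "final_accuracy": 1.0 - final[3],
    }


if __name__ == "__main__":
    stats = summarize(HISTORY)
    print(f"best epoch: {stats['best_epoch']}")
    print(f"best accuracy: {stats['best_accuracy']:.4f}")
    print(f"final accuracy: {stats['final_accuracy']:.4f}")
```

The validation error bottoms out at epoch 8 (about 54.6% accuracy) and rises slightly in the last epoch (about 54.4%), while validation loss is essentially flat over the final three epochs.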
Examples
The example images in the demo come from the validation split of the Kaggle competition data, which was not used during training.
Credits
Thanks to Dien Hoa Truong for providing the inference code for the end-to-end pipeline, from creating the audio to converting it to mel spectrograms and then running prediction.
Thanks to @suvash for helping me get started with Hugging Face Spaces and for his excellent Space, which was a reference for this work.
Thanks to @strickvl for reporting the issue in the Safari browser and for trying this Space out.