	> Note: The examples provided don't work on Safari, so if you are on a Mac, please try a different browser.

	During the first lesson of the Practical Deep Learning for Coders course, Jeremy mentioned that, with a bit of creativity, a simple computer vision model can be used to build a state-of-the-art audio classifier by reusing the same image classification approach. I was curious how I could train a music classifier, as I had never worked on audio data problems before.


	[You can find how I trained this music genre classifier using fast.ai in this blog post](https://kurianbenoy.com/ml-blog/fastai/fastaicourse/2022/05/01/AudioCNNDemo.html).

	## Dataset

	1. [The competition data](https://www.kaggle.com/competitions/kaggle-pog-series-s01e02/data)
	2. [Image data generated by converting the audio to mel spectrograms stored as images](https://www.kaggle.com/datasets/dienhoa/music-genre-spectrogram-pogchamps)
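
For readers new to audio, a mel spectrogram is a time-frequency image whose frequency axis is spaced on the mel scale, which is close to linear at low frequencies and logarithmic at high ones. The sketch below only illustrates that spacing, using the common HTK-style formula. The sample rate and the number of bands are assumed, illustrative values, not the settings used to build the dataset above.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(sample_rate: int, n_mels: int) -> list:
    """Return n_mels + 2 band edges in Hz, equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)  # Nyquist frequency expressed in mels
    step = top / (n_mels + 1)
    return [mel_to_hz(i * step) for i in range(n_mels + 2)]


# Assumed values for illustration only (not taken from the dataset).
edges = mel_band_edges(sample_rate=22050, n_mels=8)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print([round(e) for e in edges])
```

The band widths grow with frequency, so low frequencies get finer resolution than high ones, which is roughly how pitch is perceived.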


	## Training

	Fast.ai was used to train this classifier with a ResNet50 vision learner for 10 epochs.
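
As a rough idea of what such a training run can look like, here is a minimal fastai sketch. It is not the exact code used for this Space (see the blog post linked above for that). The function name `train_genre_classifier`, the one-folder-per-genre layout, the 20% validation split, the 224-pixel resize, and the use of `fit_one_cycle` are all assumptions made for illustration.

```python
from pathlib import Path


def train_genre_classifier(image_dir: Path, epochs: int = 10):
    """Train a ResNet50 on spectrogram images stored in one folder per genre.

    Hypothetical helper: the folder layout, split, and schedule are assumptions.
    """
    # Imported here so this module can be loaded without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders, Resize, error_rate, resnet50, vision_learner,
    )

    dls = ImageDataLoaders.from_folder(
        image_dir,
        valid_pct=0.2,          # assumed validation split
        seed=42,
        item_tfms=Resize(224),  # assumed input size
    )
    # Pretrained ResNet50 body with a new classification head.
    learn = vision_learner(dls, resnet50, metrics=error_rate)
    # Trains with the pretrained body frozen, which is the learner's default state.
    learn.fit_one_cycle(epochs)
    return learn
```

With fastai installed, this could be called as `learn = train_genre_classifier(Path("spectrograms"))`, where `spectrograms` is a placeholder directory name.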

	| epoch | train_loss | valid_loss | error_rate | time |
	|-------|------------|------------|------------|-------|
	| 0 | 2.312176 | 1.843815 | 0.558654 | 02:07 |
	| 1 | 2.102361 | 1.719162 | 0.539061 | 02:08 |
	| 2 | 1.867139 | 1.623988 | 0.527003 | 02:08 |
	| 3 | 1.710557 | 1.527913 | 0.507661 | 02:07 |
	| 4 | 1.629478 | 1.456836 | 0.479779 | 02:05 |
	| 5 | 1.519305 | 1.433036 | 0.474253 | 02:05 |
	| 6 | 1.457465 | 1.379757 | 0.464456 | 02:05 |
	| 7 | 1.396283 | 1.369344 | 0.457925 | 02:05 |
	| 8 | 1.359388 | 1.367973 | 0.453655 | 02:05 |
	| 9 | 1.364363 | 1.368887 | 0.456167 | 02:04 |
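
Since fastai's `error_rate` is one minus accuracy, the table can also be read as validation accuracy. The short snippet below does that arithmetic on the `error_rate` column; the variable names are just for illustration.

```python
# error_rate per epoch, copied from the training table above
error_rates = [0.558654, 0.539061, 0.527003, 0.507661, 0.479779,
               0.474253, 0.464456, 0.457925, 0.453655, 0.456167]

# Accuracy is the complement of the error rate.
accuracies = [1.0 - e for e in error_rates]

# Epoch index with the lowest validation error.
best_epoch = min(range(len(error_rates)), key=error_rates.__getitem__)

print(f"best epoch: {best_epoch}, accuracy: {accuracies[best_epoch]:.1%}")
print(f"final epoch accuracy: {accuracies[-1]:.1%}")
```

The lowest error is at epoch 8 (about 54.6% accuracy), and the final epoch is slightly worse at about 54.4%, which matches the validation loss flattening out over the last three epochs.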


	## Examples

	The example images provided in the demo come from the validation split of the Kaggle competition data, which was not used during training.

	## Credits

	Thanks to [Dien Hoa Truong](https://twitter.com/DienhoaT) for providing the [inference code](https://www.kaggle.com/code/dienhoa/inference-submission-music-genre) used to build the end-to-end pipeline, from taking the audio to converting it to mel spectrograms and then making predictions.

	Thanks to [@suvash](https://twitter.com/suvash) for helping me get started with Hugging Face Spaces
	and for his [excellent Space](https://huggingface.co/spaces/suvash/food-101-resnet50), which was a reference for this work.

	Thanks to [@strickvl](https://twitter.com/strickvl) for reporting the issue in the Safari browser
	and for trying this Space out.