---
language:
- en
tags:
- AudioClassification
datasets:
- marsyas/gtzan
metrics:
- accuracy
---
# Audio Classification
This repo contains code and notes for this tutorial.
## Dataset
The GTZAN music genre classification dataset (`marsyas/gtzan`) is used: roughly 1,000 30-second clips across 10 genres.
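A minimal loading sketch with 🤗 Datasets; the `"all"` configuration name is an assumption, adjust it to whatever `main.py` actually loads:

```python
from datasets import load_dataset

# Load GTZAN from the Hub; the "all" config name is an assumption.
gtzan = load_dataset("marsyas/gtzan", "all")
print(gtzan)  # a single "train" split with "file", "audio", and "genre" columns
```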
## Usage

```bash
export HUGGINGFACE_TOKEN=<your_token>
python main.py
```
## Performance

Accuracy: 0.81 with the default settings.
## Notes
- 🤗 Datasets supports the `train_test_split()` method to split the dataset (see the first sketch below).
- `feature_extractor` cannot handle resampling.
  - To resample, one can use `dataset.map()`.
  - To resample, one can also cast the audio column:

    ```python
    from datasets import Audio

    gtzan = gtzan.cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))
    ```

- `feature_extractor` does the normalization and returns `input_values` and `attention_mask`.
- `.map()` supports batched preprocessing (see the preprocessing sketch below).
- Why does `AutoModelForAudioClassification.from_pretrained` take `label2id` and `id2label`? (see the last sketch below)
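For the `train_test_split()` note, a minimal sketch; the 90/10 split, shuffling, and seed are assumptions rather than the values used by `main.py`:

```python
from datasets import load_dataset

gtzan = load_dataset("marsyas/gtzan", "all")  # ships as a single "train" split ("all" config assumed)
# Carve out a held-out test set; split size and seed are arbitrary choices here.
gtzan = gtzan["train"].train_test_split(test_size=0.1, shuffle=True, seed=42)
print(gtzan)  # DatasetDict with "train" and "test" splits
```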
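For the resampling, normalization, and batched `.map()` notes, a sketch of one possible preprocessing pipeline. The `ntu-spml/distilhubert` checkpoint, the 30-second truncation length, and the removed column names are assumptions:

```python
from datasets import Audio, load_dataset
from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained("ntu-spml/distilhubert")  # assumed checkpoint

gtzan = load_dataset("marsyas/gtzan", "all")
# Resample on the fly to the rate the feature extractor expects.
gtzan = gtzan.cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))

def preprocess(batch):
    # The feature extractor normalizes the raw waveforms and returns
    # "input_values" plus "attention_mask" when asked for it.
    audio_arrays = [audio["array"] for audio in batch["audio"]]
    return feature_extractor(
        audio_arrays,
        sampling_rate=feature_extractor.sampling_rate,
        max_length=int(30.0 * feature_extractor.sampling_rate),  # truncate to 30 s (assumption)
        truncation=True,
        return_attention_mask=True,
    )

# batched=True makes .map() pass a batch of examples to preprocess() at once.
gtzan_encoded = gtzan.map(preprocess, batched=True, remove_columns=["audio", "file"])
```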
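On the last note: `label2id` and `id2label` are stored in `model.config`, so together with `num_labels` the freshly initialized classification head gets the right output size, and downstream code (e.g. `pipeline`) reports genre names instead of `LABEL_0`, `LABEL_1`, .... A sketch, with the checkpoint id again assumed and the `genre` column name taken from `marsyas/gtzan`:

```python
from datasets import load_dataset
from transformers import AutoModelForAudioClassification

gtzan = load_dataset("marsyas/gtzan", "all")
genre_names = gtzan["train"].features["genre"].names  # ClassLabel names: "blues", "classical", ...
id2label = {i: name for i, name in enumerate(genre_names)}
label2id = {name: i for i, name in id2label.items()}

model = AutoModelForAudioClassification.from_pretrained(
    "ntu-spml/distilhubert",  # assumed checkpoint, not necessarily the one in main.py
    num_labels=len(id2label),
    label2id=label2id,
    id2label=id2label,
)
```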