---
language:
- en
tags:
- AudioClassification
datasets:
- marsyas/gtzan
metrics:
- accuracy
---

# Audio Classification

This repo contains code and notes for [this tutorial](https://huggingface.co/learn/audio-course/chapter4/fine-tuning).

## Dataset

[GTZAN](https://huggingface.co/datasets/marsyas/gtzan) is used.

## Usage

```shell
export HUGGINGFACE_TOKEN=
python main.py
```

## Performance

Accuracy: 0.81 (default settings)

## Notes

1. 🤗 Datasets supports splitting a dataset with the `train_test_split()` method.
2. `feature_extractor` cannot handle resampling. To resample, cast the audio column so it is decoded at the target rate:
   ```python
   from datasets import Audio

   gtzan = gtzan.cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))
   ```
3. `feature_extractor` performs normalization and returns `input_values` and `attention_mask`.
4. `.map()` supports batched preprocessing.
5. Why does `AutoModelForAudioClassification.from_pretrained` take `label2id` and `id2label`?
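The `train_test_split()` note above can be sketched as follows. This is a minimal example on a toy in-memory dataset rather than GTZAN, assuming the `datasets` library is installed:

```python
from datasets import Dataset

# Toy stand-in for GTZAN: ten labelled examples.
ds = Dataset.from_dict({"label": list(range(10))})

# 🤗 Datasets splits in one call; returns a DatasetDict with "train"/"test".
splits = ds.train_test_split(test_size=0.2, shuffle=True, seed=42)
print(len(splits["train"]), len(splits["test"]))  # 8 2
```

The same call on GTZAN (which ships only a `train` split) is how the tutorial obtains a held-out evaluation set.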
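The normalization mentioned in the feature-extractor note is, for Wav2Vec2-style extractors, zero-mean / unit-variance scaling of the raw waveform. A sketch with NumPy (an assumption about the extractor used here, not its exact implementation):

```python
import numpy as np

def normalize(waveform: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    # Zero-mean, unit-variance scaling, roughly what the feature
    # extractor applies before producing `input_values`.
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + eps)

x = np.random.randn(16000) * 3.0 + 5.0  # fake 1-second waveform at 16 kHz
y = normalize(x)
```

The `attention_mask` it returns alongside `input_values` marks which positions are real audio versus padding when batching clips of different lengths.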
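Batched preprocessing with `.map()` works as below: with `batched=True` the mapped function receives a dict of lists (a batch of examples) instead of a single example, which lets preprocessing be vectorised. A minimal sketch on a toy dataset:

```python
from datasets import Dataset

ds = Dataset.from_dict({"value": [1, 2, 3, 4]})

def double(batch):
    # `batch["value"]` is a list, not a scalar, because batched=True.
    return {"value": [v * 2 for v in batch["value"]]}

ds = ds.map(double, batched=True)
print(ds["value"])  # [2, 4, 6, 8]
```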
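On the `label2id` / `id2label` question: passing these mappings lets `from_pretrained` size the freshly initialised classification head (one output per label) and store human-readable names in the model config, so predictions can be decoded via `model.config.id2label`. A sketch of building them for GTZAN's ten genres (the dict shapes follow the Transformers convention of string ids; the `from_pretrained` call is shown only as a comment to avoid a model download):

```python
# GTZAN's ten genre labels.
genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

id2label = {str(i): g for i, g in enumerate(genres)}
label2id = {g: str(i) for i, g in enumerate(genres)}

# These would then be passed along the lines of:
# AutoModelForAudioClassification.from_pretrained(
#     checkpoint, num_labels=len(genres),
#     label2id=label2id, id2label=id2label)
```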