---
language:
- en
tags:
- AudioClassification
datasets:
- marsyas/gtzan
metrics:
- accuracy
---

# Audio Classification

This repo contains code and notes for [this tutorial](https://huggingface.co/learn/audio-course/chapter4/fine-tuning).

## Dataset

[GTZAN](https://huggingface.co/datasets/marsyas/gtzan) is used.

## Usage

```shell
export HUGGINGFACE_TOKEN=
python main.py
```

## Performance

Accuracy: 0.81 (default settings)

## Notes

1. 🤗 Datasets supports splitting a dataset with the `train_test_split()` method.
2. `feature_extractor` cannot handle resampling. To resample, cast the audio column so it is decoded at the target rate:
   ```python
   from datasets import Audio

   gtzan = gtzan.cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))
   ```
3. `feature_extractor` performs normalization and returns `input_values` and `attention_mask`.
4. `.map()` supports batched preprocessing.
5. Why does `AutoModelForAudioClassification.from_pretrained` take `label2id` and `id2label`?
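The `train_test_split()` note above can be sketched as follows. This is a minimal example on a toy in-memory dataset rather than GTZAN, assuming the `datasets` library is installed:

```python
from datasets import Dataset

# Toy stand-in for GTZAN: ten labelled examples.
ds = Dataset.from_dict({"label": list(range(10))})

# 🤗 Datasets splits in one call; returns a DatasetDict with "train"/"test".
splits = ds.train_test_split(test_size=0.2, shuffle=True, seed=42)
print(len(splits["train"]), len(splits["test"]))  # 8 2
```

The same call on GTZAN (which ships only a `train` split) is how the tutorial obtains a held-out evaluation set.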
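The normalization mentioned in the feature-extractor note is, for Wav2Vec2-style extractors, zero-mean / unit-variance scaling of the raw waveform. A sketch with NumPy (an assumption about the extractor used here, not its exact implementation):

```python
import numpy as np

def normalize(waveform: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    # Zero-mean, unit-variance scaling, roughly what the feature
    # extractor applies before producing `input_values`.
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + eps)

x = np.random.randn(16000) * 3.0 + 5.0  # fake 1-second waveform at 16 kHz
y = normalize(x)
```

The `attention_mask` it returns alongside `input_values` marks which positions are real audio versus padding when batching clips of different lengths.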
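Batched preprocessing with `.map()` works as below: with `batched=True` the mapped function receives a dict of lists (a batch of examples) instead of a single example, which lets preprocessing be vectorised. A minimal sketch on a toy dataset:

```python
from datasets import Dataset

ds = Dataset.from_dict({"value": [1, 2, 3, 4]})

def double(batch):
    # `batch["value"]` is a list, not a scalar, because batched=True.
    return {"value": [v * 2 for v in batch["value"]]}

ds = ds.map(double, batched=True)
print(ds["value"])  # [2, 4, 6, 8]
```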
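On the `label2id` / `id2label` question: passing these mappings lets `from_pretrained` size the freshly initialised classification head (one output per label) and store human-readable names in the model config, so predictions can be decoded via `model.config.id2label`. A sketch of building them for GTZAN's ten genres (the dict shapes follow the Transformers convention of string ids; the `from_pretrained` call is shown only as a comment to avoid a model download):

```python
# GTZAN's ten genre labels.
genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

id2label = {str(i): g for i, g in enumerate(genres)}
label2id = {g: str(i) for i, g in enumerate(genres)}

# These would then be passed along the lines of:
# AutoModelForAudioClassification.from_pretrained(
#     checkpoint, num_labels=len(genres),
#     label2id=label2id, id2label=id2label)
```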