---
license: bsd-3-clause
tags:
- generated_from_trainer
metrics:
- accuracy
- precision
- recall
- f1
model-index:
- name: ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1
  results: []
---

# ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1

This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on a subset of the [ashraq/esc50](https://huggingface.co/datasets/ashraq/esc50) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7391
- Accuracy: 0.9286
- Precision: 0.9449
- Recall: 0.9286
- F1: 0.9244

## Training and evaluation data

The training and evaluation data were augmented with the [audiomentations](https://github.com/iver56/audiomentations) library. The following augmentation methods were applied, following previous experiments ([Elliott et al.: Tiny transformers for audio classification at the edge](https://arxiv.org/pdf/2103.12157.pdf)); an illustrative sketch of the pipeline is given at the end of this card:

**Gain** - each audio sample is amplified or attenuated by a random factor between 0.5 and 1.5, with a probability of 0.3

**Noise** - Gaussian noise with a random relative amplitude between 0.001 and 0.015 is added to each audio sample, with a probability of 0.5

**Speed adjustment** - the duration of each audio sample is stretched or compressed by a random factor between 0.5 and 1.5, with a probability of 0.3

**Pitch shift** - the pitch of each audio sample is shifted by a random number of semitones drawn from the closed interval [-4, 4], with a probability of 0.3

**Time masking** - a random fraction of the length of each audio sample, in the range (0, 0.02], is erased, with a probability of 0.3

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding `TrainingArguments` is given at the end of this card):
- learning_rate: 2e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
| 9.9002        | 1.0   | 28   | 8.5662          | 0.0      | 0.0       | 0.0    | 0.0    |
| 5.7235        | 2.0   | 56   | 4.3990          | 0.0357   | 0.0238    | 0.0357 | 0.0286 |
| 2.4076        | 3.0   | 84   | 2.2972          | 0.4643   | 0.7405    | 0.4643 | 0.4684 |
| 1.4448        | 4.0   | 112  | 1.3975          | 0.7143   | 0.7340    | 0.7143 | 0.6863 |
| 0.8373        | 5.0   | 140  | 1.0468          | 0.8571   | 0.8524    | 0.8571 | 0.8448 |
| 0.7239        | 6.0   | 168  | 0.8518          | 0.8929   | 0.9164    | 0.8929 | 0.8766 |
| 0.6504        | 7.0   | 196  | 0.7391          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
| 0.535         | 8.0   | 224  | 0.6682          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
| 0.4237        | 9.0   | 252  | 0.6443          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |
| 0.3709        | 10.0  | 280  | 0.6304          | 0.9286   | 0.9449    | 0.9286 | 0.9244 |

### Test results

| Metric                  | Value              |
|:-----------------------:|:------------------:|
| test_loss               | 0.5829914808273315 |
| test_accuracy           | 0.9285714285714286 |
| test_precision          | 0.9446428571428571 |
| test_recall             | 0.9285714285714286 |
| test_f1                 | 0.930292723149866  |
| test_runtime (s)        | 4.1488             |
| test_samples_per_second | 6.749              |
| test_steps_per_second   | 3.374              |
| epoch                   | 10.0               |

### Framework versions

- Transformers 4.27.4
- PyTorch 2.0.0
- Datasets 2.10.1
- Tokenizers 0.13.2
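
### Augmentation pipeline (illustrative sketch)

The original augmentation script is not included with this card. The snippet below is a minimal sketch of how the augmentations described above could be assembled with audiomentations; the mapping of each listed method to a specific transform (`Gain`, `AddGaussianNoise`, `TimeStretch`, `PitchShift`, `TimeMask`), the conversion of the 0.5–1.5 gain factor to decibels, and the 16 kHz sampling rate are assumptions. Parameter names follow recent audiomentations releases (older versions use `min_gain_in_db`/`max_gain_in_db` for `Gain`).

```python
import numpy as np
from audiomentations import (
    AddGaussianNoise, Compose, Gain, PitchShift, TimeMask, TimeStretch,
)

# Assumed reconstruction of the augmentation chain described above, not the
# original training code. Gain factors 0.5-1.5 are expressed in dB here:
# 20*log10(0.5) ~= -6 dB, 20*log10(1.5) ~= +3.5 dB.
augment = Compose([
    Gain(min_gain_db=-6.0, max_gain_db=3.5, p=0.3),
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.5, max_rate=1.5, p=0.3),        # "speed adjustment"
    PitchShift(min_semitones=-4, max_semitones=4, p=0.3),
    TimeMask(min_band_part=0.0, max_band_part=0.02, p=0.3),
])

# Usage example: apply the chain to a dummy 5-second waveform
# (16 kHz is assumed here as the working sampling rate).
samples = np.random.uniform(low=-0.2, high=0.2, size=(5 * 16000,)).astype(np.float32)
augmented = augment(samples=samples, sample_rate=16000)
```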
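
### Training configuration (illustrative sketch)

For reference, the hyperparameters listed above map roughly onto the following `transformers.TrainingArguments`. This is a hedged reconstruction rather than the original script: the output directory is a placeholder, and evaluation/checkpointing options are omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1",  # placeholder
    learning_rate=2e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # effective train batch size: 2 * 4 = 8
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    # The default AdamW optimizer in Transformers uses betas=(0.9, 0.999)
    # and epsilon=1e-8, matching the optimizer settings listed above.
)
```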