Description
This model is a version of DistilHuBERT fine-tuned on the GTZAN dataset for music genre classification.
Development
- Kaggle Notebook: Audio Data: Music Genre Classification
Training Parameters
evaluation_strategy = 'epoch',
save_strategy = 'epoch',
load_best_model_at_end = True,
metric_for_best_model = 'accuracy',
learning_rate = 5e-5,
seed = 42,
per_device_train_batch_size = 8,
per_device_eval_batch_size = 8,
gradient_accumulation_steps = 1,
num_train_epochs = 15,
warmup_ratio = 0.1,
fp16 = True,
save_total_limit = 2,
report_to = 'none'
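To make the schedule implied by these settings concrete, here is a rough sketch of the arithmetic. It takes the total of 1500 optimizer steps from the training output reported further down, and it assumes a linear warmup followed by a linear decay to zero; the card does not state which scheduler was used, so treat that shape as an assumption. The helper name `lr_at_step` is hypothetical.

```python
# Values copied from the training parameters and the reported training output.
learning_rate = 5e-5
num_train_epochs = 15
warmup_ratio = 0.1
total_steps = 1500  # global_step from the TrainOutput

steps_per_epoch = total_steps // num_train_epochs
warmup_steps = round(total_steps * warmup_ratio)


def lr_at_step(step: int) -> float:
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to the peak rate, then linear decay
    back to 0 at the final step. The card does not confirm this schedule.
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch}")
    print(f"warmup steps:    {warmup_steps}")
    for step in (0, 75, 150, 750, 1500):
        print(f"step {step:>4}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the warmup covers the first 1.5 epochs, so the peak learning rate is only reached partway through the second epoch.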
Training and Validation Results
Epoch  Training Loss  Validation Loss  Accuracy
1      No log         2.050576         0.395000
2      No log         1.387915         0.565000
3      No log         1.141497         0.665000
4      No log         1.052763         0.675000
5      1.354600       0.846402         0.745000
6      1.354600       0.858698         0.750000
7      1.354600       0.864531         0.730000
8      1.354600       0.765039         0.775000
9      1.354600       0.790847         0.785000
10     0.250100       0.873926         0.785000
11     0.250100       0.928275         0.770000
12     0.250100       0.851429         0.780000
13     0.250100       0.922214         0.770000
14     0.250100       0.916481         0.780000
15     0.028000       0.946075         0.770000
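Since the parameters set load_best_model_at_end with metric_for_best_model = 'accuracy', it can help to see which epoch that rule would pick from the table above. The sketch below is illustrative only: the helper `best_epoch` is hypothetical, and it assumes that a later epoch replaces the current best only when it is strictly better, so ties go to the earlier epoch. The card does not say which checkpoint was actually restored.

```python
# (epoch, validation_loss, accuracy) rows copied from the results table.
results = [
    (1, 2.050576, 0.395), (2, 1.387915, 0.565), (3, 1.141497, 0.665),
    (4, 1.052763, 0.675), (5, 0.846402, 0.745), (6, 0.858698, 0.750),
    (7, 0.864531, 0.730), (8, 0.765039, 0.775), (9, 0.790847, 0.785),
    (10, 0.873926, 0.785), (11, 0.928275, 0.770), (12, 0.851429, 0.780),
    (13, 0.922214, 0.770), (14, 0.916481, 0.780), (15, 0.946075, 0.770),
]


def best_epoch(rows, index, greater_is_better):
    """Return the first row that no later row strictly improves on."""
    best = rows[0]
    for row in rows[1:]:
        if greater_is_better:
            improved = row[index] > best[index]
        else:
            improved = row[index] < best[index]
        if improved:
            best = row
    return best


by_accuracy = best_epoch(results, index=2, greater_is_better=True)
by_loss = best_epoch(results, index=1, greater_is_better=False)

if __name__ == "__main__":
    print(f"best by accuracy: epoch {by_accuracy[0]} (accuracy {by_accuracy[2]:.3f})")
    print(f"best by val loss: epoch {by_loss[0]} (loss {by_loss[1]:.6f})")
```

With this tie-breaking rule, accuracy peaks at epoch 9 (0.785, matched but not exceeded at epoch 10), while validation loss is lowest at epoch 8. Validation loss drifts upward after that point even as training loss keeps falling, which suggests the later epochs are overfitting.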
TrainOutput(global_step=1500, training_loss=0.5442592652638754,
metrics={'train_runtime': 12274.2966, 'train_samples_per_second': 0.976,
'train_steps_per_second': 0.122, 'total_flos': 8.177513845536e+17, 'train_loss': 0.5442592652638754, 'epoch': 15.0})
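The figures in this output can be cross-checked against each other with a little arithmetic. The sketch below is a sanity check on the reported numbers, not part of the original notebook; the implied sample count is an estimate, since the reported throughput is rounded to three decimals.

```python
# Values copied from the TrainOutput above.
train_runtime = 12274.2966       # seconds
global_step = 1500
num_epochs = 15
samples_per_second = 0.976       # as reported (rounded)

# Derived quantities.
runtime_hours = train_runtime / 3600
steps_per_second = global_step / train_runtime
seconds_per_step = train_runtime / global_step

# Estimate only: recovers the approximate number of training samples
# seen per epoch from the rounded throughput figure.
implied_samples_per_epoch = samples_per_second * train_runtime / num_epochs

if __name__ == "__main__":
    print(f"runtime:            {runtime_hours:.2f} h")
    print(f"steps per second:   {steps_per_second:.3f}")
    print(f"seconds per step:   {seconds_per_step:.2f}")
    print(f"samples per epoch:  ~{implied_samples_per_epoch:.0f}")
```

The run took about 3.4 hours at roughly 8 seconds per optimizer step, and the recomputed steps-per-second figure matches the reported 0.122. The implied count of roughly 800 samples per epoch is consistent with 100 steps per epoch at a batch size of 8.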
Reference
This model is based on the original HuBERT architecture, as detailed in:
Hsu et al. (2021). HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. arXiv:2106.07447