speech-emotion-recognition-with-facebook-wav2vec2-large-xlsr-53

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the RAVDESS, SAVEE, TESS, and URDU datasets. It achieves the following results on the evaluation set (the values match the checkpoint at step 9460 in the training results):

  • Loss: 0.4989
  • Accuracy: 0.9168
  • Precision: 0.9209
  • Recall: 0.9168
  • F1: 0.9166
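
Recall equals accuracy to four decimals here and in every row of the training results. That is what one would expect if precision, recall, and F1 are support-weighted averages over the classes, although the card does not state the averaging mode, so treat this as an assumption. A minimal sketch with made-up labels (the emotion names are purely illustrative) showing why the two quantities coincide:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    n = len(y_true)
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # recall_c = correct[c] / support[c]; weight_c = support[c] / n.
    # The support terms cancel, leaving (total correct) / n, i.e. accuracy.
    return sum((correct[c] / support[c]) * (support[c] / n) for c in support)


# Toy labels, not taken from any of the datasets named in this card.
y_true = ["angry", "angry", "happy", "sad", "sad", "sad"]
y_pred = ["angry", "happy", "happy", "sad", "sad", "angry"]

print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

Weighted precision and F1 do not reduce to accuracy in the same way, which is consistent with those two columns differing slightly from the accuracy column.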

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 5
  • total_train_batch_size: 10
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 25
  • mixed_precision_training: Native AMP
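
The batch and schedule settings above are linked by simple arithmetic: total_train_batch_size is train_batch_size × gradient_accumulation_steps (2 × 5 = 10, assuming a single device), and the warmup ratio applies to the total number of optimizer steps. The sketch below illustrates a linear warmup followed by linear decay. The total of 9850 steps is taken from the last row of the training results; the function name is hypothetical, and the exact rounding of the warmup step count is an assumption rather than something the card specifies.

```python
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 5
WARMUP_RATIO = 0.1
TOTAL_STEPS = 9850  # final step reported in the training results

# Examples seen per optimizer update (single device assumed).
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Number of steps spent ramping the learning rate up from zero.
warmup_steps = round(TOTAL_STEPS * WARMUP_RATIO)


def linear_schedule_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)


for step in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the learning rate peaks at 5e-05 once warmup ends and falls back to zero by the final step.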

Training results

Training Loss  Epoch    Step  Validation Loss  Accuracy  Precision  Recall  F1
1.9343         0.9995   394   1.9277           0.2505    0.1425     0.2505  0.1691
1.7944         1.9990   788   1.6446           0.4574    0.5759     0.4574  0.4213
1.4601         2.9985   1182  1.3242           0.5953    0.6183     0.5953  0.5709
1.0551         3.9980   1576  1.0764           0.6623    0.6659     0.6623  0.6447
0.8934         5.0      1971  0.9209           0.7059    0.7172     0.7059  0.6825
1.1156         5.9995   2365  0.8292           0.7465    0.7635     0.7465  0.7442
0.6307         6.9990   2759  0.6439           0.8043    0.8090     0.8043  0.8020
0.774          7.9985   3153  0.6666           0.7921    0.8117     0.7921  0.7916
0.5537         8.9980   3547  0.5111           0.8245    0.8268     0.8245  0.8205
0.3762         10.0     3942  0.5506           0.8306    0.8390     0.8306  0.8296
0.716          10.9995  4336  0.5499           0.8276    0.8465     0.8276  0.8268
0.5372         11.9990  4730  0.5463           0.8377    0.8606     0.8377  0.8404
0.3746         12.9985  5124  0.4758           0.8611    0.8714     0.8611  0.8597
0.4317         13.9980  5518  0.4438           0.8742    0.8843     0.8742  0.8756
0.2104         15.0     5913  0.4426           0.8803    0.8864     0.8803  0.8806
0.3193         15.9995  6307  0.4741           0.8671    0.8751     0.8671  0.8683
0.3445         16.9990  6701  0.3850           0.9037    0.9047     0.9037  0.9038
0.2777         17.9985  7095  0.4802           0.8834    0.8923     0.8834  0.8836
0.4406         18.9980  7489  0.4053           0.9047    0.9096     0.9047  0.9043
0.1707         20.0     7884  0.4434           0.9067    0.9129     0.9067  0.9069
0.2138         20.9995  8278  0.5051           0.9037    0.9155     0.9037  0.9053
0.1812         21.9990  8672  0.4238           0.8955    0.9007     0.8955  0.8953
0.3639         22.9985  9066  0.4021           0.9138    0.9182     0.9138  0.9143
0.3193         23.9980  9460  0.4989           0.9168    0.9209     0.9168  0.9166
0.2067         24.9873  9850  0.4959           0.8976    0.9032     0.8976  0.8975
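
The headline metrics at the top of the card match the row at step 9460 rather than the final step, which suggests that the best checkpoint by some validation metric was kept. This is an inference from the numbers; the card does not say which metric was used or whether an option such as load_best_model_at_end was enabled. A small sketch that scans a few of the later rows and picks the best one by accuracy and by validation loss (the tuple layout and variable names are illustrative):

```python
# (step, validation_loss, accuracy), copied from selected rows of the training results.
rows = [
    (6701, 0.3850, 0.9037),
    (7489, 0.4053, 0.9047),
    (7884, 0.4434, 0.9067),
    (9066, 0.4021, 0.9138),
    (9460, 0.4989, 0.9168),
    (9850, 0.4959, 0.8976),
]

# Highest accuracy and lowest validation loss can point at different checkpoints.
best_by_accuracy = max(rows, key=lambda r: r[2])
best_by_loss = min(rows, key=lambda r: r[1])

print("best by accuracy:", best_by_accuracy)
print("best by validation loss:", best_by_loss)
```

The two criteria disagree: accuracy peaks at step 9460, while validation loss is lowest at step 6701, so the choice of selection metric changes which checkpoint would be reported.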

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1
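
For readers who want to try the checkpoint with the Transformers version listed above, a minimal inference sketch follows. It assumes the weights are published under the repository id firdhokk/speech-emotion-recognition-with-facebook-wav2vec2-large-xlsr-53 and that they load with the generic audio-classification pipeline; neither is confirmed by this card. The helper names are hypothetical, decoding an audio file by path typically requires ffmpeg, and the set of emotion labels comes from the model's config, which is not listed here.

```python
MODEL_ID = "firdhokk/speech-emotion-recognition-with-facebook-wav2vec2-large-xlsr-53"


def top_prediction(scores):
    """Return the highest-scoring entry from pipeline-style output.

    `scores` is a list of dicts with "label" and "score" keys.
    """
    return max(scores, key=lambda s: s["score"])


def classify(audio_path, model_id=MODEL_ID):
    """Classify one audio file and return the top label with its score."""
    # Imported lazily so top_prediction can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    # For file paths, the pipeline decodes and resamples the audio to the
    # sampling rate expected by the model's feature extractor.
    return top_prediction(classifier(audio_path))
```

Calling classify("speech.wav") on a local recording (the file name is a placeholder) downloads the weights on first use and returns a single dict with the predicted label and its score.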

Model size

  • 316M params (Safetensors, tensor type F32)