wav2vec2-base-960h-librispeech-model

This model is a fine-tuned version of facebook/wav2vec2-base-960h on the LIBRI10H - ENG dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2499
  • Wer: 0.8936
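
For reference, the checkpoint can be used for CTC-based transcription with the transformers library. The sketch below is a minimal example, not taken from the card: the repository id, the audio file path, and the use of librosa for loading 16 kHz audio are assumptions.

```python
# Minimal inference sketch (assumed repo id and audio path; wav2vec2-base-960h
# checkpoints expect 16 kHz mono audio).
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "csikasote/wav2vec2-base-960h-librispeech-model"  # assumed repository id
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load a 16 kHz waveform; "sample.wav" is a placeholder path.
speech, _ = librosa.load("sample.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```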

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 100.0
  • mixed_precision_training: Native AMP
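
As a rough guide, these values map onto transformers' TrainingArguments as sketched below. This is not the actual training script; the output directory and the Trainer wiring around it are assumptions.

```python
# Hedged sketch mapping the hyperparameters above onto TrainingArguments;
# output_dir and any dataset/Trainer setup are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-base-960h-librispeech-model",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,   # total train batch size: 8 * 2 = 16
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=100.0,
    optim="adamw_torch",             # AdamW defaults: betas=(0.9, 0.999), eps=1e-8
    fp16=True,                       # native AMP mixed precision
)
```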

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-------:|:-----:|:---------------:|:------:|
| 8.6563 | 1.1565 | 200 | 7.9045 | 1.0 |
| 4.1521 | 2.3130 | 400 | 2.9653 | 1.0 |
| 2.8915 | 3.4696 | 600 | 2.9149 | 1.0 |
| 2.8689 | 4.6261 | 800 | 2.9028 | 1.0 |
| 2.8582 | 5.7826 | 1000 | 2.8968 | 1.0 |
| 2.8507 | 6.9391 | 1200 | 2.8890 | 1.0 |
| 2.8389 | 8.0928 | 1400 | 2.8819 | 1.0 |
| 2.8422 | 9.2493 | 1600 | 2.8790 | 1.0 |
| 2.8379 | 10.4058 | 1800 | 2.8765 | 1.0 |
| 2.836 | 11.5623 | 2000 | 2.8713 | 1.0 |
| 2.8344 | 12.7188 | 2200 | 2.8699 | 1.0 |
| 2.8305 | 13.8754 | 2400 | 2.8661 | 1.0 |
| 2.8205 | 15.0290 | 2600 | 2.8601 | 1.0 |
| 2.8159 | 16.1855 | 2800 | 2.8347 | 1.0 |
| 2.7875 | 17.3420 | 3000 | 2.7791 | 1.0 |
| 2.7341 | 18.4986 | 3200 | 2.6825 | 1.0 |
| 2.6461 | 19.6551 | 3400 | 2.5673 | 1.0 |
| 2.56 | 20.8116 | 3600 | 2.4579 | 0.9998 |
| 2.4669 | 21.9681 | 3800 | 2.3507 | 0.9994 |
| 2.3753 | 23.1217 | 4000 | 2.2474 | 0.9984 |
| 2.2962 | 24.2783 | 4200 | 2.1507 | 0.9972 |
| 2.2141 | 25.4348 | 4400 | 2.0632 | 0.9955 |
| 2.1469 | 26.5913 | 4600 | 1.9897 | 0.9934 |
| 2.0822 | 27.7478 | 4800 | 1.9277 | 0.9907 |
| 2.0331 | 28.9043 | 5000 | 1.8730 | 0.9870 |
| 1.9848 | 30.0580 | 5200 | 1.8289 | 0.9847 |
| 1.9489 | 31.2145 | 5400 | 1.7907 | 0.9818 |
| 1.9186 | 32.3710 | 5600 | 1.7529 | 0.9777 |
| 1.8885 | 33.5275 | 5800 | 1.7237 | 0.9749 |
| 1.8608 | 34.6841 | 6000 | 1.6964 | 0.9739 |
| 1.8355 | 35.8406 | 6200 | 1.6691 | 0.9663 |
| 1.8182 | 36.9971 | 6400 | 1.6461 | 0.9681 |
| 1.7877 | 38.1507 | 6600 | 1.6199 | 0.9618 |
| 1.7735 | 39.3072 | 6800 | 1.6006 | 0.9566 |
| 1.7571 | 40.4638 | 7000 | 1.5786 | 0.9561 |
| 1.7405 | 41.6203 | 7200 | 1.5609 | 0.9535 |
| 1.7215 | 42.7768 | 7400 | 1.5436 | 0.9506 |
| 1.7062 | 43.9333 | 7600 | 1.5301 | 0.9506 |
| 1.6917 | 45.0870 | 7800 | 1.5141 | 0.9458 |
| 1.6826 | 46.2435 | 8000 | 1.5032 | 0.9476 |
| 1.6664 | 47.4 | 8200 | 1.4850 | 0.9415 |
| 1.6569 | 48.5565 | 8400 | 1.4750 | 0.9376 |
| 1.6457 | 49.7130 | 8600 | 1.4610 | 0.9405 |
| 1.6359 | 50.8696 | 8800 | 1.4494 | 0.9343 |
| 1.6234 | 52.0232 | 9000 | 1.4389 | 0.9337 |
| 1.6108 | 53.1797 | 9200 | 1.4274 | 0.9310 |
| 1.6041 | 54.3362 | 9400 | 1.4188 | 0.9311 |
| 1.597 | 55.4928 | 9600 | 1.4083 | 0.9294 |
| 1.587 | 56.6493 | 9800 | 1.3982 | 0.9260 |
| 1.581 | 57.8058 | 10000 | 1.3917 | 0.9253 |
| 1.5649 | 58.9623 | 10200 | 1.3831 | 0.9266 |
| 1.5607 | 60.1159 | 10400 | 1.3737 | 0.9226 |
| 1.5536 | 61.2725 | 10600 | 1.3670 | 0.9227 |
| 1.5449 | 62.4290 | 10800 | 1.3577 | 0.9195 |
| 1.5404 | 63.5855 | 11000 | 1.3498 | 0.9182 |
| 1.5349 | 64.7420 | 11200 | 1.3442 | 0.9181 |
| 1.5238 | 65.8986 | 11400 | 1.3374 | 0.9152 |
| 1.5167 | 67.0522 | 11600 | 1.3306 | 0.9129 |
| 1.5123 | 68.2087 | 11800 | 1.3246 | 0.9135 |
| 1.513 | 69.3652 | 12000 | 1.3189 | 0.9113 |
| 1.5031 | 70.5217 | 12200 | 1.3138 | 0.9106 |
| 1.4965 | 71.6783 | 12400 | 1.3086 | 0.9084 |
| 1.4917 | 72.8348 | 12600 | 1.3032 | 0.9072 |
| 1.4885 | 73.9913 | 12800 | 1.2989 | 0.9077 |
| 1.4792 | 75.1449 | 13000 | 1.2940 | 0.9055 |
| 1.4852 | 76.3014 | 13200 | 1.2907 | 0.9035 |
| 1.4719 | 77.4580 | 13400 | 1.2868 | 0.9037 |
| 1.4716 | 78.6145 | 13600 | 1.2835 | 0.9026 |
| 1.471 | 79.7710 | 13800 | 1.2787 | 0.9016 |
| 1.4627 | 80.9275 | 14000 | 1.2749 | 0.9005 |
| 1.4613 | 82.0812 | 14200 | 1.2721 | 0.8990 |
| 1.4559 | 83.2377 | 14400 | 1.2703 | 0.9002 |
| 1.4562 | 84.3942 | 14600 | 1.2656 | 0.8974 |
| 1.4544 | 85.5507 | 14800 | 1.2649 | 0.8977 |
| 1.4489 | 86.7072 | 15000 | 1.2631 | 0.8977 |
| 1.4468 | 87.8638 | 15200 | 1.2600 | 0.8961 |
| 1.445 | 89.0174 | 15400 | 1.2579 | 0.8954 |
| 1.444 | 90.1739 | 15600 | 1.2559 | 0.8947 |
| 1.4433 | 91.3304 | 15800 | 1.2541 | 0.8950 |
| 1.4417 | 92.4870 | 16000 | 1.2534 | 0.8946 |
| 1.4458 | 93.6435 | 16200 | 1.2519 | 0.8938 |
| 1.441 | 94.8 | 16400 | 1.2516 | 0.8939 |
| 1.4404 | 95.9565 | 16600 | 1.2513 | 0.8942 |
| 1.4354 | 97.1101 | 16800 | 1.2504 | 0.8939 |
| 1.4386 | 98.2667 | 17000 | 1.2503 | 0.8942 |
| 1.4383 | 99.4232 | 17200 | 1.2498 | 0.8937 |
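
The Wer column above is word error rate on the validation set (1.0 means no words were recognized correctly). A minimal sketch of computing WER with the evaluate library is shown below; the metric backend and the placeholder transcripts are assumptions, not details from the card.

```python
# Hedged WER sketch using the `evaluate` library; predictions/references
# are placeholder transcripts, not data from this card.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["the model hypothesis for an utterance"]       # placeholder model output
references = ["the reference transcript for that utterance"]  # placeholder ground truth

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")  # the final checkpoint reports 0.8936 on the eval set
```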

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0