wav2vec2-base-960h-librispeech-model
This model is a fine-tuned version of facebook/wav2vec2-base-960h on the LIBRI10H - ENG dataset. It achieves the following results on the evaluation set:
- Loss: 1.2499
- Wer: 0.8936 (word error rate, expressed as a fraction; roughly 89.4%)
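To make the Wer figure above concrete, here is a minimal sketch of the standard word error rate definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The card does not say which implementation produced the reported number, so the function name `word_error_rate` and the example sentences are illustrative assumptions, not the evaluation code used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the metric code used by the card.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
# An empty transcription scores exactly 1.0.
empty = word_error_rate("a b c", "")
```

Because insertions count as errors, this metric can exceed 1.0, and a value near 0.89 means the transcriptions still differ from the references in most words.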
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 100.0
- mixed_precision_training: Native AMP
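The relationship between the batch-size settings and the shape of the linear schedule with warmup can be sketched as follows. This is an illustration under stated assumptions rather than the training script: the single-device setting is inferred from 8 x 2 = 16, and `TOTAL_STEPS = 17300` is an estimate read off the epoch and step columns of the results table, not a value stated on the card. The helper names are hypothetical.

```python
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 200
TOTAL_STEPS = 17300  # assumption: estimated from the results table, not stated


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer update.

    num_devices=1 is an assumption consistent with total_train_batch_size 16.
    """
    return per_device * accum_steps * num_devices


def linear_lr(step: int,
              base_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS)
lr_mid_warmup = linear_lr(100)      # halfway through warmup
lr_peak = linear_lr(WARMUP_STEPS)   # peak learning rate
lr_end = linear_lr(TOTAL_STEPS)     # fully decayed
```

Under these assumptions the learning rate peaks at 1e-05 at step 200 and then decays toward zero over the remaining steps, so the updates late in training are very small.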
Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
8.6563 | 1.1565 | 200 | 7.9045 | 1.0 |
4.1521 | 2.3130 | 400 | 2.9653 | 1.0 |
2.8915 | 3.4696 | 600 | 2.9149 | 1.0 |
2.8689 | 4.6261 | 800 | 2.9028 | 1.0 |
2.8582 | 5.7826 | 1000 | 2.8968 | 1.0 |
2.8507 | 6.9391 | 1200 | 2.8890 | 1.0 |
2.8389 | 8.0928 | 1400 | 2.8819 | 1.0 |
2.8422 | 9.2493 | 1600 | 2.8790 | 1.0 |
2.8379 | 10.4058 | 1800 | 2.8765 | 1.0 |
2.836 | 11.5623 | 2000 | 2.8713 | 1.0 |
2.8344 | 12.7188 | 2200 | 2.8699 | 1.0 |
2.8305 | 13.8754 | 2400 | 2.8661 | 1.0 |
2.8205 | 15.0290 | 2600 | 2.8601 | 1.0 |
2.8159 | 16.1855 | 2800 | 2.8347 | 1.0 |
2.7875 | 17.3420 | 3000 | 2.7791 | 1.0 |
2.7341 | 18.4986 | 3200 | 2.6825 | 1.0 |
2.6461 | 19.6551 | 3400 | 2.5673 | 1.0 |
2.56 | 20.8116 | 3600 | 2.4579 | 0.9998 |
2.4669 | 21.9681 | 3800 | 2.3507 | 0.9994 |
2.3753 | 23.1217 | 4000 | 2.2474 | 0.9984 |
2.2962 | 24.2783 | 4200 | 2.1507 | 0.9972 |
2.2141 | 25.4348 | 4400 | 2.0632 | 0.9955 |
2.1469 | 26.5913 | 4600 | 1.9897 | 0.9934 |
2.0822 | 27.7478 | 4800 | 1.9277 | 0.9907 |
2.0331 | 28.9043 | 5000 | 1.8730 | 0.9870 |
1.9848 | 30.0580 | 5200 | 1.8289 | 0.9847 |
1.9489 | 31.2145 | 5400 | 1.7907 | 0.9818 |
1.9186 | 32.3710 | 5600 | 1.7529 | 0.9777 |
1.8885 | 33.5275 | 5800 | 1.7237 | 0.9749 |
1.8608 | 34.6841 | 6000 | 1.6964 | 0.9739 |
1.8355 | 35.8406 | 6200 | 1.6691 | 0.9663 |
1.8182 | 36.9971 | 6400 | 1.6461 | 0.9681 |
1.7877 | 38.1507 | 6600 | 1.6199 | 0.9618 |
1.7735 | 39.3072 | 6800 | 1.6006 | 0.9566 |
1.7571 | 40.4638 | 7000 | 1.5786 | 0.9561 |
1.7405 | 41.6203 | 7200 | 1.5609 | 0.9535 |
1.7215 | 42.7768 | 7400 | 1.5436 | 0.9506 |
1.7062 | 43.9333 | 7600 | 1.5301 | 0.9506 |
1.6917 | 45.0870 | 7800 | 1.5141 | 0.9458 |
1.6826 | 46.2435 | 8000 | 1.5032 | 0.9476 |
1.6664 | 47.4 | 8200 | 1.4850 | 0.9415 |
1.6569 | 48.5565 | 8400 | 1.4750 | 0.9376 |
1.6457 | 49.7130 | 8600 | 1.4610 | 0.9405 |
1.6359 | 50.8696 | 8800 | 1.4494 | 0.9343 |
1.6234 | 52.0232 | 9000 | 1.4389 | 0.9337 |
1.6108 | 53.1797 | 9200 | 1.4274 | 0.9310 |
1.6041 | 54.3362 | 9400 | 1.4188 | 0.9311 |
1.597 | 55.4928 | 9600 | 1.4083 | 0.9294 |
1.587 | 56.6493 | 9800 | 1.3982 | 0.9260 |
1.581 | 57.8058 | 10000 | 1.3917 | 0.9253 |
1.5649 | 58.9623 | 10200 | 1.3831 | 0.9266 |
1.5607 | 60.1159 | 10400 | 1.3737 | 0.9226 |
1.5536 | 61.2725 | 10600 | 1.3670 | 0.9227 |
1.5449 | 62.4290 | 10800 | 1.3577 | 0.9195 |
1.5404 | 63.5855 | 11000 | 1.3498 | 0.9182 |
1.5349 | 64.7420 | 11200 | 1.3442 | 0.9181 |
1.5238 | 65.8986 | 11400 | 1.3374 | 0.9152 |
1.5167 | 67.0522 | 11600 | 1.3306 | 0.9129 |
1.5123 | 68.2087 | 11800 | 1.3246 | 0.9135 |
1.513 | 69.3652 | 12000 | 1.3189 | 0.9113 |
1.5031 | 70.5217 | 12200 | 1.3138 | 0.9106 |
1.4965 | 71.6783 | 12400 | 1.3086 | 0.9084 |
1.4917 | 72.8348 | 12600 | 1.3032 | 0.9072 |
1.4885 | 73.9913 | 12800 | 1.2989 | 0.9077 |
1.4792 | 75.1449 | 13000 | 1.2940 | 0.9055 |
1.4852 | 76.3014 | 13200 | 1.2907 | 0.9035 |
1.4719 | 77.4580 | 13400 | 1.2868 | 0.9037 |
1.4716 | 78.6145 | 13600 | 1.2835 | 0.9026 |
1.471 | 79.7710 | 13800 | 1.2787 | 0.9016 |
1.4627 | 80.9275 | 14000 | 1.2749 | 0.9005 |
1.4613 | 82.0812 | 14200 | 1.2721 | 0.8990 |
1.4559 | 83.2377 | 14400 | 1.2703 | 0.9002 |
1.4562 | 84.3942 | 14600 | 1.2656 | 0.8974 |
1.4544 | 85.5507 | 14800 | 1.2649 | 0.8977 |
1.4489 | 86.7072 | 15000 | 1.2631 | 0.8977 |
1.4468 | 87.8638 | 15200 | 1.2600 | 0.8961 |
1.445 | 89.0174 | 15400 | 1.2579 | 0.8954 |
1.444 | 90.1739 | 15600 | 1.2559 | 0.8947 |
1.4433 | 91.3304 | 15800 | 1.2541 | 0.8950 |
1.4417 | 92.4870 | 16000 | 1.2534 | 0.8946 |
1.4458 | 93.6435 | 16200 | 1.2519 | 0.8938 |
1.441 | 94.8 | 16400 | 1.2516 | 0.8939 |
1.4404 | 95.9565 | 16600 | 1.2513 | 0.8942 |
1.4354 | 97.1101 | 16800 | 1.2504 | 0.8939 |
1.4386 | 98.2667 | 17000 | 1.2503 | 0.8942 |
1.4383 | 99.4232 | 17200 | 1.2498 | 0.8937 |
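The Epoch and Step columns in the table above allow a rough estimate of how many optimizer steps make up one epoch, and therefore how long the full 100-epoch run was. The sketch below does that arithmetic on the first and last rows; the resulting figures are estimates derived from the table, not values reported on the card.

```python
# (epoch, step) pairs copied from the first and last rows of the results table.
ROWS = [(1.1565, 200), (99.4232, 17200)]
NUM_EPOCHS = 100
EVAL_INTERVAL = 200  # rows are logged every 200 steps


def steps_per_epoch(epoch: float, step: int) -> float:
    """Estimate optimizer steps per epoch from one logged row."""
    return step / epoch


estimates = [steps_per_epoch(epoch, step) for epoch, step in ROWS]
approx_steps_per_epoch = round(estimates[-1])
approx_total_steps = approx_steps_per_epoch * NUM_EPOCHS

# Last multiple of the logging interval that fits inside the estimated run.
last_logged_step = (approx_total_steps // EVAL_INTERVAL) * EVAL_INTERVAL
```

Both rows give about 173 steps per epoch, which would put the end of training near step 17300. That would be consistent with the table stopping at step 17200, and with the headline loss and Wer differing slightly from the last row, if they were measured once more at the end of training.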
Framework versions
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0