wav2vec2-base-960h-librispeech-model

This model is a fine-tuned version of facebook/wav2vec2-base-960h on the LIBRI10H - ENG dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2499
  • Wer: 0.8936
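
For reference, the checkpoint can be used for CTC-based transcription with the transformers library. The sketch below is a minimal example, not taken from the card: the repository id, the audio file path, and the use of librosa for loading 16 kHz audio are assumptions.

```python
# Minimal inference sketch (assumed repo id and audio path; wav2vec2-base-960h
# checkpoints expect 16 kHz mono audio).
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "csikasote/wav2vec2-base-960h-librispeech-model"  # assumed repository id
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load a 16 kHz waveform; "sample.wav" is a placeholder path.
speech, _ = librosa.load("sample.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```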

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 100.0
  • mixed_precision_training: Native AMP
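
As a rough guide, these values map onto transformers' TrainingArguments as sketched below. This is not the actual training script; the output directory and the Trainer wiring around it are assumptions.

```python
# Hedged sketch mapping the hyperparameters above onto TrainingArguments;
# output_dir and any dataset/Trainer setup are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-base-960h-librispeech-model",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,   # total train batch size: 8 * 2 = 16
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=100.0,
    optim="adamw_torch",             # AdamW defaults: betas=(0.9, 0.999), eps=1e-8
    fp16=True,                       # native AMP mixed precision
)
```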

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-------:|:-----:|:---------------:|:------:|
| 8.6563 | 1.1565 | 200 | 7.9045 | 1.0 |
| 4.1521 | 2.3130 | 400 | 2.9653 | 1.0 |
| 2.8915 | 3.4696 | 600 | 2.9149 | 1.0 |
| 2.8689 | 4.6261 | 800 | 2.9028 | 1.0 |
| 2.8582 | 5.7826 | 1000 | 2.8968 | 1.0 |
| 2.8507 | 6.9391 | 1200 | 2.8890 | 1.0 |
| 2.8389 | 8.0928 | 1400 | 2.8819 | 1.0 |
| 2.8422 | 9.2493 | 1600 | 2.8790 | 1.0 |
| 2.8379 | 10.4058 | 1800 | 2.8765 | 1.0 |
| 2.836 | 11.5623 | 2000 | 2.8713 | 1.0 |
| 2.8344 | 12.7188 | 2200 | 2.8699 | 1.0 |
| 2.8305 | 13.8754 | 2400 | 2.8661 | 1.0 |
| 2.8205 | 15.0290 | 2600 | 2.8601 | 1.0 |
| 2.8159 | 16.1855 | 2800 | 2.8347 | 1.0 |
| 2.7875 | 17.3420 | 3000 | 2.7791 | 1.0 |
| 2.7341 | 18.4986 | 3200 | 2.6825 | 1.0 |
| 2.6461 | 19.6551 | 3400 | 2.5673 | 1.0 |
| 2.56 | 20.8116 | 3600 | 2.4579 | 0.9998 |
| 2.4669 | 21.9681 | 3800 | 2.3507 | 0.9994 |
| 2.3753 | 23.1217 | 4000 | 2.2474 | 0.9984 |
| 2.2962 | 24.2783 | 4200 | 2.1507 | 0.9972 |
| 2.2141 | 25.4348 | 4400 | 2.0632 | 0.9955 |
| 2.1469 | 26.5913 | 4600 | 1.9897 | 0.9934 |
| 2.0822 | 27.7478 | 4800 | 1.9277 | 0.9907 |
| 2.0331 | 28.9043 | 5000 | 1.8730 | 0.9870 |
| 1.9848 | 30.0580 | 5200 | 1.8289 | 0.9847 |
| 1.9489 | 31.2145 | 5400 | 1.7907 | 0.9818 |
| 1.9186 | 32.3710 | 5600 | 1.7529 | 0.9777 |
| 1.8885 | 33.5275 | 5800 | 1.7237 | 0.9749 |
| 1.8608 | 34.6841 | 6000 | 1.6964 | 0.9739 |
| 1.8355 | 35.8406 | 6200 | 1.6691 | 0.9663 |
| 1.8182 | 36.9971 | 6400 | 1.6461 | 0.9681 |
| 1.7877 | 38.1507 | 6600 | 1.6199 | 0.9618 |
| 1.7735 | 39.3072 | 6800 | 1.6006 | 0.9566 |
| 1.7571 | 40.4638 | 7000 | 1.5786 | 0.9561 |
| 1.7405 | 41.6203 | 7200 | 1.5609 | 0.9535 |
| 1.7215 | 42.7768 | 7400 | 1.5436 | 0.9506 |
| 1.7062 | 43.9333 | 7600 | 1.5301 | 0.9506 |
| 1.6917 | 45.0870 | 7800 | 1.5141 | 0.9458 |
| 1.6826 | 46.2435 | 8000 | 1.5032 | 0.9476 |
| 1.6664 | 47.4 | 8200 | 1.4850 | 0.9415 |
| 1.6569 | 48.5565 | 8400 | 1.4750 | 0.9376 |
| 1.6457 | 49.7130 | 8600 | 1.4610 | 0.9405 |
| 1.6359 | 50.8696 | 8800 | 1.4494 | 0.9343 |
| 1.6234 | 52.0232 | 9000 | 1.4389 | 0.9337 |
| 1.6108 | 53.1797 | 9200 | 1.4274 | 0.9310 |
| 1.6041 | 54.3362 | 9400 | 1.4188 | 0.9311 |
| 1.597 | 55.4928 | 9600 | 1.4083 | 0.9294 |
| 1.587 | 56.6493 | 9800 | 1.3982 | 0.9260 |
| 1.581 | 57.8058 | 10000 | 1.3917 | 0.9253 |
| 1.5649 | 58.9623 | 10200 | 1.3831 | 0.9266 |
| 1.5607 | 60.1159 | 10400 | 1.3737 | 0.9226 |
| 1.5536 | 61.2725 | 10600 | 1.3670 | 0.9227 |
| 1.5449 | 62.4290 | 10800 | 1.3577 | 0.9195 |
| 1.5404 | 63.5855 | 11000 | 1.3498 | 0.9182 |
| 1.5349 | 64.7420 | 11200 | 1.3442 | 0.9181 |
| 1.5238 | 65.8986 | 11400 | 1.3374 | 0.9152 |
| 1.5167 | 67.0522 | 11600 | 1.3306 | 0.9129 |
| 1.5123 | 68.2087 | 11800 | 1.3246 | 0.9135 |
| 1.513 | 69.3652 | 12000 | 1.3189 | 0.9113 |
| 1.5031 | 70.5217 | 12200 | 1.3138 | 0.9106 |
| 1.4965 | 71.6783 | 12400 | 1.3086 | 0.9084 |
| 1.4917 | 72.8348 | 12600 | 1.3032 | 0.9072 |
| 1.4885 | 73.9913 | 12800 | 1.2989 | 0.9077 |
| 1.4792 | 75.1449 | 13000 | 1.2940 | 0.9055 |
| 1.4852 | 76.3014 | 13200 | 1.2907 | 0.9035 |
| 1.4719 | 77.4580 | 13400 | 1.2868 | 0.9037 |
| 1.4716 | 78.6145 | 13600 | 1.2835 | 0.9026 |
| 1.471 | 79.7710 | 13800 | 1.2787 | 0.9016 |
| 1.4627 | 80.9275 | 14000 | 1.2749 | 0.9005 |
| 1.4613 | 82.0812 | 14200 | 1.2721 | 0.8990 |
| 1.4559 | 83.2377 | 14400 | 1.2703 | 0.9002 |
| 1.4562 | 84.3942 | 14600 | 1.2656 | 0.8974 |
| 1.4544 | 85.5507 | 14800 | 1.2649 | 0.8977 |
| 1.4489 | 86.7072 | 15000 | 1.2631 | 0.8977 |
| 1.4468 | 87.8638 | 15200 | 1.2600 | 0.8961 |
| 1.445 | 89.0174 | 15400 | 1.2579 | 0.8954 |
| 1.444 | 90.1739 | 15600 | 1.2559 | 0.8947 |
| 1.4433 | 91.3304 | 15800 | 1.2541 | 0.8950 |
| 1.4417 | 92.4870 | 16000 | 1.2534 | 0.8946 |
| 1.4458 | 93.6435 | 16200 | 1.2519 | 0.8938 |
| 1.441 | 94.8 | 16400 | 1.2516 | 0.8939 |
| 1.4404 | 95.9565 | 16600 | 1.2513 | 0.8942 |
| 1.4354 | 97.1101 | 16800 | 1.2504 | 0.8939 |
| 1.4386 | 98.2667 | 17000 | 1.2503 | 0.8942 |
| 1.4383 | 99.4232 | 17200 | 1.2498 | 0.8937 |
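
The Wer column above is word error rate on the validation set (1.0 means no words were recognized correctly). A minimal sketch of computing WER with the evaluate library is shown below; the metric backend and the placeholder transcripts are assumptions, not details from the card.

```python
# Hedged WER sketch using the `evaluate` library; predictions/references
# are placeholder transcripts, not data from this card.
import evaluate

wer_metric = evaluate.load("wer")

predictions = ["the model hypothesis for an utterance"]       # placeholder model output
references = ["the reference transcript for that utterance"]  # placeholder ground truth

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")  # the final checkpoint reports 0.8936 on the eval set
```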

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0