Whisper Medium GA-EN Speech Translation

This model is a fine-tuned version of openai/whisper-medium for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, SpokenWords, Tatoeba, and Wikimedia datasets. It achieves the following results on the evaluation set:

Loss: 1.2028
Bleu: 33.77
Chrf: 52.79
Wer: 60.8285
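
As a rough illustration of how a checkpoint like this is typically loaded, the sketch below wraps the Transformers `pipeline` API in a small helper. It is not taken from the original card: the helper name `translate_irish_audio` is hypothetical, and the sketch assumes the repository's saved generation settings already produce English output, so that no extra decoding arguments are needed.

```python
# Hypothetical usage sketch; not part of the original model card.
MODEL_ID = "ymoslem/whisper-medium-ga2en-v5.3.1-r"


def translate_irish_audio(audio_path: str) -> str:
    """Return the model's English output for an Irish speech recording."""
    # Imported lazily so that defining this helper does not require the
    # library to be installed or the checkpoint to be downloaded.
    from transformers import pipeline

    # Whisper checkpoints are served by the speech-recognition pipeline task.
    # Assumption: the checkpoint's stored generation config selects the
    # language and task used during fine-tuning.
    translator = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # The pipeline returns a dict whose "text" entry holds the decoded string.
    return translator(audio_path)["text"]
```

Calling `translate_irish_audio("sample_ga.wav")`, where the file name is a placeholder for a local recording, downloads the checkpoint on first use and returns the decoded text.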

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.03
training_steps: 4000
mixed_precision_training: Native AMP
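
To make the scheduler settings concrete, the sketch below works out the learning rate at a given step. The card only states a linear scheduler with a warmup ratio of 0.03 over 4000 steps; the exact shape used here (linear warmup from zero to the peak, then linear decay to zero, with the warmup length rounded up) is an assumption based on how the Hugging Face Trainer usually behaves. The function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAINING_STEPS = 4000
WARMUP_RATIO = 0.03


def warmup_steps(total_steps: int = TRAINING_STEPS,
                 ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps implied by the warmup ratio (rounded up)."""
    return math.ceil(total_steps * ratio)


def learning_rate_at(step: int,
                     peak: float = LEARNING_RATE,
                     total_steps: int = TRAINING_STEPS,
                     ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` under the assumed warmup-then-decay shape."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        # Warmup phase: rise linearly from 0 to the peak learning rate.
        return peak * step / max(1, warmup)
    # Decay phase: fall linearly from the peak to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak * remaining / max(1, total_steps - warmup)


if __name__ == "__main__":
    print("warmup steps:", warmup_steps())
    for step in (0, 60, 120, 2060, 4000):
        print(step, learning_rate_at(step))
```

Under these assumptions the warmup lasts 120 steps, so the peak rate of 0.0001 is reached slightly after the first evaluation at step 100.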

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Chrf	Wer
2.4145	0.0109	100	2.1019	2.32	16.08	170.5088
2.6073	0.0219	200	2.0370	5.94	23.77	158.8924
2.593	0.0328	300	1.8529	3.67	21.53	238.6312
2.3123	0.0438	400	1.8500	9.01	28.13	135.0293
2.3347	0.0547	500	1.7816	15.05	31.9	90.7249
2.1277	0.0657	600	1.6916	14.24	32.29	88.4286
2.1836	0.0766	700	1.6517	12.15	32.7	128.1405
2.0112	0.0876	800	1.6275	19.76	38.15	79.2886
1.8387	0.0985	900	1.6349	17.26	38.82	91.0851
1.8335	0.1095	1000	1.5843	20.93	38.02	75.9118
1.7849	0.1204	1100	1.5863	15.98	37.5	96.5781
1.5698	0.1314	1200	1.5371	16.42	39.07	103.6020
1.4759	0.1423	1300	1.5250	18.56	38.41	96.5781
1.4915	0.1533	1400	1.4862	22.05	40.15	75.1013
1.6583	0.1642	1500	1.4727	18.11	39.65	95.7677
1.3981	0.1752	1600	1.4367	27.31	44.5	66.0513
1.2646	0.1861	1700	1.4574	22.85	42.19	74.4710
1.2172	0.1970	1800	1.3818	20.77	42.5	82.7105
1.183	0.2080	1900	1.4380	22.75	41.28	76.7672
1.1931	0.2189	2000	1.3917	23.58	41.13	77.3075
1.172	0.2299	2100	1.3892	24.58	44.4	74.3809
1.0284	0.2408	2200	1.3806	23.34	44.1	78.0279
0.8507	0.2518	2300	1.3210	28.67	46.79	67.1769
0.9615	0.2627	2400	1.3103	27.95	46.8	70.0135
0.8049	0.2737	2500	1.3141	29.92	48.9	67.2220
0.7639	0.2846	2600	1.3085	30.91	49.05	64.2053
0.8594	0.2956	2700	1.3378	27.8	47.84	68.8879
0.7482	0.3065	2800	1.2978	30.6	48.62	64.9257
0.6941	0.3175	2900	1.3060	29.92	47.92	65.8712
0.7282	0.3284	3000	1.2959	31.09	48.13	65.3309
0.6298	0.3394	3100	1.2893	29.76	48.8	67.1769
0.619	0.3503	3200	1.2388	32.61	50.27	62.0891
0.6252	0.3612	3300	1.2550	32.71	50.96	62.4493
0.4699	0.3722	3400	1.2463	32.02	51.24	65.2409
0.5121	0.3831	3500	1.2214	32.26	51.29	63.7551
0.5092	0.3941	3600	1.2182	32.88	51.59	62.0891
0.4365	0.4050	3700	1.2049	32.16	51.5	62.3143
0.2971	0.4160	3800	1.2201	34.45	52.78	59.7479
0.389	0.4269	3900	1.2007	33.86	53.28	60.6033
0.3879	0.4379	4000	1.2028	33.77	52.79	60.8285
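
Several early rows in the table report a Wer above 100. The sketch below shows why that is possible under the standard definition of word error rate: edits are counted against the reference length, so a hypothesis with many extra words can accumulate more errors than the reference has words. The card does not say which tool or text normalization produced its scores, so this is a minimal illustration of the metric, not a reproduction of the evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return 100.0 * previous[-1] / len(ref)


if __name__ == "__main__":
    # Three inserted words against a two-word reference give a WER over 100.
    print(word_error_rate("hello world", "hello hello hello world world"))
```

A model that repeats or hallucinates words early in training is penalized through insertions, which is one plausible reading of the values above 100 in the first rows.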

Framework versions

Transformers 4.41.2
Pytorch 2.2.0+cu121
Datasets 2.19.2
Tokenizers 0.19.1

ymoslem/whisper-medium-ga2en-v5.3.1-r
