
whisper-medium-myanmar

This model is a fine-tuned version of openai/whisper-medium on the chuuhtetnaing/myanmar-speech-dataset-openslr-80 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2282
  • WER: 49.4657
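
WER here is the word error rate on the evaluation split, reported as a percentage. A minimal sketch of how such a score is typically computed with the evaluate library (the exact text normalization used for this card is an assumption, and the strings below are placeholders):

```python
import evaluate

# Hypothetical example: compare model transcriptions against references.
wer_metric = evaluate.load("wer")
predictions = ["α€€α€»α€™ α€•α€Όα€Šα€Ία€• မှာ α€•α€Šα€¬α€žα€„α€Ί"]  # model outputs (placeholder)
references = ["α€€α€»α€™ α€•α€Όα€Šα€Ία€• မှာ α€•α€Šα€¬α€žα€„α€Ί"]  # ground-truth transcripts (placeholder)
wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```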

Usage

```python
from datasets import Audio, load_dataset
from transformers import pipeline

# Load a sample clip from the test split, resampled to the 16 kHz rate Whisper expects
dataset = load_dataset("chuuhtetnaing/myanmar-speech-dataset-openslr-80")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
test_dataset = dataset["test"]
input_speech = test_dataset[42]["audio"]

# Build a speech-recognition pipeline from the fine-tuned checkpoint
pipe = pipeline(model="chuuhtetnaing/whisper-medium-myanmar")

output = pipe(input_speech, generate_kwargs={"language": "myanmar", "task": "transcribe"})
print(output["text"])  # α€€α€»α€™ α€•α€Όα€Šα€Ία€• မှာ α€•α€Šα€¬α€žα€„α€Ί တော့ စာမေးပွဲ α€€α€­α€― တပတ်တခါ α€…α€…α€Ία€α€šα€Ί
```
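
Passing language and task through generate_kwargs pins Whisper's decoder prompt to Burmese transcription rather than relying on automatic language detection, which can be less reliable for lower-resource languages.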

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent Seq2SeqTrainingArguments follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 40
  • eval_batch_size: 40
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 50
  • mixed_precision_training: Native AMP
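
As a rough guide, these settings map onto transformers Seq2SeqTrainingArguments as sketched below; output_dir and the per-epoch evaluation cadence are assumptions inferred from the results table, not taken from the original training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-myanmar",  # assumed name
    learning_rate=3e-4,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=40,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the optimizer defaults
    lr_scheduler_type="linear",
    warmup_steps=200,
    num_train_epochs=50,
    fp16=True,                       # Native AMP mixed precision
    evaluation_strategy="epoch",     # assumed: the table below reports per-epoch metrics
    predict_with_generate=True,      # needed to compute WER from generated text
)
```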

Training results

| Training Loss | Epoch | Step | Validation Loss | WER     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.8546        | 1.0   | 57   | 0.5703          | 98.0855 |
| 0.2643        | 2.0   | 114  | 0.2404          | 84.9510 |
| 0.1982        | 3.0   | 171  | 0.1889          | 71.6385 |
| 0.1608        | 4.0   | 228  | 0.1781          | 68.4773 |
| 0.1212        | 5.0   | 285  | 0.1511          | 63.7133 |
| 0.1067        | 6.0   | 342  | 0.1427          | 60.2404 |
| 0.0682        | 7.0   | 399  | 0.1330          | 59.3500 |
| 0.0413        | 8.0   | 456  | 0.1322          | 56.9902 |
| 0.0249        | 9.0   | 513  | 0.1271          | 55.6545 |
| 0.0158        | 10.0  | 570  | 0.1430          | 54.8085 |
| 0.0124        | 11.0  | 627  | 0.1486          | 55.0312 |
| 0.0099        | 12.0  | 684  | 0.1550          | 53.7845 |
| 0.0082        | 13.0  | 741  | 0.1486          | 55.1647 |
| 0.0057        | 14.0  | 798  | 0.1747          | 53.6955 |
| 0.0041        | 15.0  | 855  | 0.1608          | 53.3393 |
| 0.0029        | 16.0  | 912  | 0.1596          | 50.6233 |
| 0.0013        | 17.0  | 969  | 0.1798          | 51.2912 |
| 0.0005        | 18.0  | 1026 | 0.1796          | 50.3562 |
| 0.0006        | 19.0  | 1083 | 0.1799          | 50.0890 |
| 0.0           | 20.0  | 1140 | 0.1849          | 50.2671 |
| 0.0001        | 21.0  | 1197 | 0.1878          | 50.0445 |
| 0.0           | 22.0  | 1254 | 0.1907          | 50.1781 |
| 0.0           | 23.0  | 1311 | 0.1929          | 50.0890 |
| 0.0           | 24.0  | 1368 | 0.1942          | 49.8664 |
| 0.0           | 25.0  | 1425 | 0.2019          | 50.0445 |
| 0.0           | 26.0  | 1482 | 0.2068          | 49.9555 |
| 0.0           | 27.0  | 1539 | 0.2103          | 50.0    |
| 0.0           | 28.0  | 1596 | 0.2129          | 49.9555 |
| 0.0           | 29.0  | 1653 | 0.2150          | 50.0    |
| 0.0           | 30.0  | 1710 | 0.2168          | 49.9555 |
| 0.0           | 31.0  | 1767 | 0.2183          | 49.9555 |
| 0.0           | 32.0  | 1824 | 0.2196          | 49.8664 |
| 0.0           | 33.0  | 1881 | 0.2208          | 49.6438 |
| 0.0           | 34.0  | 1938 | 0.2218          | 49.7329 |
| 0.0           | 35.0  | 1995 | 0.2227          | 49.5993 |
| 0.0           | 36.0  | 2052 | 0.2234          | 49.5548 |
| 0.0           | 37.0  | 2109 | 0.2242          | 49.5548 |
| 0.0           | 38.0  | 2166 | 0.2248          | 49.5102 |
| 0.0           | 39.0  | 2223 | 0.2253          | 49.5548 |
| 0.0           | 40.0  | 2280 | 0.2259          | 49.5548 |
| 0.0           | 41.0  | 2337 | 0.2263          | 49.5548 |
| 0.0           | 42.0  | 2394 | 0.2267          | 49.4657 |
| 0.0           | 43.0  | 2451 | 0.2271          | 49.5102 |
| 0.0           | 44.0  | 2508 | 0.2274          | 49.5102 |
| 0.0           | 45.0  | 2565 | 0.2276          | 49.4657 |
| 0.0           | 46.0  | 2622 | 0.2278          | 49.4657 |
| 0.0           | 47.0  | 2679 | 0.2280          | 49.5548 |
| 0.0           | 48.0  | 2736 | 0.2281          | 49.5102 |
| 0.0           | 49.0  | 2793 | 0.2282          | 49.5102 |
| 0.0           | 50.0  | 2850 | 0.2282          | 49.4657 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.5
  • Tokenizers 0.15.1