Whisper base uz

This model is a fine-tuned version of openai/whisper-base, trained on the Common Voice dataset. It achieves the following results on the evaluation set:

Loss: 0.1052
WER: 10.5982%
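WER (word error rate) is the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and the reference, divided by the number of reference words. The figures in this card are on a 0 to 100 scale, so 10.5982 means roughly 10.6 word errors per 100 reference words. The following is a minimal pure-Python sketch of that computation. It assumes plain whitespace tokenization and no text normalization. The card does not say which normalization, if any, was applied during evaluation, and `word_error_rate` is a hypothetical helper, not part of any library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (S + D + I) / N * 100 (hypothetical helper)."""
    # Assumption: simple whitespace tokenization, no lowercasing or punctuation stripping.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed so far
    # and the first j hypothesis words; row 0 is the empty reference prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)


# One substitution out of three reference words, about 33.33%.
print(word_error_rate("men maktabga boraman", "men maktabga bordim"))
```

Corpus-level WER, as reported above, sums the errors over all utterances before dividing by the total number of reference words. It does not average per-utterance percentages.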

Works on test audio recordings

Model description

jamshidahmadov/whisper-uz is a fine-tuned version of OpenAI's Whisper base model, optimized for Uzbek speech-to-text (STT). It converts spoken Uzbek into written text, which makes it suitable for speech recognition applications such as transcription, voice commands, and speech analytics. On the Common Voice evaluation set it reaches a WER of about 10.6%. It can transcribe both clean and noisy speech and was tuned with particular attention to the phonetics and nuances of Uzbek.
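A minimal sketch of running the model for transcription with the Transformers `pipeline` API is shown below. It assumes `transformers` and `torch` are installed, and that ffmpeg is available on the system because the pipeline needs it to decode audio files given by path. It also assumes the checkpoint ships Whisper's multilingual generation config, so that `language="uzbek"` can be passed at generation time. The function name `transcribe` and the file name in the comment are hypothetical.

```python
def transcribe(audio_path: str) -> str:
    """Transcribe an Uzbek audio file with jamshidahmadov/whisper-uz (hypothetical helper)."""
    # Imported lazily so this module can be loaded even where transformers is absent.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="jamshidahmadov/whisper-uz",
        chunk_length_s=30,  # Whisper works on 30-second windows; longer audio is chunked
    )
    # Assumption: pinning language and task skips Whisper's automatic language detection.
    result = asr(audio_path, generate_kwargs={"language": "uzbek", "task": "transcribe"})
    return result["text"]


# Example (hypothetical file): print(transcribe("sample_uz.wav"))
```

The pipeline is rebuilt on every call here to keep the sketch short. For batch transcription, create `asr` once and reuse it.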

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 2000
mixed_precision_training: Native AMP
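Two of these settings interact with the others.

- total_train_batch_size is train_batch_size × gradient_accumulation_steps (8 × 2 = 16). This is consistent with training on a single device.
- With lr_scheduler_type: linear and 500 warmup steps, the learning rate ramps from 0 to 1e-05 over the first 500 optimizer steps. It then decays linearly to 0 at step 2000.

The sketch below reproduces both in pure Python. It assumes the linear scheduler follows the formula of Transformers' `get_linear_schedule_with_warmup`, which is the function the Trainer uses for `linear`. The helper names are hypothetical.

```python
BASE_LR = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 2000
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update (hypothetical helper)."""
    return per_device * accum_steps * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        # Warmup: scale the base rate by step / warmup_steps.
        return BASE_LR * step / WARMUP_STEPS
    # Decay: scale by the fraction of post-warmup steps still remaining, floored at 0.
    remaining = max(0, TRAINING_STEPS - step)
    return BASE_LR * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 16
for step in (0, 250, 500, 1250, 2000):
    print(step, linear_lr(step))
```

The peak learning rate is reached exactly at step 500. At the midpoint of the decay phase (step 1250), the rate is back to half the peak.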

Training results

Training Loss  Epoch   Step  Validation Loss  WER (%)
0.1303         0.5714  500   0.1232           12.7454
0.0664         1.1429  1000  0.1115           11.2883
0.0742         1.7143  1500  0.1074           10.9356
0.0383         2.2857  2000  0.1052           10.5982
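The Epoch column can be cross-checked against the Step column. Each logged pair implies about 875 optimizer updates per epoch. At 16 examples per update, that works out to roughly 14,000 training examples. This is an estimate derived from the logged numbers, assuming a single device and no dropped partial batches. The helper names below are hypothetical.

```python
# Logged (step, epoch) pairs from the training-results table.
LOGGED = [(500, 0.5714), (1000, 1.1429), (1500, 1.7143), (2000, 2.2857)]
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2


def updates_per_epoch(step: int, epoch: float) -> int:
    """Estimate optimizer updates per epoch from one logged (step, epoch) pair."""
    return round(step / epoch)


# Every logged pair should agree on the same epoch length.
estimates = {updates_per_epoch(s, e) for s, e in LOGGED}
print(estimates)  # {875}

# Approximate training-set size: updates per epoch times examples per update.
approx_train_examples = estimates.pop() * TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
print(approx_train_examples)  # 14000
```

All four rows round to the same value. This indicates that the epoch counter and the step counter in the table are mutually consistent.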

Framework versions

Transformers 4.47.0
PyTorch 2.4.0
Datasets 3.2.0
Tokenizers 0.21.0
