Whisper base uz
This model is a fine-tuned version of openai/whisper-base, trained on the Common Voice dataset. It achieves the following results on the evaluation set:
- Loss: 0.1052
- WER: 10.5982
The model has been checked on test audio samples.
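For readers unfamiliar with the metric, the sketch below shows how a word error rate can be computed from a reference transcript and a model hypothesis. It assumes the WER reported above is a percentage (word-level edit distance divided by reference length, times 100), which is common in Whisper fine-tuning scripts but is not stated in this card. The function name `word_error_rate` and the sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate in percent (assumed convention)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Illustrative strings, not taken from the evaluation set:
# one substituted word out of four reference words.
print(word_error_rate("men bugun maktabga bordim", "men bugun maktabga boraman"))
```

Evaluation libraries typically normalise text (casing, punctuation) before scoring; this sketch does not, so its numbers would not be directly comparable to the figure above.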
Model description
jamshidahmadov/whisper-uz is a fine-tuned version of OpenAI's Whisper base model, optimized for Uzbek speech-to-text (STT). It converts spoken Uzbek into written text and is intended for speech recognition applications such as transcription, voice commands, and speech analytics. It is meant to transcribe both clean and noisy recordings, with a focus on the phonetics and nuances of the Uzbek language.
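A minimal usage sketch follows, assuming the checkpoint loads through the standard Transformers `automatic-speech-recognition` pipeline like other Whisper fine-tunes. The helper name `transcribe` and the command-line handling are illustrative and not part of the model repository. Decoding an audio file from a path also requires `ffmpeg` to be installed.

```python
import sys

MODEL_ID = "jamshidahmadov/whisper-uz"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file to text (hypothetical helper)."""
    # Imported lazily so this module can be imported without loading
    # the heavy dependency until a transcription is requested.
    from transformers import pipeline

    # Downloads the checkpoint from the Hugging Face Hub on first use.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"]


if __name__ == "__main__":
    # Example: python transcribe.py sample.wav
    if len(sys.argv) > 1:
        print(transcribe(sys.argv[1]))
```

For many files it would be cheaper to build the pipeline once and reuse it rather than recreating it on every call, as this sketch does.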
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 2000
- mixed_precision_training: Native AMP
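Two of the values above follow from the others, and the sketch below works them out. The total train batch size is the per-device batch size times the gradient accumulation steps, assuming a single training device (the device count is not stated in this card). The learning rate curve is a plain-Python rendering of a linear schedule with warmup: it rises from 0 to the peak over the warmup steps, then decays linearly to 0 at the final step. The function names are illustrative.

```python
PEAK_LR = 1e-5          # learning_rate
WARMUP_STEPS = 500      # lr_scheduler_warmup_steps
TRAINING_STEPS = 2000   # training_steps
TRAIN_BATCH_SIZE = 8    # train_batch_size (per device)
GRAD_ACCUM_STEPS = 2    # gradient_accumulation_steps
NUM_DEVICES = 1         # assumption: not stated in the model card


def total_train_batch_size() -> int:
    """Examples contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / max(1, TRAINING_STEPS - WARMUP_STEPS)


print(total_train_batch_size())
for step in (0, 250, 500, 1250, 2000):
    print(step, linear_lr(step))
```

Under these settings the peak rate is reached exactly at step 500, the first evaluation point in the results table, and the rate is back to zero at step 2000.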
Training results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.1303 | 0.5714 | 500 | 0.1232 | 12.7454 |
| 0.0664 | 1.1429 | 1000 | 0.1115 | 11.2883 |
| 0.0742 | 1.7143 | 1500 | 0.1074 | 10.9356 |
| 0.0383 | 2.2857 | 2000 | 0.1052 | 10.5982 |
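The table also implies a few quantities that are not listed directly. The sketch below derives the number of optimizer steps per epoch from the Epoch and Step columns, and the relative WER reduction between the first and last checkpoints. The estimate of training examples per epoch additionally assumes the total train batch size of 16 from the hyperparameters and that every step used a full batch, so it is approximate.

```python
# (step, epoch, wer) copied from the training results table.
ROWS = [
    (500, 0.5714, 12.7454),
    (1000, 1.1429, 11.2883),
    (1500, 1.7143, 10.9356),
    (2000, 2.2857, 10.5982),
]
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list

# Each row gives its own estimate of steps per epoch; they should agree
# up to the rounding of the Epoch column.
steps_per_epoch = [step / epoch for step, epoch, _ in ROWS]
rounded_steps = round(sum(steps_per_epoch) / len(steps_per_epoch))

# Rough size of the training split under the full-batch assumption.
approx_examples_per_epoch = rounded_steps * TOTAL_TRAIN_BATCH_SIZE

first_wer = ROWS[0][2]
last_wer = ROWS[-1][2]
relative_wer_reduction = 100.0 * (first_wer - last_wer) / first_wer

print(f"steps per epoch: about {rounded_steps}")
print(f"training examples per epoch: about {approx_examples_per_epoch}")
print(f"relative WER reduction, step 500 to 2000: {relative_wer_reduction:.1f}%")
```

Read this way, the run covers a little over two passes through the training data, and most of the WER improvement after step 500 comes in the first of the three remaining intervals.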
Framework versions
- Transformers 4.47.0
- PyTorch 2.4.0
- Datasets 3.2.0
- Tokenizers 0.21.0