license: cc-by-4.0
tags:
- audio
- automatic-speech-recognition
- hf-asr-leaderboard
language: et
model-index:
- name: TalTechNLP/whisper-medium-et
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11
      type: mozilla-foundation/common_voice_11_0
      config: et
      split: test
    metrics:
    - name: Test WER
      type: wer
      value: 14.66
    - name: Test CER
      type: cer
      value: 3.76
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      config: et
      split: test
    metrics:
    - name: Test WER
      type: wer
      value: 13.793
    - name: Test CER
      type: cer
      value: 3.194
Whisper-medium-et
This is a Whisper medium model (openai/whisper-medium) finetuned on around 800 hours of diverse Estonian data.
Model description
This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech.
Intended uses & limitations
This model is intended for general-purpose speech recognition, such as broadcast conversations, interviews, talks, etc.
How to use
Use it like any other Whisper model via Hugging Face transformers, or with a faster decoder such as faster-whisper.
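As a minimal sketch of the transformers route, the snippet below loads the model through the `pipeline` API and transcribes a single file. The helper names `build_asr_pipeline` and `transcribe` and the 30-second chunk length are illustrative assumptions, not part of this model card. Setting `num_beams=1` requests greedy decoding.

```python
MODEL_ID = "TalTechNLP/whisper-medium-et"


def build_asr_pipeline(model_id=MODEL_ID, device=-1):
    """Create a speech recognition pipeline (hypothetical helper).

    transformers is imported lazily so that importing this module does not
    require the library or trigger a model download.
    """
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # assumption: split long recordings into 30 s chunks
        device=device,      # -1 runs on CPU; pass a GPU index such as 0 if available
    )


def transcribe(audio_path, asr=None):
    """Transcribe one audio file and return the recognized text."""
    if asr is None:
        asr = build_asr_pipeline()
    # num_beams=1 selects greedy decoding.
    result = asr(audio_path, generate_kwargs={"num_beams": 1})
    return result["text"].strip()
```

Calling `transcribe("sample.wav")` downloads the model on first use; `sample.wav` is a placeholder path, and decoding audio files from disk typically requires ffmpeg to be installed.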
Limitations and bias
Since this model was trained mostly on broadcast speech and texts from the web, it might have problems correctly decoding the following:
- Speech containing technical and other domain-specific terms
- Children's speech
- Non-native speech
- Speech recorded under very noisy conditions or with a microphone far from the speaker
- Very spontaneous and overlapping speech
Training data
Acoustic training data:
Type | Amount (h) |
---|---|
Broadcast speech | 591 |
Spontaneous speech | 53 |
Elderly speech corpus | 53 |
Talks, lectures | 49 |
Parliament speeches | 31 |
Total | 761 |
Training procedure
Finetuned using ESPnet, and then converted to transformers format using this script. The finetuning procedure is similar to this model.
Evaluation results
WER
WER results below are obtained using greedy decoding (i.e., beam size 1).
Dataset | WER |
---|---|
Common Voice 8.0 | 13.8 |
Common Voice 11.0 | 14.7 |
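
For readers who want to score their own transcripts, WER is the word-level edit distance between reference and hypothesis divided by the number of reference words, and CER is the same ratio computed over characters. The plain-Python sketch below illustrates that definition. The function names are hypothetical, and the text normalization applied for the scores in this card is not stated here, so numbers computed this way may differ from the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, using whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, counting every character including spaces."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("tere tulemast koju", "tere tulemast kodu")` counts one substituted word out of three reference words.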