language:
- pt
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Medium Portuguese
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 pt
      type: mozilla-foundation/common_voice_11_0
      config: pt
      split: test
      args: pt
    metrics:
    - name: Wer
      type: wer
      value: 6.5785713084850626
Whisper Medium Portuguese 🇧🇷🇵🇹
Welcome to Whisper Medium for Portuguese transcription 👋🏻
If you are looking to transcribe Portuguese audio to text quickly and reliably, you are in the right place!
With a state-of-the-art Word Error Rate (WER) of 6.579 on the Common Voice 11 Portuguese test split, this model reaches about 1.7x lower WER than the strongest wav2vec2 model in the comparison table (11.310). Compared to the original whisper-medium model (8.100), it delivers an improvement of about 1.2x 🚀.
This model is a fine-tuned version of openai/whisper-medium on the Portuguese (pt) configuration of the mozilla-foundation/common_voice_11_0 dataset.
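For readers who want to try the model, the following is a minimal sketch using the `pipeline` API from Transformers. It assumes that `transformers`, a PyTorch backend, and `ffmpeg` (for decoding audio files) are installed, and that the checkpoint is published under the name listed in this card. The helper name `transcribe` is hypothetical and exists only for illustration.

```python
MODEL_ID = "jlondonobo/whisper-medium-pt"  # model name as listed in this card's comparison table


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this module can be loaded without pulling in transformers.
    from transformers import pipeline

    # Downloads the checkpoint on first use; decoding a file path requires ffmpeg.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` on a local recording (the file name is a placeholder) should return the transcription as a plain string. For many files, building the pipeline once and reusing it would avoid reloading the model on every call.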
The following table compares our model with the most downloaded models on the Hub for Portuguese Automatic Speech Recognition 🗣:
Model | WER | Parameters |
---|---|---|
openai/whisper-medium | 8.100 | 769M |
jlondonobo/whisper-medium-pt | 6.579 🤗 | 769M |
jonatasgrosman/wav2vec2-large-xlsr-53-portuguese | 11.310 | 317M |
Edresson/wav2vec2-large-xlsr-coraa-portuguese | 20.080 | 317M |
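The relative gains implied by the table can be checked with a few lines of arithmetic. The sketch below copies the WER values from the table and computes, for each baseline, the ratio to this model's WER and the relative reduction. The helper `improvement` is a hypothetical name used only here.

```python
# WER values copied from the comparison table above.
wer = {
    "openai/whisper-medium": 8.100,
    "jlondonobo/whisper-medium-pt": 6.579,
    "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese": 11.310,
    "Edresson/wav2vec2-large-xlsr-coraa-portuguese": 20.080,
}

ours = wer["jlondonobo/whisper-medium-pt"]


def improvement(baseline: str) -> tuple[float, float]:
    """Return (WER ratio, relative WER reduction in %) of a baseline versus this model."""
    base = wer[baseline]
    return base / ours, 100 * (base - ours) / base


for name in wer:
    if name != "jlondonobo/whisper-medium-pt":
        ratio, reduction = improvement(name)
        print(f"{name}: {ratio:.2f}x, {reduction:.1f}% lower WER")
```

The ratios come out at roughly 1.23x against openai/whisper-medium, 1.72x against the XLSR-53 model, and 3.05x against the CORAA model.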
Training hyperparameters
We used the following hyperparameters for training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
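To make the scheduler settings concrete, here is a small sketch of the learning rate over the course of training. It assumes that the `linear` scheduler means the usual linear warmup followed by linear decay to zero, as implemented by `get_linear_schedule_with_warmup` in Transformers; the card itself does not spell this out. The function name `linear_lr` is hypothetical.

```python
PEAK_LR = 1e-05        # learning_rate
WARMUP_STEPS = 500     # lr_scheduler_warmup_steps
TOTAL_STEPS = 5000     # training_steps


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to the peak over the warmup steps.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay linearly from the peak down to 0 at the final step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 250, 500, 2750, 5000):
    print(step, linear_lr(step))
```

Under this assumption the rate peaks at step 500, is back to half the peak at step 2750, and reaches zero at step 5000.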
Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
0.0698 | 1.09 | 1000 | 0.1876 | 7.189 |
0.0218 | 3.07 | 2000 | 0.2254 | 7.110 |
0.0053 | 5.06 | 3000 | 0.2711 | 6.969 |
0.0017 | 7.04 | 4000 | 0.3030 | 6.686 |
0.0005 | 9.02 | 5000 | 0.3205 | 6.579 🤗 |
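For reference, the Wer column reports word error rate: the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration, not the evaluation code used for this card. It assumes whitespace tokenization with no text normalization, and it treats the table values as percentages, which is also an assumption. The function name and the sentence pair are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentence pair: one substituted word out of five.
print(100 * word_error_rate("o gato está no telhado", "o gato esta no telhado"))
```

A missing accent counts as a full word substitution here, which is one reason evaluation pipelines often normalize text before scoring.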
Framework versions
- Transformers 4.26.0.dev0
- PyTorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2