---
language:
- pt
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Large v2 Portuguese
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 pt
      type: mozilla-foundation/common_voice_11_0
      config: pt
      split: test
      args: pt
    metrics:
    - name: Wer
      type: wer
      value: 5.590020342630419
---
# Whisper Large V2 Portuguese 🇧🇷🇵🇹
Welcome to **whisper large-v2** for Portuguese transcription 👋🏻
Transcribe Portuguese audio to text with high accuracy. The model achieves the following results on the evaluation set:
- Loss: 0.282
- WER: 5.590
This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the [mozilla-foundation/common_voice_11_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) dataset. If you want a lighter model, you may be interested in [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt). It has about half the parameters (769M vs. 1550M) and offers faster inference, at the cost of roughly one point of WER (6.579 vs. 5.590).
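For a quick start, the model can probably be loaded through the Transformers `pipeline` API, as sketched below. This is a minimal example rather than an official recipe: the `transcribe` helper is hypothetical, the 30-second chunking is an assumption for handling long recordings, and decoding a file path requires `ffmpeg` on the system. How the language and task are forced can differ between Transformers versions, so check the documentation for the version you have installed.

```python
MODEL_ID = "jlondonobo/whisper-large-v2-pt"


def transcribe(audio_path: str) -> str:
    """Return the transcription of a single audio file (hypothetical helper)."""
    # Imported lazily so the module can be loaded without downloading anything.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # assumption: split long audio into 30-second windows
    )
    # The pipeline returns a dict; the transcription is stored under "text".
    return asr(audio_path)["text"]
```

Calling `transcribe("sample.wav")` (a placeholder file name) downloads the weights on first use and returns the transcription as a string.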
### Comparable models
Reported **WER** is based on the evaluation subset of Common Voice.
| Model | WER | # Parameters |
|--------------------------------------------------|:--------:|:------------:|
| [jlondonobo/whisper-large-v2-pt](https://huggingface.co/jlondonobo/whisper-large-v2-pt) | **5.590** 🤗 | 1550M |
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 6.300 | 1550M |
| [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt) | 6.579 | 769M |
| [jonatasgrosman/wav2vec2-large-xlsr-53-portuguese](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-portuguese) | 11.310 | 317M |
| [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
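
For readers unfamiliar with the metric, the sketch below shows how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is a plain-Python illustration, not the evaluation script used for the numbers above; the function name is hypothetical, and any text normalization (casing, punctuation) applied before scoring is not specified here. The table reports WER as a percentage, so 5.590 corresponds to a ratio of about 0.0559.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of three reference words.
print(f"{word_error_rate('o gato preto', 'o gato branco'):.3f}")  # 0.333
```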
### Training hyperparameters
We used the following hyperparameters for training:
- `learning_rate`: 1e-05
- `train_batch_size`: 16
- `eval_batch_size`: 8
- `seed`: 42
- `gradient_accumulation_steps`: 2
- `total_train_batch_size`: 32
- `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
- `lr_scheduler_type`: linear
- `lr_scheduler_warmup_steps`: 500
- `training_steps`: 5000
- `mixed_precision_training`: Native AMP
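
Two of these values are derived from the others, which the sketch below makes explicit. It assumes a single training device, so the total batch size is `train_batch_size` times `gradient_accumulation_steps`, and it models the `linear` scheduler as linear warmup to the peak learning rate followed by linear decay to zero at the last step. The constant and function names are illustrative, not taken from the training script.

```python
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 500
TRAINING_STEPS = 5000

# Assumption: one device, so no extra multiplication by the device count.
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to the peak rate over the warmup steps.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay from the peak rate to 0 between the end of warmup and the last step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print(total_train_batch_size)  # 32
for step in (0, 250, 500, 2750, 5000):
    print(step, lr_at(step))
```

Under these assumptions the rate peaks at 1e-05 at step 500 and has fallen to half of that by step 2750.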
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.0828 | 1.09 | 1000 | 0.1868 | 6.778 |
| 0.0241 | 3.07 | 2000 | 0.2057 | 6.109 |
| 0.0084 | 5.06 | 3000 | 0.2367 | 6.029 |
| 0.0015 | 7.04 | 4000 | 0.2469 | 5.709 |
| 0.0009 | 9.02 | 5000 | 0.2821 | 5.590 🤗|
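
Validation loss and WER move in opposite directions in this table: the loss is lowest at step 1000, while the WER keeps improving through step 5000. The short sketch below, with rows copied from the table, shows how a checkpoint could be selected by WER instead of loss and how large the relative WER improvement is. The variable names are illustrative only.

```python
# (step, validation_loss, wer) rows copied from the training results table.
RESULTS = [
    (1000, 0.1868, 6.778),
    (2000, 0.2057, 6.109),
    (3000, 0.2367, 6.029),
    (4000, 0.2469, 5.709),
    (5000, 0.2821, 5.590),
]

# Checkpoint with the lowest WER, and the one with the lowest validation loss.
best_step, best_loss, best_wer = min(RESULTS, key=lambda row: row[2])
lowest_loss_step = min(RESULTS, key=lambda row: row[1])[0]

# Relative WER improvement between the first and the best checkpoint.
first_wer = RESULTS[0][2]
relative_gain = (first_wer - best_wer) / first_wer

print(best_step, lowest_loss_step, f"{relative_gain:.1%}")
```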
### Framework versions
- Transformers 4.26.0.dev0
- PyTorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2