---
language:
- pt
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Large v2 Portuguese
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 pt
      type: mozilla-foundation/common_voice_11_0
      config: pt
      split: test
      args: pt
    metrics:
    - name: Wer
      type: wer
      value: 5.590020342630419
---

# Whisper Large V2 Portuguese 🇧🇷🇵🇹

Welcome to **whisper large-v2** for Portuguese transcription 👋🏻

Transcribe Portuguese audio to text. On the Common Voice 11 evaluation set, the final checkpoint reaches:

- Loss: 0.282
- WER: 5.590

This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the [mozilla-foundation/common_voice_11_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) dataset. If you want a lighter model, you may be interested in [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt). With 769M parameters instead of 1550M, it offers faster inference at a WER of 6.579, about one point higher than this model.
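
As a minimal sketch of how the model might be used for transcription, the snippet below wraps the 🤗 Transformers `pipeline` API. The helper name `transcribe` and the 30-second chunk length are illustrative assumptions, not part of the original card, and the exact arguments may vary with your Transformers version.

```python
# Hypothetical helper for transcribing a Portuguese audio file.
# Assumes `transformers`, `torch`, and ffmpeg are installed.
MODEL_ID = "jlondonobo/whisper-large-v2-pt"


def transcribe(audio_path: str) -> str:
    """Return the transcription of the audio file at `audio_path`."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # assumption: split long audio into 30-second windows
    )
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` with a path to your own recording downloads the checkpoint on first use and returns the text. Switching `MODEL_ID` to `jlondonobo/whisper-medium-pt` should give the lighter variant.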

### Comparable models
Reported **WER** is based on the evaluation subset of Common Voice.

| Model | WER | # Parameters |
|-------|:---:|:------------:|
| [jlondonobo/whisper-large-v2-pt](https://huggingface.co/jlondonobo/whisper-large-v2-pt) | **5.590** 🤗 | 1550M |
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 6.300 | 1550M |
| [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt) | 6.579 | 769M |
| [jonatasgrosman/wav2vec2-large-xlsr-53-portuguese](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-portuguese) | 11.310 | 317M |
| [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
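
For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between a reference and a hypothesis, divided by the number of reference words and expressed here as a percentage. The sketch below is a plain-Python illustration of that definition. It is not the evaluation script used for this card, and it applies no text normalization, which real evaluations often do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("o gato preto dorme", "o gato branco dorme"))  # 25.0
```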


### Training hyperparameters
We used the following hyperparameters for training:
- `learning_rate`: 1e-05
- `train_batch_size`: 16
- `eval_batch_size`: 8
- `seed`: 42
- `gradient_accumulation_steps`: 2
- `total_train_batch_size`: 32
- `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
- `lr_scheduler_type`: linear
- `lr_scheduler_warmup_steps`: 500
- `training_steps`: 5000
- `mixed_precision_training`: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.0828        | 1.09  | 1000 | 0.1868          | 6.778 |
| 0.0241        | 3.07  | 2000 | 0.2057          | 6.109 |
| 0.0084        | 5.06  | 3000 | 0.2367          | 6.029 |
| 0.0015        | 7.04  | 4000 | 0.2469          | 5.709 |
| 0.0009        | 9.02  | 5000 | 0.2821          | 5.590 🤗|
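
The table shows validation loss rising while WER keeps falling, so the choice of checkpoint depends on which metric you select by. The sketch below copies the rows above into a list and compares the two selections. The variable names are illustrative only.

```python
# Rows copied from the training results table: (step, validation loss, WER).
checkpoints = [
    {"step": 1000, "val_loss": 0.1868, "wer": 6.778},
    {"step": 2000, "val_loss": 0.2057, "wer": 6.109},
    {"step": 3000, "val_loss": 0.2367, "wer": 6.029},
    {"step": 4000, "val_loss": 0.2469, "wer": 5.709},
    {"step": 5000, "val_loss": 0.2821, "wer": 5.590},
]

best_by_wer = min(checkpoints, key=lambda row: row["wer"])
best_by_loss = min(checkpoints, key=lambda row: row["val_loss"])

# Relative WER reduction from the first to the last evaluated checkpoint.
first, last = checkpoints[0], checkpoints[-1]
relative_reduction = 100.0 * (first["wer"] - last["wer"]) / first["wer"]

print("lowest WER at step", best_by_wer["step"])
print("lowest validation loss at step", best_by_loss["step"])
print(f"relative WER reduction: {relative_reduction:.1f}%")
```

Selecting by WER picks the final step, while selecting by validation loss would pick the first evaluated checkpoint.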


### Framework versions

- Transformers 4.26.0.dev0
- PyTorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2