---
language:
- pt
license: apache-2.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- pt
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_8_0
model-index:
- name: XLS-R Wav2Vec2 Portuguese by Jonatas Grosman
  results:
  - task: 
      name: Automatic Speech Recognition 
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8
      type: mozilla-foundation/common_voice_8_0
      args: pt
    metrics:
       - name: Test WER
         type: wer
         value: 8.70
       - name: Test CER
         type: cer
         value: 2.55
       - name: Test WER (+LM)
         type: wer
         value: 6.04
       - name: Test CER (+LM)
         type: cer
         value: 1.98
  - task: 
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Dev Data
      type: speech-recognition-community-v2/dev_data
      args: pt
    metrics:
       - name: Dev WER
         type: wer
         value: 24.23
       - name: Dev CER
         type: cer
         value: 11.30
       - name: Dev WER (+LM)
         type: wer
         value: 19.41
       - name: Dev CER (+LM)
         type: cer
         value: 10.19
---

# XLS-R-1B-PORTUGUESE

This is [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) fine-tuned on Portuguese using the [Common Voice 8](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) dataset.
When using this model, make sure that your speech input is sampled at 16 kHz.
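
As a quick sanity check before running inference, the sample rate of a PCM WAV file can be read from its header. The snippet below is a minimal sketch using only the Python standard library; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this model or its training scripts, and it assumes uncompressed WAV input (other formats need a different reader).

```python
import wave

EXPECTED_RATE = 16_000  # the model expects 16 kHz input


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True if the file's sample rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

If `needs_resampling` returns `True`, resample the audio to 16 kHz with the audio library of your choice before passing it to the model.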

This model was fine-tuned thanks to GPU credits generously provided by [OVHcloud](https://www.ovhcloud.com/en/public-cloud/ai-training/) :)

The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint

## Evaluation Commands

1. To evaluate on `mozilla-foundation/common_voice_8_0` with split `test`

```bash
python eval.py --model_id jonatasgrosman/wav2vec2-xls-r-1b-portuguese --dataset mozilla-foundation/common_voice_8_0 --config pt --split test
```

2. To evaluate on `speech-recognition-community-v2/dev_data` with split `validation`

```bash
python eval.py --model_id jonatasgrosman/wav2vec2-xls-r-1b-portuguese --dataset speech-recognition-community-v2/dev_data --config pt --split validation --chunk_length_s 5.0 --stride_length_s 1.0
```
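
The WER and CER values in the metadata above are word-level and character-level edit-distance ratios. The following is a minimal sketch of how such ratios are computed for a single sentence pair; it is not the code in `eval.py`, the function names are hypothetical, it applies no text normalization, and it assumes a non-empty reference. Corpus-level scores are usually obtained by summing errors over all utterances before dividing, which can differ from averaging per-sentence values.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("o gato preto", "o gato branco")` counts one substituted word out of three reference words, giving a ratio of one third.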