Update README.md
README.md
@@ -0,0 +1,106 @@
---
language:
- en
datasets:
- mozilla-foundation/common_voice_13_0
- facebook/voxpopuli
- LIUM/tedlium
- librispeech_asr
- fisher_corpus
- Switchboard-1
- WSJ-0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
model-index:
- name: tbd
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: LibriSpeech (clean)
      type: librispeech_asr
      config: other
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 2.5
      name: Test WER
    - type: wer
      value: 6.0
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: tedlium-v3
      type: LIUM/tedlium
      config: release1
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 4.5
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Vox Populi
      type: facebook/voxpopuli
      config: en
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 7.2
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Mozilla Common Voice 13.0
      type: mozilla-foundation/common_voice_13_0
      config: en
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 12.9
      name: Test WER
---
# EBranchRegulaFormer
This is a **174M-parameter encoder-decoder E-Branchformer model** trained with an intermediate regularization technique on 6,000 hours of open-source English data.
It achieves word error rates (WERs) comparable to `openai/whisper-medium.en` across multiple datasets with roughly one quarter of the parameters.

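For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The following is a minimal sketch of that definition; the function name `word_error_rate` is illustrative only, and the reported scores may have been computed with additional text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

The scores in the model card metadata are percentages, that is, this ratio multiplied by 100.
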

Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.

*Disclaimer: The model currently hallucinates on segments that contain only silence; joint CTC decoding will be incorporated soon to resolve this issue.*

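Until that fix is available, one possible workaround is to skip segments that are almost certainly silent before they reach the model. The sketch below is not part of this repository: the helper `is_silent` and its -50 dBFS threshold are hypothetical, and it assumes the audio has already been decoded to floating-point samples in the range [-1, 1].

```python
import math
from typing import Sequence


def is_silent(samples: Sequence[float], threshold_db: float = -50.0) -> bool:
    """Return True if the RMS level of `samples` is below `threshold_db` dBFS."""
    if len(samples) == 0:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        # log10(0) is undefined; digital silence is silent by definition.
        return True
    return 20.0 * math.log10(rms) < threshold_db


# A 440 Hz tone at half amplitude, sampled at 16 kHz (clearly not silent).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
```

A suitable threshold depends on the recording conditions, so the default here should be treated as a starting point rather than a recommendation.
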

The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) class to transcribe audio files of arbitrary length.

```python
from transformers import pipeline

model_id = "BUT-FIT/EBranchRegulaFormer"
pipe = pipeline(
    "automatic-speech-recognition",
    model=model_id,
    feature_extractor=model_id,
    trust_remote_code=True,
)
# In newer versions of transformers (>4.31.0), there is a bug in the pipeline's
# inference type, so it is set manually here. Any resulting warning can be ignored.
pipe.type = "seq2seq"

# Greedy decoding
result = pipe("audio.mp3")

# Beam search decoding with the joint CTC-attention scorer
generation_config = pipe.model.generation_config
generation_config.ctc_weight = 0.5
generation_config.num_beams = 5
generation_config.ctc_margin = 0
result = pipe("audio.mp3", generate_kwargs=generation_config.to_dict())
```
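
In hybrid CTC/attention decoding as it is commonly formulated, each beam candidate is scored by a weighted sum of the attention-decoder and CTC log-probabilities, with `ctc_weight` controlling the balance. The sketch below illustrates only that interpolation. It is an assumption that this model's remote code combines the scores in this way, and the candidate strings and probabilities are made up for illustration. The role of `ctc_margin` is not described in this README and is not modeled here.

```python
import math


def joint_score(attention_logp: float, ctc_logp: float, ctc_weight: float = 0.5) -> float:
    """Interpolate attention and CTC log-probabilities for one candidate."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return (1.0 - ctc_weight) * attention_logp + ctc_weight * ctc_logp


# Hypothetical beam candidates: (attention log-prob, CTC log-prob).
candidates = {
    "hello world": (math.log(0.6), math.log(0.5)),
    "hello word": (math.log(0.7), math.log(0.1)),
}


def pick_best(ctc_weight: float) -> str:
    """Return the candidate with the highest joint score."""
    return max(
        candidates,
        key=lambda text: joint_score(*candidates[text], ctc_weight=ctc_weight),
    )


best_joint = pick_best(0.5)      # CTC evidence is taken into account
best_attention = pick_best(0.0)  # attention-only scoring
```

With `ctc_weight = 0.0` the attention decoder alone decides, while larger values let the CTC branch veto candidates that it considers unlikely for the given audio.
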