Lakoc committed on
Commit
5fc43a2
1 Parent(s): 38ec527

Update README.md

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -0,0 +1,106 @@
+ ---
+ language:
+ - en
+ datasets:
+ - mozilla-foundation/common_voice_13_0
+ - facebook/voxpopuli
+ - LIUM/tedlium
+ - librispeech_asr
+ - fisher_corpus
+ - Switchboard-1
+ - WSJ-0
+ metrics:
+ - wer
+ pipeline_tag: automatic-speech-recognition
+ model-index:
+ - name: tbd
+   results:
+   - task:
+       type: automatic-speech-recognition
+       name: Automatic Speech Recognition
+     dataset:
+       name: LibriSpeech (clean)
+       type: librispeech_asr
+       config: other
+       split: test
+       args:
+         language: en
+     metrics:
+     - type: wer
+       value: 2.5
+       name: Test WER
+     - type: wer
+       value: 6.0
+       name: Test WER
+   - task:
+       type: automatic-speech-recognition
+       name: Automatic Speech Recognition
+     dataset:
+       name: tedlium-v3
+       type: LIUM/tedlium
+       config: release1
+       split: test
+       args:
+         language: en
+     metrics:
+     - type: wer
+       value: 4.5
+       name: Test WER
+   - task:
+       type: automatic-speech-recognition
+       name: Automatic Speech Recognition
+     dataset:
+       name: Vox Populi
+       type: facebook/voxpopuli
+       config: en
+       split: test
+       args:
+         language: en
+     metrics:
+     - type: wer
+       value: 7.2
+       name: Test WER
+   - task:
+       type: automatic-speech-recognition
+       name: Automatic Speech Recognition
+     dataset:
+       name: Mozilla Common Voice 13.0
+       type: mozilla-foundation/common_voice_13_0
+       config: en
+       split: test
+       args:
+         language: en
+     metrics:
+     - type: wer
+       value: 12.9
+       name: Test WER
+ ---
+ # EBranchRegulaFormer
+ This is a **174M-parameter encoder-decoder E-Branchformer model** trained with an intermediate regularization technique on 6,000 hours of open-source English data.
+ It achieves word error rates (WERs) comparable to `openai/whisper-medium.en` across multiple datasets with roughly a quarter of the parameters.
+
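The `wer` values reported in the metadata are word error rates. As a reminder of how that metric is defined, here is a minimal sketch that counts word-level substitutions, deletions, and insertions with a standard edit-distance recurrence. It is not the evaluation script used for this model: the function name `word_error_rate` is illustrative, and real evaluations usually normalize casing and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {100 * wer:.1f}%")
```

In this toy case one of six reference words is dropped, so the script reports a WER of 16.7%. Note that WER can exceed 100% when the hypothesis contains many insertions.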
+ Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.
+
+ *Disclaimer: The model currently hallucinates on segments that contain only silence; joint CTC decoding will be incorporated soon to resolve this issue.*
+
+ The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
+ class to transcribe audio files of arbitrary length.
+
+ ```python
+ from transformers import pipeline
+
+ model_id = "BUT-FIT/EBranchRegulaFormer"
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model_id,
+     feature_extractor=model_id,
+     trust_remote_code=True,
+ )
+ # In newer versions of transformers (>4.31.0), there is a bug in the pipeline inference type.
+ # The warning can be ignored.
+ pipe.type = "seq2seq"
+
+ # Greedy decoding
+ result = pipe("audio.mp3")
+
+ # Beam search decoding with the joint CTC-attention scorer
+ generation_config = pipe.model.generation_config
+ generation_config.ctc_weight = 0.5
+ generation_config.num_beams = 5
+ generation_config.ctc_margin = 0
+ result = pipe("audio.mp3", generate_kwargs=generation_config.to_dict())
+ ```
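The `ctc_weight` setting in the README snippet suggests that beam hypotheses are scored by both the attention decoder and a CTC head. The model's actual scorer lives in its remote code and is not shown in this commit, so the following is only a sketch of the common hybrid CTC/attention formulation, where the two log-probabilities are interpolated as `(1 - ctc_weight) * attention + ctc_weight * ctc`. The names `joint_score` and `best_candidate`, and the example hypotheses and scores, are hypothetical; `ctc_margin` is not modelled here.

```python
from typing import List, Tuple

# A candidate is (text, attention log-probability, CTC log-probability).
Candidate = Tuple[str, float, float]


def joint_score(attention_logprob: float, ctc_logprob: float, ctc_weight: float) -> float:
    """Interpolate the two log-probabilities (assumed formulation, see lead-in)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return (1.0 - ctc_weight) * attention_logprob + ctc_weight * ctc_logprob


def best_candidate(candidates: List[Candidate], ctc_weight: float) -> str:
    """Return the text of the candidate with the highest joint score."""
    return max(candidates, key=lambda c: joint_score(c[1], c[2], ctc_weight))[0]


# Made-up scores: the attention decoder slightly prefers the wrong transcript,
# while the CTC head strongly disagrees with it.
candidates: List[Candidate] = [
    ("hello world", -1.2, -1.0),
    ("hello word", -1.0, -3.5),
]

if __name__ == "__main__":
    print(best_candidate(candidates, ctc_weight=0.0))  # attention score only
    print(best_candidate(candidates, ctc_weight=0.5))  # equal weighting
```

With these made-up numbers, attention-only scoring picks "hello word", while an equal weighting picks "hello world" because the CTC score penalizes the first choice heavily.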