BUT-FIT/EBranchRegulaFormer-medium


Lakoc committed on Jan 22

Commit 5fc43a2 • 1 parent: 38ec527

Update README.md

Files changed (1)

README.md +106 -0


	@@ -0,0 +1,106 @@

+---
+language:
+- en
+datasets:
+- mozilla-foundation/common_voice_13_0
+- facebook/voxpopuli
+- LIUM/tedlium
+- librispeech_asr
+- fisher_corpus
+- Switchboard-1
+- WSJ-0
+metrics:
+- wer
+pipeline_tag: automatic-speech-recognition
+model-index:
+- name: tbd
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    dataset:
+      name: LibriSpeech (clean)
+      type: librispeech_asr
+      config: other
+      split: test
+      args:
+        language: en
+    metrics:
+    - type: wer
+      value: 2.5
+      name: Test WER
+    - type: wer
+      value: 6.0
+      name: Test WER
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    dataset:
+      name: tedlium-v3
+      type: LIUM/tedlium
+      config: release1
+      split: test
+      args:
+        language: en
+    metrics:
+    - type: wer
+      value: 4.5
+      name: Test WER
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    dataset:
+      name: Vox Populi
+      type: facebook/voxpopuli
+      config: en
+      split: test
+      args:
+        language: en
+    metrics:
+    - type: wer
+      value: 7.2
+      name: Test WER
+  - task:
+      type: automatic-speech-recognition
+      name: Automatic Speech Recognition
+    dataset:
+      name: Mozilla Common Voice 13.0
+      type: mozilla-foundation/common_voice_13_0
+      config: en
+      split: test
+      args:
+        language: en
+    metrics:
+    - type: wer
+      value: 12.9
+      name: Test WER
+---
+# EBranchRegulaFormer
+This is a **174M-parameter encoder-decoder E-Branchformer model** trained with an intermediate regularization technique on 6,000 hours of open-source English data.
+It achieves word error rates (WERs) comparable to `openai/whisper-medium.en` across multiple datasets with roughly a quarter of the parameters.
+Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.
+*Disclaimer: The model currently hallucinates on segments that contain only silence; joint CTC decoding will be incorporated soon to resolve this issue.*
+The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
+class to transcribe audio files of arbitrary length.
+```python
+from transformers import pipeline
+
+model_id = "BUT-FIT/EBranchRegulaFormer"
+pipe = pipeline(
+    "automatic-speech-recognition",
+    model=model_id,
+    feature_extractor=model_id,
+    trust_remote_code=True,  # the model relies on custom code hosted in the repository
+)
+
+# In newer versions of transformers (>4.31.0), there is a bug in how the pipeline
+# infers the model type, so set it manually. The resulting warning can be ignored.
+pipe.type = "seq2seq"
+
+# Greedy decoding
+result = pipe("audio.mp3")
+
+# Beam search decoding with the joint CTC-attention scorer
+generation_config = pipe.model.generation_config
+generation_config.ctc_weight = 0.5
+generation_config.num_beams = 5
+generation_config.ctc_margin = 0
+result = pipe("audio.mp3", generate_kwargs=generation_config.to_dict())
+```
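
To illustrate what `ctc_weight` controls in the beam search example above, here is a minimal sketch of joint CTC-attention scoring. It assumes the common formulation in which each hypothesis is scored by a linear interpolation of the attention-decoder and CTC log-probabilities; the model's remote code may differ in detail, and `ctc_margin` is not modelled here. The function names, hypotheses, and probabilities are hypothetical and chosen only for illustration.

```python
import math


def joint_score(att_logprob: float, ctc_logprob: float, ctc_weight: float) -> float:
    """Interpolate attention and CTC log-probabilities (assumed formulation)."""
    return (1.0 - ctc_weight) * att_logprob + ctc_weight * ctc_logprob


def rerank(hyps: dict, ctc_weight: float) -> str:
    """Return the hypothesis with the highest joint score."""
    return max(
        hyps,
        key=lambda h: joint_score(hyps[h]["att"], hyps[h]["ctc"], ctc_weight),
    )


# Made-up log-probabilities for two competing hypotheses.
hypotheses = {
    "the cat sat": {"att": math.log(0.50), "ctc": math.log(0.20)},
    "the cat sad": {"att": math.log(0.55), "ctc": math.log(0.05)},
}

best_attention_only = rerank(hypotheses, ctc_weight=0.0)
best_joint = rerank(hypotheses, ctc_weight=0.5)
print(best_attention_only, "|", best_joint)
```

In this toy case the attention decoder alone slightly prefers the second hypothesis, while the CTC term shifts the joint score toward the first. A weight of `0.0` reduces to attention-only scoring, and larger values give the CTC branch more influence.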