---
language:
- en
datasets:
- mozilla-foundation/common_voice_13_0
- facebook/voxpopuli
- LIUM/tedlium
- librispeech_asr
- fisher_corpus
- WSJ-0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
model-index:
- name: tbd
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: LibriSpeech (clean)
      type: librispeech_asr
      config: clean
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 3.4
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: LibriSpeech (other)
      type: librispeech_asr
      config: other
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 7.7
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: tedlium-v3
      type: LIUM/tedlium
      config: release1
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 5.5
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Vox Populi
      type: facebook/voxpopuli
      config: en
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 8.3
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Mozilla Common Voice 13.0
      type: mozilla-foundation/common_voice_13_0
      config: en
      split: test
      args:
        language: en
    metrics:
    - type: wer
      value: 16.1
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      split: test
      args:
        language: en_us
    metrics:
    - type: wer
      value: 9.9
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Switchboard
      type: unk
      split: eval2000
      args:
        language: en
    metrics:
    - type: wer
      value: 12.5
      name: Test WER
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Wall Street Journal
      type: unk
      split: eval92
      args:
        language: en
    metrics:
    - type: wer
      value: 2.4
      name: Test WER
---
# ED-small
This is a 39M-parameter encoder-decoder E-Branchformer model trained on 6,000 hours of open-source, normalised English data.
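Because the model is trained on normalised text, its outputs should be compared against references that have been normalised the same way before computing WER. The exact normaliser is not documented in this card, so the sketch below assumes a simple, hypothetical one (`normalize`: lowercase, drop punctuation except apostrophes, collapse whitespace) and computes WER as the word-level Levenshtein distance divided by the reference length.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normaliser (an assumption, not the one used in training):
    lowercase, replace everything except letters, digits and apostrophes with spaces,
    then collapse repeated whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous row of the edit-distance table (reference prefix of length i - 1).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of five reference words.
print(wer("Hello, World! How are you?", "hello world how you"))  # 0.2
```

Reported WER values are usually computed over a whole test set (total edits divided by total reference words), not averaged per utterance.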

Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.

Disclaimer: the model currently hallucinates on segments that contain only silence, because it was not trained on such data. A fix will be added soon.
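Until the fix is released, one possible workaround is to skip segments that are effectively silent before passing them to the model. The sketch below is not part of this model's code: it assumes float samples in [-1, 1] and a hypothetical `is_silence` helper with an arbitrary threshold of -50 dBFS, which should be tuned for the recording conditions at hand.

```python
import numpy as np


def is_silence(audio: np.ndarray, threshold_db: float = -50.0) -> bool:
    """Return True if the RMS level of `audio` is below `threshold_db` dBFS.

    Assumes float samples in [-1, 1]; the default threshold is an illustrative guess.
    """
    if audio.size == 0:
        return True
    rms = np.sqrt(np.mean(np.square(audio.astype(np.float64))))
    level_db = 20.0 * np.log10(rms + 1e-12)  # epsilon avoids log10(0) on digital silence
    return bool(level_db < threshold_db)


sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone, about -9 dBFS
print(is_silence(np.zeros(sr)))  # True
print(is_silence(tone))          # False
```

A segment would then only be transcribed when `is_silence(samples)` is False. A whole-segment RMS check will not catch short speech surrounded by long silence; a frame-level voice activity detector is the more robust choice.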

The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) class to transcribe audio files of arbitrary length.

```python
from transformers import pipeline

model_id = "BUT-FIT/ED-small"
pipe = pipeline("automatic-speech-recognition", model=model_id, feature_extractor=model_id, trust_remote_code=True)
# In newer versions of transformers (>4.31.0), the pipeline infers the wrong inference type
# for this model, so set it explicitly. The related warning can be ignored.
pipe.type = "seq2seq"

# Run beam search decoding with the joint CTC-attention scorer
result_beam = pipe("audio.wav")
print(result_beam["text"])

# Run greedy decoding without the joint CTC-attention scorer
pipe.model.generation_config.ctc_weight = 0.0
pipe.model.generation_config.num_beams = 1

result_greedy = pipe("audio.wav")
print(result_greedy["text"])
```
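The `ctc_weight` setting controls how much the CTC branch contributes during decoding. The sketch below assumes the common joint CTC-attention rule used in ESPnet-style decoders, score = λ · log p_CTC + (1 − λ) · log p_att with λ = `ctc_weight`; the exact scorer in this model's remote code, and its default weight, are not stated in this card. The log-probabilities are made-up numbers, and complete hypotheses are rescored here, whereas a real beam search applies the interpolation to partial prefixes at every step.

```python
def joint_score(log_p_ctc: float, log_p_att: float, ctc_weight: float) -> float:
    """Interpolate CTC and attention log-probabilities (assumed ESPnet-style rule)."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_att


def rank(hypotheses, ctc_weight):
    """Sort (text, log_p_ctc, log_p_att) tuples by joint score, best first."""
    return sorted(hypotheses, key=lambda h: joint_score(h[1], h[2], ctc_weight), reverse=True)


# Hypothetical scores: the attention decoder slightly prefers a repetition that CTC penalises.
hyps = [
    ("the cat sat", -4.0, -2.0),
    ("the cat sat sat sat", -12.0, -1.5),
]
print(rank(hyps, ctc_weight=0.0)[0][0])  # the cat sat sat sat  (attention only)
print(rank(hyps, ctc_weight=0.3)[0][0])  # the cat sat
```

Setting `ctc_weight = 0.0` reduces the score to the attention decoder alone, which is what the greedy example above does; a non-zero weight lets the CTC branch penalise hypotheses that are poorly aligned with the audio.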