gxbag committed
Commit a50db71 · 1 Parent(s): e0ca32c

Create README.md

Files changed (1):
  1. README.md +18 -0
README.md ADDED
 
This is `facebook/wav2vec2-large-960h-lv60-self` enhanced with a Wikipedia language model.

The dataset used is `wikipedia/20200501.en`; all articles were used. The text was cleaned of references, external links, and everything inside parentheses. The resulting corpus contains 8,092,546 words.
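
The exact cleaning script is not part of this repository; purely as an illustration, a minimal sketch of the parenthesis-stripping step described above could look like the following (the regex and helper name are hypothetical):

```
import re

def strip_parentheses(text: str) -> str:
    # Hypothetical helper: drop anything inside parentheses, including the parentheses themselves.
    text = re.sub(r"\([^)]*\)", "", text)
    # Collapse the double spaces left behind by the removal.
    return re.sub(r" {2,}", " ", text).strip()

print(strip_parentheses("Paris (French: [paʁi]) is the capital of France."))
# -> "Paris is the capital of France."
```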
The language model was built using KenLM. It is a 5-gram model in which all singleton n-grams of order 3 and higher were pruned. The command used was:

`kenlm/build/bin/lmplz -o 5 -S 120G --vocab_estimate 8092546 --text text.txt --arpa text.arpa --prune 0 0 1`
Suggested usage:

```
from transformers import pipeline

# Load the ASR pipeline for this model (decoding is boosted by the Wikipedia language model).
pipe = pipeline("automatic-speech-recognition", model="gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm")

# Transcribe a long recording in 30-second chunks with a (6 s, 3 s) stride overlap.
output = pipe("/path/to/audio.wav", chunk_length_s=30, stride_length_s=(6, 3))
print(output)  # dictionary with the transcription under the "text" key
```

Note that in the version of `transformers` that was current when this model was released, using striding in the pipeline chops off the last portion of the audio, in this case the final 3 seconds. As a workaround, append 3 seconds of silence to the end of the recording; a sketch of this is shown below. The problem is fixed in the GitHub version of `transformers`.
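
As a rough illustration of the workaround, the snippet below pads the recording with 3 seconds of silence before transcription. It assumes a mono WAV file and uses `soundfile` and `numpy`, which are arbitrary choices for this sketch rather than requirements of the model:

```
import numpy as np
import soundfile as sf

# Load the original recording (assumed to be mono).
audio, sample_rate = sf.read("/path/to/audio.wav")

# Append 3 seconds of silence so striding does not drop the end of the audio.
silence = np.zeros(int(3 * sample_rate), dtype=audio.dtype)
sf.write("/path/to/audio_padded.wav", np.concatenate([audio, silence]), sample_rate)

# Then transcribe the padded file as shown above:
# output = pipe("/path/to/audio_padded.wav", chunk_length_s=30, stride_length_s=(6, 3))
```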