gxbag committed
Commit a50db71 · 1 Parent(s): e0ca32c

Create README.md

Files changed (1):
  1. README.md +18 -0
README.md ADDED
 
This is `facebook/wav2vec2-large-960h-lv60-self` enhanced with a Wikipedia language model.

The dataset used is `wikipedia/20200501.en`; all articles were used. The text was cleaned of references, external links, and everything inside parentheses. The resulting corpus contains 8,092,546 words.
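
The exact cleaning script is not part of this repository; purely as an illustration, a minimal sketch of the parenthesis-stripping step described above could look like the following (the regex and helper name are hypothetical):

```
import re

def strip_parentheses(text: str) -> str:
    # Hypothetical helper: drop anything inside parentheses, including the parentheses themselves.
    text = re.sub(r"\([^)]*\)", "", text)
    # Collapse the double spaces left behind by the removal.
    return re.sub(r" {2,}", " ", text).strip()

print(strip_parentheses("Paris (French: [paʁi]) is the capital of France."))
# -> "Paris is the capital of France."
```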
The language model was built using KenLM. It is a 5-gram model in which all singleton n-grams of order 3 and higher were pruned. The command used was:

`kenlm/build/bin/lmplz -o 5 -S 120G --vocab_estimate 8092546 --text text.txt --arpa text.arpa --prune 0 0 1`
Suggested usage:

```
from transformers import pipeline

# Load the ASR pipeline for this model (decoding is boosted by the Wikipedia language model).
pipe = pipeline("automatic-speech-recognition", model="gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm")

# Transcribe a long recording in 30-second chunks with a (6 s, 3 s) stride overlap.
output = pipe("/path/to/audio.wav", chunk_length_s=30, stride_length_s=(6, 3))
print(output)  # dictionary with the transcription under the "text" key
```

Note that in the version of `transformers` that was current when this model was released, using striding in the pipeline chops off the last portion of the audio, in this case the final 3 seconds. As a workaround, append 3 seconds of silence to the end of the recording; a sketch of this is shown below. The problem is fixed in the GitHub version of `transformers`.
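
As a rough illustration of the workaround, the snippet below pads the recording with 3 seconds of silence before transcription. It assumes a mono WAV file and uses `soundfile` and `numpy`, which are arbitrary choices for this sketch rather than requirements of the model:

```
import numpy as np
import soundfile as sf

# Load the original recording (assumed to be mono).
audio, sample_rate = sf.read("/path/to/audio.wav")

# Append 3 seconds of silence so striding does not drop the end of the audio.
silence = np.zeros(int(3 * sample_rate), dtype=audio.dtype)
sf.write("/path/to/audio_padded.wav", np.concatenate([audio, silence]), sample_rate)

# Then transcribe the padded file as shown above:
# output = pipe("/path/to/audio_padded.wav", chunk_length_s=30, stride_length_s=(6, 3))
```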