Create README.md

This is `facebook/wav2vec2-large-960h-lv60-self` enhanced with a Wikipedia language model.

The dataset used is `wikipedia/20200501.en`; all articles were included. The text was cleaned of references, external links, and everything inside parentheses, leaving 8,092,546 words.
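
The exact cleaning script is not part of this card; the following is only a rough sketch of how such a corpus could be prepared, assuming the standard `datasets` library snapshot and column names:

```
import re
from datasets import load_dataset

# Load the English Wikipedia snapshot used for the language model.
wiki = load_dataset("wikipedia", "20200501.en", split="train")

def clean(text):
    # Drop anything inside parentheses and collapse whitespace.
    # Reference/external-link removal is only approximated here.
    text = re.sub(r"\([^()]*\)", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

# Write one cleaned article per line for the KenLM training step below.
with open("text.txt", "w") as f:
    for article in wiki:
        f.write(clean(article["text"]) + "\n")
```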

The language model was built using KenLM. It is a 5-gram model in which all singleton n-grams of order 3 and higher were pruned. It was built with:

`kenlm/build/bin/lmplz -o 5 -S 120G --vocab_estimate 8092546 --text text.txt --arpa text.arpa --prune 0 0 1`
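
This card does not document the exact steps used to attach the resulting ARPA file to the acoustic model; the following is a hedged sketch of one common way to do it with `pyctcdecode` and `Wav2Vec2ProcessorWithLM` (the file and output directory names are assumptions):

```
from transformers import AutoProcessor, Wav2Vec2ProcessorWithLM
from pyctcdecode import build_ctcdecoder

# Start from the base model's processor and sort its vocabulary by token id.
processor = AutoProcessor.from_pretrained("facebook/wav2vec2-large-960h-lv60-self")
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

# Build a beam-search CTC decoder backed by the KenLM model.
decoder = build_ctcdecoder(labels, kenlm_model_path="text.arpa")

# Bundle feature extractor, tokenizer, and decoder into one processor.
processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)
processor_with_lm.save_pretrained("wav2vec2-large-960h-lv60-self-with-wikipedia-lm")
```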

Suggested usage:

```
from transformers import pipeline

# Transcribe long audio by splitting it into 30-second chunks with
# 6 s of stride on the left and 3 s on the right of each chunk.
pipe = pipeline("automatic-speech-recognition", model="gxbag/wav2vec2-large-960h-lv60-self-with-wikipedia-lm")
output = pipe("/path/to/audio.wav", chunk_length_s=30, stride_length_s=(6, 3))
print(output)
```

Note that in the current version of `transformers` (as of the release of this model), using striding in the pipeline chops off the last portion of the audio, in this case 3 seconds. As a workaround, append 3 seconds of silence to the end of the recording. This problem has already been fixed in the GitHub version of `transformers`.
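
A minimal sketch of that workaround (assuming a mono WAV file and the `soundfile` library; the file names are illustrative):

```
import numpy as np
import soundfile as sf

# Append 3 seconds of silence so the pipeline's striding does not
# truncate the end of the transcription.
audio, sr = sf.read("/path/to/audio.wav")
padded = np.concatenate([audio, np.zeros(3 * sr, dtype=audio.dtype)])
sf.write("/path/to/audio_padded.wav", padded, sr)
```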