---
license: apache-2.0
language:
- en
metrics:
- cer
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# Model
This model is [Wav2Vec2-Large-XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) fine-tuned on the manually annotated subset of CMU's [L2-Arctic dataset](https://psi.engr.tamu.edu/l2-arctic-corpus/) to perform automatic phonetic transcription in IPA. It was tuned following a procedure similar to the one described by [vitouphy](https://huggingface.co/vitouphy/wav2vec2-xls-r-300m-timit-phoneme) for the TIMIT dataset.

# Usage
To use the model, create a pipeline and invoke it with the path to your WAV file, which must be sampled at 16 kHz.

```python
from transformers import pipeline

pipe = pipeline(model="mrrubino/wav2vec2-large-xlsr-53-l2-arctic-phoneme")
transcription = pipe("file.wav")["text"]
```
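If your recording is not already at 16 kHz, resample it before passing it to the pipeline. Libraries such as librosa or torchaudio are the usual choice for this; purely as an illustration of the idea, a minimal numpy-only sketch (the helper `resample_to_16k` is hypothetical, not part of this model) could look like:

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampler, good enough for a quick check.
    Prefer librosa or torchaudio for production-quality resampling."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(new_t, old_t, audio)

# Example: one second of a 440 Hz tone recorded at 44.1 kHz
sr = 44100
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
audio_16k = resample_to_16k(tone, sr)  # one second at 16 kHz -> 16000 samples
```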

# Results
The manually annotated subset of L2-Arctic was divided into training and testing datasets with a 90/10 split. The performance metrics for the testing dataset are included below.

- WER: 0.425
- CER: 0.128
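Both metrics are normalized edit distances: WER over word tokens, CER over characters. The numbers above were not produced by the snippet below; it is only a minimal sketch of the definitions (libraries such as jiwer or Hugging Face evaluate are the usual choice in practice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,      # deletion
                                     dp[j - 1] + 1,  # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref: str, hyp: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)
```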