Model

This model is Wav2Vec2-Large-XLSR-53 fine-tuned on the manually annotated subset of CMU's L2-Arctic dataset to produce automatic phonetic transcriptions in IPA. Fine-tuning followed a procedure similar to the one vitouphy describes for the TIMIT dataset.

Usage

To use the model, create a pipeline and invoke it with the path to your WAV file, which must be sampled at 16 kHz.

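from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
pipe = pipeline(model="mrrubino/wav2vec2-large-xlsr-53-l2-arctic-phoneme")
# "file.wav" must be sampled at 16 kHz; the result's "text" field holds the IPA transcription.
transcription = pipe("file.wav")["text"]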
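Because the input must be sampled at 16 kHz, it can help to check a file's header before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_sample_rate` and `check_sample_rate` are hypothetical and not part of this model or of `transformers`, and the sketch assumes an uncompressed PCM WAV file.

```python
import wave

# Sampling rate required by the model card.
EXPECTED_RATE_HZ = 16_000


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path, expected=EXPECTED_RATE_HZ):
    """Return path unchanged if it has the expected rate, else raise ValueError.

    Hypothetical helper for illustration only.
    """
    rate = wav_sample_rate(path)
    if rate != expected:
        raise ValueError(
            f"{path} is sampled at {rate} Hz, expected {expected} Hz; "
            "resample it before transcription"
        )
    return path
```

With this in place, a call such as `pipe(check_sample_rate("file.wav"))` would fail early with a clear message if the file needs resampling first.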

Results

The manually annotated subset of L2-Arctic was divided into training and testing datasets with a 90/10 split. The performance metrics for the testing dataset are included below.

WER - 0.425

CER - 0.128
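WER and CER conventionally stand for word error rate and character error rate: the edit distance between the reference and the predicted transcription, divided by the reference length, counted in words or characters respectively. The card does not say which tool or tokenization produced the figures above, so the following is only an illustrative sketch in plain Python. It assumes words are separated by whitespace and that spaces count as characters for CER; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_unit in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_unit in enumerate(hyp, start=1):
            cost = 0 if ref_unit == hyp_unit else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate, assuming whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate, assuming spaces count as characters."""
    return error_rate(list(reference), list(hypothesis))
```

Under these assumptions, one substituted symbol changes CER only slightly but marks the whole containing word as wrong, which is one reason WER is typically much higher than CER for phonetic transcription.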
