metadata
license: apache-2.0
language:
  - en
metrics:
  - cer
  - wer
library_name: transformers
pipeline_tag: automatic-speech-recognition

Model

This model is Wav2Vec2-Large-XLSR-53 fine-tuned on the manually annotated subset of CMU's L2-Arctic dataset to produce automatic phonetic transcriptions in IPA. The fine-tuning procedure is similar to the one vitouphy used with the TIMIT dataset.

Usage

To use the model, create a pipeline and call it with the path to your WAV file, which must be sampled at 16 kHz.

from transformers import pipeline

pipe = pipeline(model="mrrubino/wav2vec2-large-xlsr-53-l2-arctic-phoneme")
transcription = pipe("file.wav")["text"]
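Because the input must be sampled at 16 kHz, it can help to check a file's header before transcribing it. The following is a minimal sketch using only Python's standard-library `wave` module; the helper names `wav_sample_rate` and `is_16khz` are illustrative and not part of this model or of transformers, and the sketch assumes an uncompressed PCM WAV file.

```python
import wave

# Sampling rate required by this model card.
EXPECTED_RATE = 16_000


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path):
    """Return True if the WAV file at `path` is sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE
```

If `is_16khz("file.wav")` returns `False`, resample the audio to 16 kHz with an audio tool of your choice before passing it to the pipeline.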

Results

The manually annotated subset of L2-Arctic was divided into training and testing datasets with a 90/10 split. The performance metrics for the testing dataset are included below.

WER (word error rate) - 0.425

CER (character error rate) - 0.128
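For readers unfamiliar with these metrics, the sketch below shows the standard definitions: the edit distance between a reference and a predicted transcription, divided by the reference length, counted over whitespace-separated tokens for WER and over characters for CER. It is a plain-Python illustration, not the evaluation code behind the numbers above (the tooling used for those is not stated here), and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_item == hyp_item else 1)
            deletion = prev[j] + 1       # drop an item from the reference
            insertion = curr[j - 1] + 1  # add an item from the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate over individual characters."""
    return error_rate(list(reference), list(hypothesis))
```

Both functions return 0.0 for an exact match and can exceed 1.0 when the prediction is much longer than the reference.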

Citation

If you find our model helpful, please feel free to cite us.

@article{Bo_Rubino_Xu_2024,
  title={A Mispronunciation-Based Voice-Omics Representation Framework for Screening Specific Language Impairments in Children},
  DOI={10.1109/ichi61247.2024.00045},
  journal={2024 IEEE 12th International Conference on Healthcare Informatics (ICHI)},
  author={Bo, Wei and Rubino, Matthew and Xu, Wenyao},
  year={2024},
  month={Jun},
  pages={294–304}
}