---
license: apache-2.0
language:
- en
metrics:
- cer
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# Model

This model is [Wav2Vec2-Large-XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) fine-tuned on the manually annotated subset of the [L2-Arctic dataset](https://psi.engr.tamu.edu/l2-arctic-corpus/) to produce automatic phonetic transcriptions in IPA. Fine-tuning followed a procedure similar to the one described by [vitouphy](https://huggingface.co/vitouphy/wav2vec2-xls-r-300m-timit-phoneme) for the TIMIT dataset.

# Usage

To use the model, create a pipeline and call it with the path to your WAV file, which must be sampled at 16 kHz.

```python
from transformers import pipeline

pipe = pipeline(model="mrrubino/wav2vec2-large-xlsr-53-l2-arctic-phoneme")
# The IPA transcription of the 16 kHz WAV file is returned under the "text" key
transcription = pipe("file.wav")["text"]
```

# Results

The manually annotated subset of L2-Arctic was split 90/10 into training and test sets. Performance on the test set is shown below.

- WER: 0.425
- CER: 0.128

# Citation

If you find our model helpful, please consider citing our paper:

```bibtex
@article{Bo_Rubino_Xu_2024,
  title={A Mispronunciation-Based Voice-Omics Representation Framework for Screening Specific Language Impairments in Children},
  DOI={10.1109/ichi61247.2024.00045},
  journal={2024 IEEE 12th International Conference on Healthcare Informatics (ICHI)},
  author={Bo, Wei and Rubino, Matthew and Xu, Wenyao},
  year={2024},
  month={Jun},
  pages={294–304}
}
```
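The WER and CER values under Results are edit-distance ratios. As a minimal sketch, assuming the standard corpus-level definitions (total Levenshtein edits divided by the total number of reference words or characters), they can be computed in plain Python as below. This is not necessarily the evaluation script used to produce the reported numbers; the helper names `edit_distance` and `error_rate`, and the space-separated IPA example strings, are hypothetical and for illustration only.

```python
from typing import Sequence

# Illustrative sketch: standard corpus-level WER/CER, not necessarily the
# evaluation script behind the Results numbers. Helper names are hypothetical.


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance where substitutions, insertions, and deletions cost 1."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)]


def error_rate(refs: list[str], hyps: list[str], unit: str) -> float:
    """Corpus-level WER (unit="word") or CER (unit="char").

    Sums edits over all pairs and divides by the total reference length.
    For CER, spaces are counted as characters.
    """
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total


if __name__ == "__main__":
    # Hypothetical space-separated IPA reference and model output
    refs = ["ð ə k æ t"]
    hyps = ["ð ə k ɛ t"]
    print(error_rate(refs, hyps, "word"))  # 0.2
    print(error_rate(refs, hyps, "char"))  # 1 edit over 9 characters
```

Computing the rate at corpus level (summing edits before dividing) weights long utterances more than averaging per-utterance rates would; libraries such as `jiwer` use the same corpus-level convention.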