metadata

language:
  - el
tags:
  - pytorch
  - ASR

Greek (el) version of the XLSR-Wav2Vec2 automatic speech recognition (ASR) model

language: el
license: apache-2.0
dataset: Common Voice (EL), 364 MB: https://commonvoice.mozilla.org/el/datasets
model: XLSR-Wav2Vec2
metrics: WER

Model description

Wav2Vec2 is a pretrained model for automatic speech recognition (ASR) and was released in September 2020 by Alexei Baevski, Michael Auli, and Alex Conneau. Soon after Wav2Vec2 demonstrated superior performance on LibriSpeech, an English ASR dataset, Facebook AI presented XLSR-Wav2Vec2. XLSR stands for cross-lingual speech representations and refers to XLSR-Wav2Vec2's ability to learn speech representations that are useful across multiple languages.

Like Wav2Vec2, XLSR-Wav2Vec2 learns powerful speech representations from hundreds of thousands of hours of unlabeled speech in more than 50 languages. Similar to BERT's masked language modeling, the model learns contextualized speech representations by randomly masking feature vectors before passing them to a transformer network.
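The masking step described above can be sketched in plain Python. This is a simplified illustration, not the model's actual code: it assumes each frame independently starts a masked span with probability `mask_prob` (the wav2vec 2.0 paper instead samples a fixed proportion of start indices, p = 0.065, each masking the following 10 frames), and it passes the shared mask vector in as an argument, whereas the real model learns it. The names `compute_span_mask` and `apply_mask` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def compute_span_mask(num_frames: int, mask_prob: float = 0.065,
                      span_length: int = 10, seed: Optional[int] = None) -> List[bool]:
    """Mark which feature frames get masked (simplified sketch).

    Each frame starts a masked span with probability ``mask_prob``; the span
    covers ``span_length`` frames, clipped at the end of the sequence.
    Spans are allowed to overlap.
    """
    rng = random.Random(seed)
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span_length, num_frames)):
                mask[t] = True
    return mask


def apply_mask(features: Sequence[Sequence[float]], mask: Sequence[bool],
               mask_vector: Sequence[float]) -> List[List[float]]:
    """Replace every masked frame with one shared mask vector.

    In the real model this vector is learned; here it is just an argument.
    """
    return [list(mask_vector) if m else list(f) for f, m in zip(features, mask)]


# Toy example: 50 frames of 4-dimensional features.
features = [[float(t)] * 4 for t in range(50)]
mask = compute_span_mask(len(features), seed=0)
masked = apply_mask(features, mask, [0.0] * 4)
print(sum(mask), "of", len(mask), "frames masked")
```

The transformer then has to identify the correct representation for each masked position from the unmasked context, which is what makes the learned representations contextual.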

How to use

Instructions to replicate the process are included in the Jupyter notebook.
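The tutorial this model is based on fine-tunes XLSR-Wav2Vec2 with a CTC head (`Wav2Vec2ForCTC`), so the model outputs one score vector per audio frame over a character vocabulary, and transcription ends with a CTC decoding step. The sketch below shows greedy CTC decoding in plain Python, roughly what the `transformers` tokenizer's decoding does after an argmax over the logits. The helper names, the toy vocabulary, and the blank id are hypothetical; in the tutorial's setup the `[PAD]` token serves as the blank and `|` marks word boundaries.

```python
from typing import Dict, List, Sequence


def greedy_ids(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring vocabulary id for every audio frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def ctc_greedy_decode(frame_ids: Sequence[int], id_to_token: Dict[int, str],
                      blank_id: int, word_delimiter: str = "|") -> str:
    """Collapse consecutive repeats, drop CTC blanks, and join characters."""
    chars = []
    prev = None
    for i in frame_ids:
        # A token is emitted only when the id changes and is not the blank.
        if i != prev and i != blank_id:
            chars.append(id_to_token[i])
        prev = i
    # The word delimiter stands in for a space in the character vocabulary.
    text = "".join(chars).replace(word_delimiter, " ")
    return " ".join(text.split())


# Hypothetical toy vocabulary; "[PAD]" doubles as the CTC blank.
vocab = {0: "[PAD]", 1: "γ", 2: "ε", 3: "ι", 4: "α", 5: "|", 6: "σ", 7: "ο", 8: "υ"}
frames = [1, 1, 0, 2, 3, 3, 4, 0, 5, 6, 6, 7, 0, 8, 8]
print(ctc_greedy_decode(frames, vocab, blank_id=0))  # γεια σου
```

A blank between two identical ids is what lets CTC emit a genuinely doubled letter: `[4, 0, 4]` decodes to `αα`, while `[4, 4]` decodes to a single `α`.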

Metrics

Metric	Value
Training Loss	0.0536
Validation Loss	0.61605
WER	0.45049
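Word error rate is the word-level edit distance between the reference and the predicted transcription, divided by the number of reference words: WER = (S + D + I) / N, with S substitutions, D deletions, I insertions and N reference words. A WER of 0.45049 therefore corresponds to roughly 45 word errors per 100 reference words. The following is a minimal pure-Python sketch of a corpus-level WER (errors and reference words summed over all utterances); it is an illustration, not the exact implementation used to produce the figure above, and `corpus_wer` is a hypothetical name.

```python
from typing import List, Sequence


def _edit_distance(ref_words: List[str], hyp_words: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp_words) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref_words, start=1):
        cur = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete reference word r
                cur[j - 1] + 1,          # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (free if the words match)
            )
        prev = cur
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = sum(_edit_distance(r.split(), h.split())
                for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words


# One substitution out of three reference words.
print(corpus_wer(["γεια σου κόσμε"], ["γεια σας κόσμε"]))
```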

Acknowledgements

Based on Patrick von Platen's tutorial: https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
Original Colab notebook: https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLSR_Wav2Vec2_on_Turkish_ASR_with_%F0%9F%A4%97_Transformers.ipynb#scrollTo=V7YOT2mnUiea