Tags: Automatic Speech Recognition, NeMo, PyTorch, Armenian, speech, audio, low-resource-languages, CTC, Conformer, Transformer

Model Overview

This model is a fine-tuned version of the NVIDIA NeMo Conformer-CTC large model, adapted for transcribing Armenian speech.

NVIDIA NeMo: Training

To train, fine-tune, or play with the model, you will need to install NVIDIA NeMo. We recommend installing it after you've installed the latest PyTorch version.

pip install nemo_toolkit['all']
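
To confirm the toolkit is available in your environment, a quick optional check (assuming a standard install) is:

import nemo
import nemo.collections.asr as nemo_asr

print(nemo.__version__)  # prints the installed NeMo version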

How to Use this Model

The model is available for use in the NeMo toolkit, and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

Automatically instantiate the model

import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("Yeroyan/stt_arm_conformer_ctc_large")

Transcribing using Python

First, let's get a sample

wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav

Then simply do:

asr_model.transcribe(['2086-149220-0033.wav'])

Transcribing many audio files

python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
  pretrained_name="Yeroyan/stt_arm_conformer_ctc_large" \
  audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"

Input

This model accepts 16 kHz (16000 Hz) mono-channel audio (WAV files) as input.
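
If your recordings are not already 16 kHz mono, you can convert them before transcription. Below is a minimal sketch using librosa and soundfile (these libraries and the file names are assumptions for illustration; any resampling tool works):

import librosa
import soundfile as sf

# Load arbitrary audio, downmix to mono, and resample to 16 kHz
audio, sr = librosa.load("input_audio.wav", sr=16000, mono=True)
sf.write("input_audio_16k.wav", audio, 16000)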

Output

This model provides transcribed speech as a string for a given audio sample.
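
For example, the transcribe call shown above returns one transcription per input file (depending on the NeMo version, entries may be plain strings or hypothesis objects carrying a .text attribute):

transcriptions = asr_model.transcribe(['2086-149220-0033.wav'])
print(transcriptions[0])  # transcription of the first (and only) file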

Model Architecture

The model uses a Conformer encoder (a convolution-augmented Transformer) trained with a CTC (Connectionist Temporal Classification) loss for speech recognition.
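
To inspect the exact encoder and decoder configuration of this checkpoint, you can print the config attached to the loaded model. This is a quick sketch using standard NeMo/OmegaConf attributes:

from omegaconf import OmegaConf

# asr_model as loaded via from_pretrained above
print(OmegaConf.to_yaml(asr_model.cfg.encoder))  # Conformer encoder hyperparameters
print(OmegaConf.to_yaml(asr_model.cfg.decoder))  # CTC decoder and vocabulary settings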

Training

This model was originally trained on diverse English speech datasets and then fine-tuned for 100 epochs on Armenian speech data.
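
For reference, a fine-tuning run of this kind can be launched with the standard NeMo CTC-BPE training script. The command below is illustrative only: the English base checkpoint name (stt_en_conformer_ctc_large) is an assumption, and the tokenizer and manifest paths are placeholders rather than the exact setup used for this model.

python [NEMO_GIT_FOLDER]/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \
  +init_from_pretrained_model="stt_en_conformer_ctc_large" \
  model.tokenizer.dir="<ARMENIAN_TOKENIZER_DIR>" \
  model.train_ds.manifest_filepath="<TRAIN_MANIFEST.json>" \
  model.validation_ds.manifest_filepath="<VAL_MANIFEST.json>" \
  trainer.max_epochs=100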

Datasets

The model was fine-tuned on the Armenian subset of the Common Voice corpus, version 17.0 (Mozilla Foundation). For dataset processing, we used the following fork: NeMo-Speech-Data-Processor.
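
After processing, NeMo consumes the data as JSON-lines manifests, one utterance per line, using the standard manifest keys; the file path and transcript below are illustrative only:

{"audio_filepath": "clips/common_voice_hy_000001.wav", "duration": 3.2, "text": "բարև ձեզ"}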

Performance

| Version | Tokenizer | Vocabulary Size | MCV Test WER | MCV Test WER (no punctuation) | Train Dataset |
|---------|-----------|-----------------|--------------|-------------------------------|---------------|
| 1.6.0   | SentencePiece Unigram (Armenian) | 128 | 15.0% | 12.44% | MCV v17 |
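
A WER figure like the ones above can be reproduced on any test manifest with NeMo's word_error_rate helper. This is a minimal sketch: the manifest path is a placeholder, and asr_model is the model loaded earlier.

import json
from nemo.collections.asr.metrics.wer import word_error_rate

# Collect reference transcripts and audio paths from a NeMo-style test manifest
references, audio_paths = [], []
with open("test_manifest.json") as f:
    for line in f:
        entry = json.loads(line)
        references.append(entry["text"])
        audio_paths.append(entry["audio_filepath"])

# Transcribe and score (on recent NeMo versions, convert hypothesis objects to text first)
hypotheses = asr_model.transcribe(audio_paths)
print("WER:", word_error_rate(hypotheses=hypotheses, references=references))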

Limitations

  • The model covers Eastern Armenian only.
  • "եւ" needs to be replaced with "և" after each prediction: the tokenizer does not contain the "և" symbol, a unique linguistic exception in that it has no uppercase form (see the snippet after this list).
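
A minimal post-processing step for the "եւ" replacement described above (the audio file name is a placeholder):

def postprocess(text: str) -> str:
    # Restore the "և" ligature, which the tokenizer cannot emit directly
    return text.replace("եւ", "և")

predictions = asr_model.transcribe(["<AUDIO FILE>.wav"])
predictions = [postprocess(t) for t in predictions]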

References

[1] NVIDIA NeMo Toolkit
[2] Enhancing ASR on low-resource languages (paper)
