---
license: mit
language: ar
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
- asr
- automatic speech recognition
---

---

# Model Card for Arabic ASR with NeMo Conformer CTC

## Model Details

**Model Name:** NeMo-Conformer-CTC-Arabic-ASR

**Model Type:** Conformer CTC (Connectionist Temporal Classification) (small)

**Language:** Arabic

**License:** MIT

**Model Creator:** Mostafa Ahmed

**Contact Information:** mostafa.ahmed00976@gmail.com

**Model Version:** 1.0

## Overview

NeMo-Conformer-CTC-Arabic-ASR is a fine-tuned version of the NeMo Conformer CTC model for Automatic Speech Recognition (ASR) in Arabic. The model converts spoken Arabic into written text, making it suitable for applications such as transcription services, voice assistants, and accessibility tools.

## Intended Use

The model is intended for use in:

- Automatic Speech Recognition (ASR) systems for Arabic
- Transcription services for Arabic audio
- Voice assistants and conversational agents
- Accessibility tools for Arabic speakers

## Training Data

The model was fine-tuned on the Arabic Common Voice dataset, an open-source dataset of transcribed speech. The dataset covers a variety of speakers and audio conditions, which helps the model handle different recording scenarios.

**Data Sources:**

- [Common Voice](https://commonvoice.mozilla.org/en/datasets): A multilingual dataset for speech recognition tasks.

## Training Procedure

The model was trained using NVIDIA's NeMo framework. The training process involved:

- Preprocessing the Common Voice dataset and converting it to manifests that pair each audio file with its transcription for ASR.
- Fine-tuning the pre-trained Conformer CTC model on the Arabic Common Voice dataset.
- Evaluating the model's performance using a standard ASR metric, Word Error Rate (WER).

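
The manifest conversion step in the training procedure can be sketched as follows. NeMo ASR manifests are text files with one JSON object per line, holding `audio_filepath`, `duration`, and `text`. This is a minimal sketch, not the author's preprocessing script: the helper names `wav_duration_seconds` and `write_manifest` are hypothetical, and it assumes the Common Voice clips (distributed as MP3) have already been converted to PCM WAV.

```python
import json
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(samples, manifest_path):
    """Write a NeMo-style manifest: one JSON object per line.

    `samples` is an iterable of (audio_path, transcript) pairs.
    Assumption: every audio_path points to a readable PCM WAV file.
    """
    with open(manifest_path, "w", encoding="utf-8") as out:
        for audio_path, text in samples:
            entry = {
                "audio_filepath": str(audio_path),
                "duration": round(wav_duration_seconds(audio_path), 3),
                "text": text.strip(),
            }
            # ensure_ascii=False keeps the Arabic text readable in the file
            out.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Calling `write_manifest(pairs, "train_manifest.json")` once per split (train, validation, test) would produce the files that a NeMo training configuration points to; the file name here is only illustrative.
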
## Evaluation Results

The model was evaluated on the Arabic portion of the Common Voice dataset, including a held-out test set. Here are the key performance metrics:

- **Word Error Rate (WER):** 30% on Train, 32% on Validation, and 40% on Test (no language model)

WER is the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript, so lower values mean more accurate transcription of Arabic speech.

## How to Use

You can load and use the model with the NeMo framework as follows:

```python
import nemo.collections.asr as nemo_asr

# Load the model
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("MostafaAhmed98/Conformer-CTC-Arabic-ASR")

# Example usage: transcribe() takes a list of audio file paths
audio_file = "path/to/arabic_audio.wav"
transcription = asr_model.transcribe([audio_file])
print(transcription[0])  # Output: Transcribed Arabic text
```
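
To score transcriptions produced this way against reference text, WER can be computed with a word-level edit distance. The function below is a minimal sketch of the standard definition, not the evaluation code used for the figures in this card; `word_error_rate` is a hypothetical helper, and it applies no text normalization (for example, diacritic removal), which may differ from how the reported numbers were obtained.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("ذهب الولد إلى المدرسة", "ذهب الولد المدرسة"))  # 0.25
```

For a whole test set, WER is usually reported as total errors over total reference words across all utterances, rather than as the average of per-utterance values.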