This is a small CTC-based Automatic Speech Recognition system for French.

This model is part of our SLU demo available here: https://huggingface.co/spaces/naver/French-SLU-DEMO-Interspeech2024

Please check our blog post available at: TBD

  • Training data: 123 hours (84,707 utterances)
  • Normalization: Whisper normalization

Table of Contents:

  1. Performance
  2. Training Parameters
  3. ASR Model Class
  4. Running Inference

Performance

Dataset          dev WER   dev CER   test WER   test CER
speechMASSIVE        9.2       2.6        9.6        2.9
fleurs102           20.0       7.0       22.0        7.7
CommonVoice 17      16.0       4.9       19.0        6.5
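
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is not the evaluation script used for the scores above. It assumes the scores are percentages (the table does not state the unit) and it skips the Whisper normalization step. The function names and the example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumes a non-empty reference)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumes a non-empty reference)."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical reference/hypothesis pair, for illustration only
reference = "le chat dort sur le canapé"
hypothesis = "le chat dore sur canapé"
print(f"WER: {wer(reference, hypothesis):.1f}  CER: {cer(reference, hypothesis):.1f}")
```

In practice both strings would be normalized in the same way before scoring, so that differences in casing or punctuation are not counted as errors.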

Training Parameters

This model is mHuBERT-147 fine-tuned for ASR. The training parameters are available in config.json. We highlight the use of 0.3 for hubert.final_dropout, which we found very helpful for convergence. We also train in fp32, as we found fp16 training to be unstable.

ASR Model Class

We use the mHubertForCTC class for our model, which is nearly identical to the existing HubertForCTC class. The key difference is that it adds a few hidden layers at the end of the Transformer stack, just before the lm_head. The code is available in CTC_model.py.
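
As background on what the lm_head output is used for in a CTC model: the head produces one label per audio frame, and a decoder turns that frame sequence into text. The sketch below shows generic greedy CTC decoding (collapse repeats, then drop blanks). It is not taken from CTC_model.py or run_inference.py; the toy vocabulary, the blank index of 0, and the function name are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeated ids, then remove blank ids."""
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank label.
        if idx != prev and idx != blank_id:
            chars.append(id_to_char[idx])
        prev = idx
    return "".join(chars)


# Hypothetical toy vocabulary; a real model ships its own in the processor.
vocab = {0: "<blank>", 1: "o", 2: "u", 3: "i"}

# Per-frame argmax ids, as they might come out of a CTC head
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0]
print(ctc_greedy_decode(frames, vocab))
```

The blank label is what lets CTC output a doubled letter: two identical ids separated by a blank are kept as two characters, while two adjacent identical ids collapse into one.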

Running Inference

The run_inference.py file shows how to load the model for inference (load_asr_model) and how to transcribe an audio file (run_asr_inference). Please install the dependencies listed in the requirements file to avoid incorrect model loading.

Here is a simple example of the inference loop. Note that the input audio must be sampled at 16,000 Hz.

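from inference_code.run_inference import load_asr_model, run_asr_inference

# Load the fine-tuned model and its processor
model, processor = load_asr_model()

# Transcribe one audio file (it must be sampled at 16,000 Hz)
prediction = run_asr_inference(model, processor, your_audio_file)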
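
Because the model expects 16,000 Hz input, it can help to check a file before transcribing it. The following is a minimal sketch using only Python's standard wave module, so it applies to PCM WAV files only; other formats would need a different library. The helper name and the constant are hypothetical and are not part of run_inference.py.

```python
import wave

EXPECTED_SAMPLE_RATE = 16_000  # Hz, as required by the model


def has_expected_sample_rate(wav_path, expected=EXPECTED_SAMPLE_RATE):
    """Return True if the PCM WAV file at wav_path is sampled at `expected` Hz."""
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate() == expected
```

A file that fails this check should be resampled to 16,000 Hz with an audio tool of your choice before it is passed to run_asr_inference.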