---
language:
- fr
license: apache-2.0
tags:
- automatic-speech-recognition
- polinaeterna/voxpopuli
- generated_from_trainer
- hf-asr-leaderboard
- robust-speech-event
datasets:
- polinaeterna/voxpopuli
model-index:
- name: Fine-tuned Wav2Vec2 XLS-R 1B model for ASR in French
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Voxpopuli
      type: polinaeterna/voxpopuli
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 11.70
    - name: Test CER
      type: cer
      value: 5.80
    - name: Test WER (+LM)
      type: wer
      value: 10.01
    - name: Test CER (+LM)
      type: cer
      value: 5.63
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 9
      type: mozilla-foundation/common_voice_9_0
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 45.74
    - name: Test CER
      type: cer
      value: 22.99
    - name: Test WER (+LM)
      type: wer
      value: 38.81
    - name: Test CER (+LM)
      type: cer
      value: 23.25
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Dev Data
      type: speech-recognition-community-v2/dev_data
      args: fr
    metrics:
    - name: Test WER
      type: wer
      value: 27.86
    - name: Test CER
      type: cer
      value: 13.20
    - name: Test WER (+LM)
      type: wer
      value: 22.53
    - name: Test CER (+LM)
      type: cer
      value: 12.82
---

# Fine-tuned Wav2Vec2 XLS-R 1B model for ASR in French

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the `polinaeterna/voxpopuli` (fr) dataset.
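For quick inference, the checkpoint can be loaded through the `transformers` ASR pipeline. The sketch below is not part of the original card: the model id is the real checkpoint name, but `transcribe` and `seconds_to_samples` are illustrative helper names, and the chunking values simply mirror the flags used for the long-form evaluation further down.

```python
TARGET_SAMPLE_RATE = 16_000  # wav2vec2 checkpoints expect 16 kHz audio


def seconds_to_samples(seconds: float, sample_rate: int = TARGET_SAMPLE_RATE) -> int:
    """Convert a duration in seconds to a sample count at the model's rate."""
    return int(seconds * sample_rate)


def transcribe(audio_path: str) -> str:
    """Run the fine-tuned checkpoint on one audio file via the ASR pipeline."""
    # Imported inside the function so the sketch can be read (and the small
    # helper above tested) without pulling in the heavy dependency.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="bhuang/wav2vec2-xls-r-1b-voxpopuli-fr",
    )
    # Long recordings can be split into overlapping windows (5 s chunks,
    # 1 s stride), matching the dev-data evaluation settings.
    result = asr(audio_path, chunk_length_s=5.0, stride_length_s=1.0)
    return result["text"]
```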
It achieves the following results on the evaluation set:
- Loss: 0.2906
- Wer: 0.1093

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 12.0
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.4628        | 0.93  | 500  | 0.3834          | 0.1625 |
| 0.3577        | 1.85  | 1000 | 0.3231          | 0.1367 |
| 0.3103        | 2.78  | 1500 | 0.2918          | 0.1287 |
| 0.2884        | 3.7   | 2000 | 0.2845          | 0.1227 |
| 0.2615        | 4.63  | 2500 | 0.2819          | 0.1189 |
| 0.242         | 5.56  | 3000 | 0.2915          | 0.1165 |
| 0.2268        | 6.48  | 3500 | 0.2768          | 0.1187 |
| 0.2188        | 7.41  | 4000 | 0.2719          | 0.1128 |
| 0.1979        | 8.33  | 4500 | 0.2741          | 0.1134 |
| 0.1834        | 9.26  | 5000 | 0.2827          | 0.1096 |
| 0.1719        | 10.19 | 5500 | 0.2906          | 0.1093 |
| 0.1723        | 11.11 | 6000 | 0.2868          | 0.1104 |

### Framework versions

- Transformers 4.23.0.dev0
- Pytorch 1.12.0+cu113
- Datasets 2.4.0
- Tokenizers 0.12.1

## Evaluation

1. To evaluate on `polinaeterna/voxpopuli`:

   ```bash
   python eval.py \
       --model_id "bhuang/wav2vec2-xls-r-1b-voxpopuli-fr" \
       --dataset "polinaeterna/voxpopuli" \
       --config "fr" \
       --split "test" \
       --log_outputs \
       --outdir "outputs/results_polinaeterna_voxpopuli_with_lm"
   ```

2. To evaluate on `mozilla-foundation/common_voice_9_0`:

   ```bash
   python eval.py \
       --model_id "bhuang/wav2vec2-xls-r-1b-voxpopuli-fr" \
       --dataset "mozilla-foundation/common_voice_9_0" \
       --config "fr" \
       --split "test" \
       --log_outputs \
       --outdir "outputs/results_mozilla-foundation_common_voice_9_0_with_lm"
   ```

3. To evaluate on `speech-recognition-community-v2/dev_data`:

   ```bash
   python eval.py \
       --model_id "bhuang/wav2vec2-xls-r-1b-voxpopuli-fr" \
       --dataset "speech-recognition-community-v2/dev_data" \
       --config "fr" \
       --split "validation" \
       --chunk_length_s 5.0 \
       --stride_length_s 1.0 \
       --log_outputs \
       --outdir "outputs/results_speech-recognition-community-v2_dev_data_with_lm"
   ```
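The WER and CER figures reported above are edit-distance metrics. As a reference for how they are defined (a self-contained sketch; `eval.py` almost certainly uses a metrics library such as `evaluate`/`jiwer` rather than this code), both can be computed from a plain Levenshtein distance, at the word level for WER and the character level for CER:

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Minimum number of substitutions, insertions, and deletions
    turning `ref` into `hyp` (space-optimized Levenshtein DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance between ref[:i] and hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,      # deletion
                dp[j - 1] + 1,  # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution or match
            )
            prev = cur
    return dp[n]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("le chat dort", "le chien dort")` counts one substituted word out of three, i.e. 33.3 %, which matches how a "Test WER" of 11.70 above should be read: roughly one word in nine is wrong.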