metadata
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_12_0
language:
- fr
metrics:
- wer
pipeline_tag: automatic-speech-recognition
training on full commonvoice
The WERs are:
decoding method | chunk size | test | comment | decoding mode |
---|---|---|---|---|
greedy search | 640ms | 10.90 | --epoch 30 --avg 9 | simulated streaming |
modified beam search | 640ms | 10.55 | --epoch 30 --avg 9 | simulated streaming |
fast beam search | 640ms | 10.75 | --epoch 30 --avg 9 | simulated streaming |
training on full librispeech then finetune on full commonvoice
The WERs are:
decoding method | chunk size | test | comment | decoding mode |
---|---|---|---|---|
greedy search | 640ms | 10.57 | --epoch 29 --avg 9 | simulated streaming |
modified beam search | 640ms | 10.19 | --epoch 29 --avg 9 | simulated streaming |
fast beam search | 640ms | 10.25 | --epoch 29 --avg 9 | simulated streaming |
training on full librispeech and gigaspeech then finetune on full commonvoice
The WERs are:
decoding method | chunk size | test | comment | decoding mode |
---|---|---|---|---|
greedy search | 640ms | 9.95 | --epoch 30 --avg 9 | simulated streaming |
modified beam search | 640ms | 9.57 | --epoch 30 --avg 9 | simulated streaming |
fast beam search | 640ms | 9.67 | --epoch 30 --avg 9 | simulated streaming |