---
tags:
- espnet
- audio
- automatic-speech-recognition
language:
- et
license: apache-2.0
metrics:
- wer
model-index:
- name: e-branchformer et
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: ERR2020
      type: audio
    metrics:
    - name: Wer
      type: wer
      value: 9.9
---
# e-branchformer et
Estonian ASR model trained on the ERR2020 dataset with an ESPnet2 E-Branchformer based recipe (https://github.com/espnet/espnet/tree/master/egs2/librispeech_100/asr1).
- WER on ERR2020: 9.9
- WER on Mozilla Common Voice 11 (`commonvoice_11`): 20.8
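For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. This is an illustration only, not the scoring script behind the numbers above. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization and a result expressed in percent.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent (assumption: whitespace tokenization, no normalization)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing in hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("tere tulemast eesti raadiosse",
                          "tere tulemast eesti raadio"))
```

Real evaluation pipelines usually normalize case and punctuation before scoring, so values from this sketch may differ from officially reported ones.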
For usage:
- clone this repo (`git clone https://huggingface.co/rristo/espnet_ebranchformer_et`)
- go to repo (`cd espnet_ebranchformer_et`)
- build docker image for needed libraries (`build.sh` or `build.bat`)
- run docker container (`run.sh` or `run.bat`). This mounts the current directory
- run notebook `example_usage.ipynb` for example usage
- currently expects audio to be in .wav format
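Since the model currently expects `.wav` input, it can help to check a file before passing it to the notebook. The following is a small sketch using only Python's standard `wave` module. The helper name `describe_wav` is hypothetical and not part of this repo, and the sample rate and channel count the model expects are not stated here, so the sketch only reports what the file contains.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file.

    wave.open raises wave.Error if the file is not a readable PCM WAV file,
    which makes this a cheap format check as well.
    """
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    import sys

    # Usage: python describe_wav.py some_audio.wav
    print(describe_wav(sys.argv[1]))
```

Audio in other formats would need to be converted to `.wav` with an external tool first.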
## Model description
ASR model for Estonian, trained on the Estonian Public Broadcasting ERR2020 dataset (around 340 hours of audio).
## Intended uses & limitations
Pretty much a toy model, trained on a limited amount of data. It might not work well on out-of-domain data
(especially spontaneous or noisy speech).
## Training and evaluation data
Trained on ERR2020 data; evaluated on the ERR2020 and Mozilla Common Voice test data.
## Training procedure
Used the ESPnet E-Branchformer based recipe (https://github.com/espnet/espnet/tree/master/egs2/librispeech_100/asr1).
### Training results
See the folder `exp/images`.
Validation results are in `exp/RESULTS.md`.
### Framework versions
- espnet2