ESPnet2 ASR model
espnet/shihlun_asr_whisper_medium_finetuned_librispeech100
This model was trained by Shih-Lun Wu (slseanwu) using the librispeech_100 recipe in espnet.
Demo: How to use in ESPnet2
cd espnet
pip install -e .
cd egs2/librispeech_100/asr1
train_set="train_clean_100"
valid_set="dev"
test_sets="test_clean test_other dev_clean dev_other"
asr_tag=whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs
asr_config=conf/tuning/train_asr_whisper_full.yaml
inference_config=conf/decode_asr_whisper_noctc_greedy.yaml
./asr.sh \
    --skip_data_prep false \
    --skip_train true \
    --skip_eval false \
    --lang en \
    --ngpu 1 \
    --nj 4 \
    --stage 1 \
    --stop_stage 13 \
    --gpu_inference true \
    --inference_nj 1 \
    --token_type whisper_multilingual \
    --feats_normalize '' \
    --max_wav_duration 30 \
    --speed_perturb_factors "0.9 1.0 1.1" \
    --audio_format "flac.ark" \
    --feats_type raw \
    --use_lm false \
    --cleaner whisper_en \
    --asr_tag "${asr_tag}" \
    --asr_config "${asr_config}" \
    --inference_config "${inference_config}" \
    --inference_asr_model valid.acc.ave.pth \
    --train_set "${train_set}" \
    --valid_set "${valid_set}" \
    --test_sets "${test_sets}" "$@"
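To transcribe a single file without running the full recipe, something like the following untested Python sketch may work. It uses `Speech2Text.from_pretrained` from `espnet2.bin.asr_inference` and assumes that `espnet_model_zoo`, `soundfile`, and the `openai-whisper` package are installed. The decoding options (`ctc_weight=0.0`, `beam_size=1`) are an assumption inferred from the name of `conf/decode_asr_whisper_noctc_greedy.yaml`, not copied from that file. The helper name `transcribe` and the 16 kHz input requirement are likewise illustrative assumptions.

```python
# Hugging Face tag taken from this model card.
MODEL_TAG = "espnet/shihlun_asr_whisper_medium_finetuned_librispeech100"

# Assumption: these values mirror conf/decode_asr_whisper_noctc_greedy.yaml
# (no CTC scoring, greedy search). The config file is not reproduced here.
DECODE_KWARGS = {"ctc_weight": 0.0, "beam_size": 1}


def transcribe(wav_path, device="cpu"):
    """Return the top hypothesis text for one audio file (hypothetical helper).

    Assumes mono audio sampled at 16 kHz and no longer than 30 seconds,
    matching --max_wav_duration 30 in the recipe call.
    """
    # Imported lazily so this module loads even where ESPnet is not installed.
    import soundfile
    from espnet2.bin.asr_inference import Speech2Text

    # Note: this downloads and loads the model on every call; cache the
    # Speech2Text object yourself when transcribing many files.
    speech2text = Speech2Text.from_pretrained(
        MODEL_TAG, device=device, **DECODE_KWARGS
    )
    speech, rate = soundfile.read(wav_path)
    if rate != 16000:
        raise ValueError(f"expected 16 kHz audio, got {rate} Hz")

    # Each n-best entry is (text, tokens, token_ids, hypothesis).
    text, *_ = speech2text(speech)[0]
    return text
```

Calling `transcribe("sample.flac")` (a hypothetical path) would download the model on first use. Pass `device="cuda"` to decode on a GPU, as the recipe does with `--gpu_inference true`.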
RESULTS
Environments
- date: Mon Jan 9 23:06:34 CST 2023
- python version: 3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]
- espnet version: espnet 202211
- pytorch version: pytorch 1.12.1
- Git hash: d89be931dcc8f61437ac49cbe39a773f2054c50c
- Commit date: Mon Jan 9 11:06:45 2023 -0600
asr_whisper_medium_finetune_lr1e-5_adamw_wd1e-2_3epochs
WER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_clean|2703|54798|97.7|1.9|0.3|0.3|2.6|30.1|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_other|2864|51528|95.3|4.3|0.4|0.6|5.3|45.4|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_clean|2620|53027|97.6|2.1|0.3|0.4|2.7|30.9|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_other|2939|52882|95.1|4.4|0.5|0.7|5.6|47.5|
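In these score tables, Err is the sum of Sub, Del, and Ins, and Corr is 100 minus Sub and Del, all as percentages of the reference words. The short Python sketch below re-checks the WER rows above under that reading and estimates the absolute number of word errors. The figures are copied from the table. The helper names and the 0.2-point tolerance (to absorb rounding to one decimal) are illustrative assumptions.

```python
# WER rows copied from the table above: (Wrd, Corr, Sub, Del, Ins, Err).
WER_ROWS = {
    "dev_clean": (54798, 97.7, 1.9, 0.3, 0.3, 2.6),
    "dev_other": (51528, 95.3, 4.3, 0.4, 0.6, 5.3),
    "test_clean": (53027, 97.6, 2.1, 0.3, 0.4, 2.7),
    "test_other": (52882, 95.1, 4.4, 0.5, 0.7, 5.6),
}

# Assumption: every percentage is rounded to one decimal place, so sums of
# rounded values may differ slightly from the reported totals.
ROUNDING_TOLERANCE = 0.2


def check_row(corr, sub, dele, ins, err, tol=ROUNDING_TOLERANCE):
    """Return True if Err ~= Sub + Del + Ins and Corr ~= 100 - Sub - Del."""
    err_ok = abs((sub + dele + ins) - err) <= tol
    corr_ok = abs((100.0 - sub - dele) - corr) <= tol
    return err_ok and corr_ok


def approx_error_words(n_words, err_percent):
    """Rough count of word errors implied by a rounded error percentage."""
    return round(n_words * err_percent / 100.0)


for name, (wrd, corr, sub, dele, ins, err) in WER_ROWS.items():
    status = "consistent" if check_row(corr, sub, dele, ins, err) else "inconsistent"
    print(f"{name}: {status}, roughly {approx_error_words(wrd, err)} word errors")
```

Because the inputs are rounded percentages, the estimated counts are only approximate. The exact counts would have to come from the scoring output itself.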
CER
|dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
|---|---|---|---|---|---|---|---|---|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_clean|2703|287287|99.3|0.3|0.4|0.3|1.0|30.1|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dev_other|2864|265648|98.3|1.0|0.7|0.6|2.3|45.4|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_clean|2620|280691|99.3|0.3|0.3|0.3|1.0|30.9|
|decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/test_other|2939|271738|98.3|1.0|0.7|0.7|2.4|47.5|