
Note: This model was trained with the code from this PR: https://github.com/k2-fsa/icefall/pull/399

Pre-trained Transducer-Stateless5 models for the AISHELL4 dataset with icefall.

The model was trained on the far-field data of AISHELL4 using the icefall scripts, built on the latest version of k2.

Training procedure

The main repositories are listed below. The training and decoding scripts will be updated as new versions are released.
  • k2: https://github.com/k2-fsa/k2
  • icefall: https://github.com/k2-fsa/icefall
  • lhotse: https://github.com/lhotse-speech/lhotse

```bash
git clone https://github.com/k2-fsa/icefall
cd icefall
```
  • Preparing data.
```bash
cd egs/aishell4/ASR
bash ./prepare.sh
```
  • Training
```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./pruned_transducer_stateless5/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 1 \
  --exp-dir pruned_transducer_stateless5/exp \
  --lang-dir data/lang_char \
  --max-duration 220
```
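The `--world-size` value is expected to match the number of GPUs listed in `CUDA_VISIBLE_DEVICES` (four in the command above). A minimal sketch for deriving it instead of hard-coding it, assuming a comma-separated device list; the helper name `num_visible_gpus` is hypothetical and not part of icefall:

```bash
# Hypothetical helper (not part of icefall): count the GPU ids in a
# comma-separated list such as the value of CUDA_VISIBLE_DEVICES.
num_visible_gpus() {
  local devices="$1"
  local -a ids
  # Split the list on commas into an array, then print its length.
  IFS=',' read -r -a ids <<< "$devices"
  echo "${#ids[@]}"
}
```

With this defined, the training command could pass `--world-size "$(num_visible_gpus "$CUDA_VISIBLE_DEVICES")"` so the two settings cannot drift apart.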

Evaluation results

The decoding results (CER, %) on the AISHELL4 test set are listed below.

With --use-averaged-model=False, the CERs are:

| decoding method | test (CER, %) | comment |
|---|---|---|
| greedy search | 30.05 | --epoch 30, --avg 25, --max-duration 800 |
| modified beam search (beam size 4) | 29.16 | --epoch 30, --avg 25, --max-duration 800 |
| fast beam search (set as default) | 29.20 | --epoch 30, --avg 25, --max-duration 1500 |
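As a reading aid for the `--epoch`/`--avg` settings: with `--use-averaged-model=False`, icefall's decode scripts average the last `avg` epoch checkpoints ending at `epoch`, so `--epoch 30 --avg 25` should correspond to `epoch-6.pt` through `epoch-30.pt` in the experiment directory. The sketch below only lists those filenames; it assumes the `epoch-N.pt` naming and the `epoch - avg + 1` start index, both of which should be checked against the recipe's `decode.py`. The function name `averaged_checkpoints` is hypothetical.

```bash
# Hypothetical helper: print the checkpoint files that would be averaged
# for --epoch E --avg A when --use-averaged-model=False.
# Assumes checkpoints are named epoch-N.pt (verify against decode.py).
averaged_checkpoints() {
  local epoch="$1" avg="$2"
  # First epoch included in the average.
  local start=$(( epoch - avg + 1 ))
  local i
  for (( i = start; i <= epoch; i++ )); do
    echo "epoch-${i}.pt"
  done
}
```

Running `averaged_checkpoints 30 25` prints 25 filenames, from `epoch-6.pt` to `epoch-30.pt`.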

With --use-averaged-model=True, the CERs are:

| decoding method | test (CER, %) | comment |
|---|---|---|
| greedy search | 29.89 | --iter 36000, --avg 8, --max-duration 800, --use-averaged-model=True |
| modified beam search (beam size 4) | 28.91 | --iter 36000, --avg 8, --max-duration 800, --use-averaged-model=True |
| fast beam search (set as default) | 29.08 | --iter 36000, --avg 8, --max-duration 1500, --use-averaged-model=True |