	# k2_zipformer2_english_v1

	- Zipformer2 recipe derived from: https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer

	- Training data: CommonVoice, VoxPopuli (3x speed perturbation)
	  - approx. 1500 + 3x500 hours of training data

	- Output text/symbols include:
	  - TrueCase capitalization
	  - punctuation `[,.?!]` as standalone tokens
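Because punctuation is emitted as standalone tokens, downstream code usually has to re-attach it to the preceding word before display. A minimal sketch, assuming the recognizer output has already been split into whitespace-separated tokens; `join_tokens` is a hypothetical helper and not part of icefall or sherpa-onnx:

```python
PUNCTUATION = {",", ".", "?", "!"}  # the standalone punctuation tokens listed above


def join_tokens(tokens):
    """Join word tokens with spaces and glue punctuation to the preceding word."""
    pieces = []
    for token in tokens:
        if token in PUNCTUATION and pieces:
            pieces[-1] += token  # attach to the previous word without a space
        else:
            pieces.append(token)
    return " ".join(pieces)


print(join_tokens("Hello , how are you ?".split()))  # prints: Hello, how are you?
```

Capitalization is left untouched here, since the model already outputs true-cased text.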

	## Config
	```
	--num-epochs 20 \
	--base-lr 0.04 \
	\
	--causal 1 \
	--use-transducer 1 \
	--use-ctc 0 \
	\
	--num-encoder-layers 2,2,2,2,2,2 \
	--feedforward-dim 512,768,768,768,768,768 \
	--encoder-dim 192,256,256,256,256,256 \
	--encoder-unmasked-dim 192,192,192,192,192,192 \
	```
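Each comma-separated option above carries six values, one per encoder stack. The sketch below regroups them per stack, which makes it easier to see that only the first stack is narrower. The option names are copied from the config; `per_stack` is a hypothetical helper written for illustration, not an icefall function:

```python
CONFIG = {
    "num-encoder-layers": "2,2,2,2,2,2",
    "feedforward-dim": "512,768,768,768,768,768",
    "encoder-dim": "192,256,256,256,256,256",
    "encoder-unmasked-dim": "192,192,192,192,192,192",
}


def per_stack(config):
    """Turn comma-separated option strings into one dict per encoder stack."""
    columns = {
        name: [int(v) for v in value.split(",")] for name, value in config.items()
    }
    # Every option must describe the same number of stacks.
    if len({len(values) for values in columns.values()}) != 1:
        raise ValueError("all per-stack options must have the same number of entries")
    return [dict(zip(columns, row)) for row in zip(*columns.values())]


for index, stack in enumerate(per_stack(CONFIG)):
    print(index, stack)
```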

	## Results

	| ID | System      | cv-dev | cv-test | vp-dev | vp-test | Comment   |
	|----|-------------|--------|---------|--------|---------|-----------|
	| A  | small (24M) | 18.57  | 21.98   | 13.66  | 13.26   | ep20,avg4 |

	- non-streaming results from [decode.py](https://github.com/BUTSpeechFIT/k2_streaming_training/blob/main/training/zipformer/decode.py)
	- cv = CommonVoice, vp = VoxPopuli
	- the exported model uses checkpoint averaging: ep20,avg4 (epoch 20, average of 4)
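As a rough illustration of what `ep20,avg4` covers, the sketch below assumes the usual icefall `--epoch 20 --avg 4` convention, in which the average spans the last four epochs ending at epoch 20. That convention is an assumption here, since the exact export command is not given, and `averaged_epochs` is a hypothetical helper:

```python
def averaged_epochs(epoch, avg):
    """Return the epochs covered when averaging `avg` epochs ending at `epoch`."""
    if avg < 1 or avg > epoch:
        raise ValueError("avg must be between 1 and epoch")
    return list(range(epoch - avg + 1, epoch + 1))


print(averaged_epochs(20, 4))  # prints: [17, 18, 19, 20]
```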

	## Note
	- These are not the best possible results; this model is intended for integration tests.