This model is trained on Vox2. It uses ECAPA-TDNN as teacher model. The speaker module is placed at the end of the encoder
-