# Speaker Identification with ECAPA-TDNN embeddings on VoxCeleb

This repository provides a pretrained ECAPA-TDNN speaker identification model built with SpeechBrain. The system can also be used to extract speaker embeddings. It is trained on the VoxCeleb2 development data only.

# Pipeline description

The system is an ECAPA-TDNN model, which combines convolutional and residual blocks. Embeddings are extracted with attentive statistical pooling, and the model is trained with the Additive Margin Softmax loss.

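To make the pooling and loss named above more concrete, here is a minimal pure-Python sketch of attentive statistics pooling and of the Additive Margin Softmax loss for a single utterance. It is not the SpeechBrain implementation: the function names are hypothetical, the attention is simplified to one scalar score per frame (ECAPA-TDNN computes channel-dependent attention with a small network), and the `margin` and `scale` defaults are illustrative assumptions rather than the values used to train this model.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pool(frames, scores):
    """Pool T frame vectors into one vector of weighted means and stds.

    frames: list of T vectors, each a list of D floats.
    scores: list of T unnormalized attention scores (assumed precomputed).
    Returns a list of length 2 * D: weighted mean followed by weighted std.
    """
    weights = softmax(scores)
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    second_moment = [
        sum(w * f[d] ** 2 for w, f in zip(weights, frames)) for d in range(dim)
    ]
    # Clamp at zero to guard against tiny negative values from rounding.
    std = [math.sqrt(max(m2 - mu ** 2, 0.0)) for m2, mu in zip(second_moment, mean)]
    return mean + std


def am_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Additive Margin Softmax loss for one sample.

    cosines: cosine similarity between the embedding and each class weight.
    target: index of the true speaker.
    margin, scale: illustrative defaults, not taken from this model's recipe.
    """
    # Subtract the margin from the target-class cosine only, then rescale.
    logits = [
        scale * (c - margin if j == target else c) for j, c in enumerate(cosines)
    ]
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum_exp - logits[target]
```

With uniform scores the pooling reduces to the plain mean and standard deviation over frames. A positive margin always increases the loss relative to `margin=0.0`, which is what pushes the target cosine further from the other classes during training.
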
# Compute the speaker embeddings

The system is trained on single-channel recordings sampled at 16 kHz.

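Because the model expects 16 kHz mono input, it can help to check a file before encoding it. The sketch below uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only. The helper name `check_wav_format` is hypothetical and not part of SpeechBrain.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return True if the PCM WAV file at `path` has the expected format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return rate == expected_rate and channels == expected_channels
```

If the check fails, the recording would need to be resampled or downmixed first, for example with `torchaudio.transforms.Resample`.
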
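```python
import torchaudio
# Newer SpeechBrain releases (1.0 and later) also expose this class as
# speechbrain.inference.speaker.EncoderClassifier.
from speechbrain.pretrained import EncoderClassifier

# Download and load the pretrained model
classifier = EncoderClassifier.from_hparams(
    source="yangwang825/ecapa-tdnn-vox2"
)

# Load a 16 kHz mono recording; fs is its sampling rate
signal, fs = torchaudio.load('spk1_snt1.wav')

# Extract the speaker embedding for the recording
embeddings = classifier.encode_batch(signal)
```
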
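A common way to use such embeddings is to compare two of them with cosine similarity. The following is a minimal sketch in pure Python, assuming each embedding has already been flattened to a list of floats (for a PyTorch tensor, something like `embeddings.squeeze().tolist()`). The helper name is hypothetical, and any decision threshold would have to be tuned on your own data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Scores close to 1 suggest the two recordings come from the same speaker, while lower scores suggest different speakers.
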
You can find our training results (models, logs, etc.) here.