# Speaker Identification with ECAPA-TDNN embeddings on VoxCeleb

This repository provides an ECAPA-TDNN model pretrained with SpeechBrain. The system can also be used to extract speaker embeddings. It is trained on the VoxCeleb2 development data only.

# Pipeline description

The system consists of an ECAPA-TDNN model, which combines convolutional and residual blocks. Embeddings are extracted using attentive statistical pooling, and the system is trained with Additive Margin Softmax loss.
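
To make the Additive Margin Softmax loss mentioned above more concrete, here is a minimal pure-Python sketch for a single training example. It is not the SpeechBrain implementation: the function name `am_softmax_loss` and the default `margin` and `scale` values are assumptions chosen for illustration, and the values actually used to train this model are not stated here.

```python
import math


def am_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Additive-margin softmax loss for one example (illustrative sketch).

    margin and scale are assumed example values, not this model's settings.
    """
    def unit(v):
        # L2-normalise a vector so dot products become cosine similarities.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    e = unit(embedding)
    # Cosine similarity between the embedding and each speaker's weight vector.
    cosines = [sum(a * b for a, b in zip(e, unit(w))) for w in class_weights]
    # Subtract the margin from the target class only, then scale every logit.
    logits = [
        scale * (c - margin if i == target else c)
        for i, c in enumerate(cosines)
    ]
    # Numerically stable cross-entropy: log-sum-exp minus the target logit.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Two hypothetical speaker classes in a 2-dimensional embedding space.
weights = [[1.0, 0.0], [0.0, 1.0]]
print(am_softmax_loss([1.0, 1.0], weights, target=0, margin=0.0))
print(am_softmax_loss([1.0, 1.0], weights, target=0, margin=0.2))
```

With the margin enabled, the same embedding incurs a larger loss, which pushes the model to separate speakers by more than plain softmax would require.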

# Compute the speaker embeddings

The system is trained with single-channel recordings sampled at 16 kHz, so input audio should match that format.

```python
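import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Load the pretrained ECAPA-TDNN encoder from the model repository.
classifier = EncoderClassifier.from_hparams(
    source="yangwang825/ecapa-tdnn-vox2"
)

# Load a recording; fs is its sampling rate and should be 16000.
signal, fs = torchaudio.load("spk1_snt1.wav")

# Extract the speaker embedding for the loaded waveform.
embeddings = classifier.encode_batch(signal)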
```
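
A common way to use such embeddings is to compare two of them with cosine similarity. The sketch below is a hedged illustration in plain Python: the helper `cosine_similarity` and the toy vectors are hypothetical, standing in for embeddings you would first convert from the tensors returned by `encode_batch` into flat lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (made up for illustration, not real embeddings).
emb_a = [0.2, -0.1, 0.7]
emb_b = [0.4, -0.2, 1.4]   # same direction as emb_a
emb_c = [-0.7, 0.3, 0.1]   # points elsewhere

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

Higher scores suggest the two recordings are more likely to come from the same speaker; any decision threshold would need to be tuned on held-out data.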

You can find our training results (models, logs, etc.) here.