yangwang825 committed
Commit b8ba5b8 · 1 parent: f3d9c66
Update README.md

README.md CHANGED
@@ -26,14 +26,16 @@ widget:

This repository provides a pretrained ECAPA-TDNN model using SpeechBrain. The system can be used to extract speaker embeddings as well. It is trained on Voxceleb 2 development data only.

+ # Pipeline description
+
+ This system is composed of an ECAPA-TDNN model. It is a combination of convolutional and residual blocks. The embeddings are extracted using attentive statistical pooling. The system is trained with Additive Margin Softmax Loss. It was trained using initial learning rate of 0.001 and batch size of 512 with cyclical learning rate policy (CLR) for 10 epochs on 4 A100 GPUs. We employ additive noises and reverberation from [MUSAN](http://www.openslr.org/17/) and [RIR](http://www.openslr.org/28/) datasets to enrich the supervised information. The pre-training progress takes approximately seven days for the ECAPA-TDNN model.
+
+ # Performance
+
| Release | EER(%) | minDCF(0.01) |
|:-------------:|:--------------:|:--------------:|
| 05-03-21 | 1.45 | 0.17 |

- # Pipeline description
-
- This system is composed of an ECAPA-TDNN model. It is a combination of convolutional and residual blocks. The embeddings are extracted using attentive statistical pooling. The system is trained with Additive Margin Softmax Loss.
-
# Compute the speaker embeddings

The system is trained with recordings sampled at 16kHz (single channel).
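
Because the README states that training used 16 kHz single-channel recordings, it may help to check an input file against that format before extracting embeddings. The following is a minimal sketch using only Python's standard `wave` module. The function name `check_wav_format` and the constants are hypothetical and not part of the repository or of SpeechBrain, and the sketch only reports mismatches rather than resampling or downmixing.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # from the README: recordings sampled at 16 kHz
EXPECTED_CHANNELS = 1      # from the README: single channel


def check_wav_format(source):
    """Return a list of mismatches between a WAV input and the expected format.

    `source` is a file path or a binary file-like object. An empty list
    means the input matches 16 kHz mono. (Hypothetical helper.)
    """
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    return problems


# Usage (the file name is a placeholder):
# problems = check_wav_format("utterance.wav")
```

A file that fails this check would need to be resampled or downmixed first, which this sketch leaves to the caller.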
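
The pipeline description added in this commit mentions a cyclical learning rate policy (CLR) with an initial learning rate of 0.001, but it does not give the upper bound, the cycle length, or the CLR variant. As an illustration only, here is a sketch of the common triangular variant. Treating 0.001 as the lower bound is an assumption, and `max_lr=0.01` and `step_size=1000` are hypothetical values chosen for the example, not the settings used to train this model.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# base_lr comes from the README; max_lr and step_size are assumed values.
BASE_LR = 0.001
MAX_LR = 0.01
STEP_SIZE = 1000

schedule = [triangular_clr(i, BASE_LR, MAX_LR, STEP_SIZE) for i in range(0, 4001, 500)]
```

With these assumed values the schedule starts at the base rate, peaks at iterations 1000 and 3000, and returns to the base rate at iterations 2000 and 4000.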