Mirco committed on
Commit 218decd · 1 Parent(s): 684a496

upload model
.gitattributes CHANGED
@@ -14,3 +14,6 @@
  *.pb filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
+ classifier.ckpt filter=lfs diff=lfs merge=lfs -text
+ embedding_model.ckpt filter=lfs diff=lfs merge=lfs -text
+ normalizer.ckpt filter=lfs diff=lfs merge=lfs -text
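These `.gitattributes` rules decide which files are stored through Git LFS. As a rough illustration of which paths the rules visible in this hunk select, here is a sketch that approximates gitattributes matching with Python's `fnmatch` on the file's base name. This is a simplification (real gitattributes matching follows gitignore-style rules, for example for patterns that contain `/`), lines 1-13 of the file are not shown so the pattern list is partial, and the helper `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Sequence

# filter=lfs patterns visible in this diff hunk (context lines + added lines).
# Earlier lines of .gitattributes are not shown, so this list is partial.
LFS_PATTERNS = [
    "*.pb",
    "*.pt",
    "*.pth",
    "classifier.ckpt",
    "embedding_model.ckpt",
    "normalizer.ckpt",
]


def is_lfs_tracked(path: str, patterns: Sequence[str] = LFS_PATTERNS) -> bool:
    """Hypothetical approximation: match slash-free patterns against the base name."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for f in ["classifier.ckpt", "hyperparams.yaml", "label_encoder.txt", "model.pt"]:
    print(f, is_lfs_tracked(f))
```

This is consistent with the rest of the commit: the three `.ckpt` files appear below as LFS pointers, while `hyperparams.yaml` and `label_encoder.txt` appear as plain text.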
README.md ADDED
@@ -0,0 +1,122 @@
+ ---
+ language: "en"
+ thumbnail:
+ tags:
+ - embeddings
+ - Speaker
+ - Verification
+ - Identification
+ - pytorch
+ - xvectors
+ - TDNN
+ license: "apache-2.0"
+ datasets:
+ - voxceleb
+ metrics:
+ - EER
+ - min_dcf
+ ---
+
+ <iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+ <br/><br/>
+
+ # Speaker Verification with xvector embeddings on VoxCeleb
+
+ This repository provides all the necessary tools to extract speaker embeddings with a pretrained TDNN model using SpeechBrain.
+ The system is trained on VoxCeleb1 + VoxCeleb2 training data.
+
+ For a better experience, we encourage you to learn more about
+ [SpeechBrain](https://speechbrain.github.io). The model's performance on the VoxCeleb1 test set (cleaned) is:
+
+ | Release | EER (%) |
+ |:-------------:|:--------------:|
+ | 05-03-21 | 3.2 |
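The EER (equal error rate) in the table is the error rate at the decision threshold where false acceptances and false rejections are equally frequent. A minimal sketch of estimating it from verification trial scores, assuming higher scores mean "same speaker" and simply trying every observed score as a threshold. The function name is hypothetical, and this is not SpeechBrain's own EER routine, which may interpolate differently.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Hypothetical EER estimate: sweep every observed score as a threshold.

    A trial is accepted when score >= threshold. Returns (eer, threshold),
    where eer is the mean of FAR and FRR at the threshold that brings them
    closest together.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection rate: genuine (same-speaker) trials scored below t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above t.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


eer, threshold = equal_error_rate([0.9, 0.6, 0.4], [0.5, 0.3, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

The function returns a fraction, so an EER of 3.2% as in the table corresponds to a return value of about 0.032.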
+
+
+ ## Pipeline description
+ This system is composed of a TDNN model followed by statistics pooling. It is trained with a categorical cross-entropy loss.
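Statistics pooling is the step that turns a variable-length sequence of frame-level TDNN outputs into a fixed-size vector: for each feature dimension it computes the mean and standard deviation over time and concatenates them. A dependency-free sketch of that operation on plain Python lists; SpeechBrain's own pooling layer works on batched tensors and handles padding, which this illustration ignores.

```python
import math


def statistics_pooling(frames):
    """Pool a [time][dim] list of frame vectors into one [2 * dim] vector.

    The output is the per-dimension mean followed by the per-dimension
    (population) standard deviation, so its length does not depend on time.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


# Two utterances of different length map to vectors of the same size.
short_utt = [[1.0, 2.0], [3.0, 2.0]]
long_utt = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
print(statistics_pooling(short_utt))  # [2.0, 2.0, 1.0, 0.0]
print(len(statistics_pooling(long_utt)))  # 4
```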
38
+
39
+ ## Install SpeechBrain
40
+
41
+ First of all, please install SpeechBrain with the following command:
42
+
43
+ ```
44
+ pip install speechbrain
45
+ ```
46
+
47
+ Please notice that we encourage you to read our tutorials and learn more about
48
+ [SpeechBrain](https://speechbrain.github.io).
49
+
50
+ ### Compute your speaker embeddings
51
+
52
+ ```python
53
+ import torchaudio
54
+ from speechbrain.pretrained import EncoderClassifier
55
+ classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb")
56
+ signal, fs =torchaudio.load('samples/audio_samples/example1.wav')
57
+ embeddings = classifier.encode_batch(signal)
58
+ ```
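For verification, two embeddings are usually compared with a similarity score and a decision threshold. A minimal sketch using cosine similarity on plain Python lists, assuming the embeddings from the snippet above have been flattened to lists (for example with PyTorch's `embeddings.squeeze().tolist()`). The `threshold=0.25` default is a hypothetical placeholder that would need tuning on held-out trials, for example at the EER operating point.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_speaker(emb1, emb2, threshold=0.25):
    """Accept the trial if the cosine score reaches a hypothetical threshold."""
    return cosine_similarity(emb1, emb2) >= threshold


# Toy 3-dimensional "embeddings"; real x-vectors are much larger.
enrolled = [0.2, 0.9, 0.1]
test_same = [0.25, 0.85, 0.05]
test_other = [0.9, -0.1, 0.3]
print(same_speaker(enrolled, test_same))   # True
print(same_speaker(enrolled, test_other))  # False
```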
+
+ ### Inference on GPU
+ To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
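Rather than hard-coding `"cuda"`, the device can be chosen at runtime so the same script also runs on CPU-only machines. A small sketch, assuming PyTorch may or may not be installed; `pick_device` is a hypothetical helper, and its result is what would go into `run_opts={"device": ...}` for `from_hparams`.

```python
def pick_device() -> str:
    """Hypothetical helper: "cuda" when PyTorch reports a usable GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no CUDA backend to use.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


run_opts = {"device": pick_device()}
print(run_opts)
# Then, e.g.: EncoderClassifier.from_hparams(source=..., savedir=..., run_opts=run_opts)
```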
+
+ ### Training
+ The model was trained with SpeechBrain (aa018540).
+ To train it from scratch, follow these steps:
+ 1. Clone SpeechBrain:
+ ```bash
+ git clone https://github.com/speechbrain/speechbrain/
+ ```
+ 2. Install it:
+ ```bash
+ cd speechbrain
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ 3. Run training:
+ ```bash
+ cd recipes/VoxCeleb/SpeakerRec/
+ python train_speaker_embeddings.py hparams/train_x_vectors.yaml --data_folder=your_data_folder
+ ```
+
+ You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1RtCBJ3O8iOCkFrJItCKT9oL-Q1MNCwMH?usp=sharing).
+
+ ### Limitations
+ The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+
+ #### Referencing xvectors
+ ```
+ @inproceedings{DBLP:conf/odyssey/SnyderGMSPK18,
+ author = {David Snyder and
+ Daniel Garcia{-}Romero and
+ Alan McCree and
+ Gregory Sell and
+ Daniel Povey and
+ Sanjeev Khudanpur},
+ title = {Spoken Language Recognition using X-vectors},
+ booktitle = {Odyssey 2018},
+ pages = {105--111},
+ year = {2018},
+ }
+ ```
+
+
+ #### Referencing SpeechBrain
+
+ ```
+ @misc{SB2021,
+ author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Chou, Ju-Chieh and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
+ title = {SpeechBrain},
+ year = {2021},
+ publisher = {GitHub},
+ journal = {GitHub repository},
+ howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+ }
+ ```
+
+ #### About SpeechBrain
+ SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
+
+ Website: https://speechbrain.github.io/
+
+ GitHub: https://github.com/speechbrain/speechbrain
classifier.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7c811b8af2c91e2917e3e967695330cfcad6682391e8ad1f3b8295e625ed506
+ size 8568
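`classifier.ckpt` (like `embedding_model.ckpt` and `normalizer.ckpt`) is committed as a Git LFS pointer: `key value` lines giving the spec version, the SHA-256 `oid` of the real content, and its `size` in bytes. A minimal sketch that parses such a pointer and checks downloaded bytes against it; the helper names are hypothetical and only the keys shown in this commit are handled.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical parser: turn 'key value' pointer lines into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True if data has the size and SHA-256 digest recorded in the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    size_ok = len(data) == int(pointer["size"])
    return size_ok and hashlib.sha256(data).hexdigest() == expected


pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:c7c811b8af2c91e2917e3e967695330cfcad6682391e8ad1f3b8295e625ed506
size 8568"""
)
print(pointer["size"])  # 8568
# After fetching the real file:
# with open("classifier.ckpt", "rb") as f:
#     print(matches_pointer(f.read(), pointer))
```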
embedding_model.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7adfcb7acca11914e9556e4390ac23b25296cbe8efc682a9eda43d1608e5f44a
+ size 83316686
hyperparams.yaml ADDED
@@ -0,0 +1,52 @@
+ # ############################################################################
+ # Model: xvector for Sound Classification with UrbanSound8k
+ # ############################################################################
+
+ # Pretrain folder (HuggingFace)
+ pretrained_path: speechbrain/urbansound8k_ecapa
+
+ # Feature parameters
+ n_mels: 80
+
+ # Output parameters
+ out_n_neurons: 10 # Possible sounds in the dataset
+
+
+ # Model params
+ compute_features: !new:speechbrain.lobes.features.Fbank
+     n_mels: !ref <n_mels>
+
+ mean_var_norm: !new:speechbrain.processing.features.InputNormalization
+     norm_type: sentence
+     std_norm: False
+
+ embedding_model: !new:speechbrain.lobes.models.ECAPA_TDNN.ECAPA_TDNN
+     input_size: !ref <n_mels>
+     channels: [1024, 1024, 1024, 1024, 3072]
+     kernel_sizes: [5, 3, 3, 3, 1]
+     dilations: [1, 2, 3, 4, 1]
+     attention_channels: 128
+     lin_neurons: 192
+
+ classifier: !new:speechbrain.lobes.models.ECAPA_TDNN.Classifier
+     input_size: 192
+     out_neurons: !ref <out_n_neurons>
+
+ modules:
+     compute_features: !ref <compute_features>
+     mean_var_norm: !ref <mean_var_norm>
+     embedding_model: !ref <embedding_model>
+     classifier: !ref <classifier>
+
+ label_encoder: !new:speechbrain.dataio.encoder.CategoricalEncoder
+
+
+ pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+     loadables:
+         embedding_model: !ref <embedding_model>
+         classifier: !ref <classifier>
+         label_encoder: !ref <label_encoder>
+     paths:
+         embedding_model: !ref <pretrained_path>/embedding_model.ckpt
+         classifier: !ref <pretrained_path>/classifier.ckpt
+         label_encoder: !ref <pretrained_path>/label_encoder.txt
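`mean_var_norm` is configured with `norm_type: sentence` and `std_norm: False`. As we read those options, each utterance's filterbank features are shifted by their own per-dimension mean over time, with no variance scaling. A plain-Python sketch of that idea on a `[time][n_mels]` feature list (toy sizes below rather than the configured `n_mels: 80`); it is an illustration, not SpeechBrain's `InputNormalization`, which also handles batches, relative lengths, and other normalization types.

```python
def sentence_mean_norm(features):
    """Illustrative sentence-level mean normalization (no std scaling).

    `features` is a [time][dim] list for a single utterance; the result has
    the same shape and every dimension has zero mean over time.
    """
    n = len(features)
    dim = len(features[0])
    means = [sum(frame[d] for frame in features) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in features]


fbank = [[1.0, 10.0], [3.0, 14.0], [5.0, 12.0]]  # toy: 3 frames x 2 mel bands
print(sentence_mean_norm(fbank))  # [[-2.0, -2.0], [0.0, 2.0], [2.0, 0.0]]
```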
label_encoder.txt ADDED
@@ -0,0 +1,12 @@
+ 'dog_bark' => 0
+ 'children_playing' => 1
+ 'air_conditioner' => 2
+ 'street_music' => 3
+ 'gun_shot' => 4
+ 'siren' => 5
+ 'engine_idling' => 6
+ 'jackhammer' => 7
+ 'drilling' => 8
+ 'car_horn' => 9
+ ================
+ 'starting_index' => 0
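`label_encoder.txt` maps the 10 UrbanSound8k class names to the indices of the classifier's 10 outputs (`out_n_neurons: 10`). A sketch of parsing this text format and turning a vector of class scores into a label by arg-max; the parser is written against the lines shown here and is hypothetical, not SpeechBrain's `CategoricalEncoder` loader.

```python
import ast

ENCODER_TXT = """'dog_bark' => 0
'children_playing' => 1
'air_conditioner' => 2
'street_music' => 3
'gun_shot' => 4
'siren' => 5
'engine_idling' => 6
'jackhammer' => 7
'drilling' => 8
'car_horn' => 9
================
'starting_index' => 0
"""


def parse_label_encoder(text: str) -> dict:
    """Hypothetical parser: map index -> label from lines like "'dog_bark' => 0".

    Lines after the '====' separator hold metadata (starting_index) and are skipped.
    """
    index_to_label = {}
    for line in text.strip().splitlines():
        if line.startswith("="):
            break
        label, _, index = line.partition(" => ")
        index_to_label[int(index)] = ast.literal_eval(label)
    return index_to_label


def decode(scores, index_to_label):
    """Return the label of the highest-scoring class."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return index_to_label[best]


mapping = parse_label_encoder(ENCODER_TXT)
scores = [0.1, 0.0, 0.2, 0.1, 0.0, 2.5, 0.3, 0.0, 0.1, 0.4]  # toy classifier output
print(decode(scores, mapping))  # siren
```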
normalizer.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b584182e3c34b18768121c3020cc94790e15fc502610df317c29c1be5325bcf8
+ size 1153