upload model

Files changed (7)

.gitattributes +3 -0
README.md +122 -0
classifier.ckpt +3 -0
embedding_model.ckpt +3 -0
hyperparams.yaml +52 -0
label_encoder.txt +12 -0
normalizer.ckpt +3 -0

.gitattributes CHANGED

@@ -14,3 +14,6 @@
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
+classifier.ckpt filter=lfs diff=lfs merge=lfs -text
+embedding_model.ckpt filter=lfs diff=lfs merge=lfs -text
+normalizer.ckpt filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,122 @@

+---
+language: "en"
+thumbnail:
+tags:
+- embeddings
+- Speaker
+- Verification
+- Identification
+- pytorch
+- xvectors
+- TDNN
+license: "apache-2.0"
+datasets:
+- voxceleb
+metrics:
+- EER
+- min_dct
+---
+<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+<br/><br/>
+# Speaker Verification with xvector embeddings on VoxCeleb
+This repository provides all the necessary tools to extract speaker embeddings with a pretrained TDNN model using SpeechBrain.
+The system is trained on the VoxCeleb1 + VoxCeleb2 training data.
+For a better experience, we encourage you to learn more about
+[SpeechBrain](https://speechbrain.github.io). The model's performance on the VoxCeleb1 test set (cleaned) is:
+| Release | EER(%) |
+|:-------------:|:--------------:|
+| 05-03-21 | 3.2 |
+## Pipeline description
+This system is composed of a TDNN model coupled with statistical pooling. The system is trained with Categorical Cross-Entropy Loss.
+## Install SpeechBrain
+First of all, please install SpeechBrain with the following command:
+```bash
+pip install speechbrain
+```
+Please note that we encourage you to read our tutorials and learn more about
+[SpeechBrain](https://speechbrain.github.io).
+### Compute your speaker embeddings
+```python
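+import torchaudio
+from speechbrain.pretrained import EncoderClassifier
+
+# Fetch the pretrained model from the Hub; files are stored in savedir.
+classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb", savedir="pretrained_models/spkrec-xvect-voxceleb")
+# Load the waveform; fs is its sampling rate.
+signal, fs = torchaudio.load('samples/audio_samples/example1.wav')
+# Extract the speaker embedding for the loaded signal.
+embeddings = classifier.encode_batch(signal)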
+```
+### Inference on GPU
+To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
+### Training
+The model was trained with SpeechBrain (aa018540).
+To train it from scratch, follow these steps:
+1. Clone SpeechBrain:
+```bash
+git clone https://github.com/speechbrain/speechbrain/
+```
+2. Install it:
+```bash
+cd speechbrain
+pip install -r requirements.txt
+pip install -e .
+```
+3. Run Training:
+```bash
+cd recipes/VoxCeleb/SpeakerRec/
+python train_speaker_embeddings.py hparams/train_x_vectors.yaml --data_folder=your_data_folder
+```
+You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1RtCBJ3O8iOCkFrJItCKT9oL-Q1MNCwMH?usp=sharing).
+### Limitations
+The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+#### Referencing xvectors
+```
+@inproceedings{DBLP:conf/odyssey/SnyderGMSPK18,
+  author    = {David Snyder and
+               Daniel Garcia{-}Romero and
+               Alan McCree and
+               Gregory Sell and
+               Daniel Povey and
+               Sanjeev Khudanpur},
+  title     = {Spoken Language Recognition using X-vectors},
+  booktitle = {Odyssey 2018},
+  pages     = {105--111},
+  year      = {2018},
+}
+```
+#### Referencing SpeechBrain
+```
+@misc{SB2021,
+    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
+    title = {SpeechBrain},
+    year = {2021},
+    publisher = {GitHub},
+    journal = {GitHub repository},
+    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+  }
+```
+#### About SpeechBrain
+SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
+Website: https://speechbrain.github.io/
+GitHub: https://github.com/speechbrain/speechbrain
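
The README's pipeline description says the TDNN is coupled with statistical pooling. As a rough illustration of that step (not SpeechBrain's implementation), the sketch below collapses a variable-length sequence of frame-level features into one fixed-size vector by concatenating the per-feature mean and standard deviation. The function name `statistics_pooling`, the plain Python lists, and the choice of the population standard deviation are assumptions made for this example.

```python
import math
from typing import List, Sequence


def statistics_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (time, features) sequence into one fixed-size vector.

    Returns the per-feature mean followed by the per-feature standard
    deviation, so the output has 2 * n_features entries.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    n_feats = len(frames[0])
    means = [sum(f[i] for f in frames) / n_frames for i in range(n_feats)]
    # Population standard deviation (divide by N): an assumption of this sketch.
    stds = [
        math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / n_frames)
        for i in range(n_feats)
    ]
    return means + stds


# Three frames with two features each become one 4-dimensional vector.
pooled = statistics_pooling([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
print(pooled)
```

Because the output size depends only on the number of features, utterances of different durations map to vectors of the same length, which is what lets a fixed-size classifier sit on top of the frame-level network.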

classifier.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c7c811b8af2c91e2917e3e967695330cfcad6682391e8ad1f3b8295e625ed506
+size 8568

embedding_model.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7adfcb7acca11914e9556e4390ac23b25296cbe8efc682a9eda43d1608e5f44a
+size 83316686

hyperparams.yaml ADDED

	@@ -0,0 +1,52 @@

+# ############################################################################
+# Model: ECAPA-TDNN for Sound Classification with UrbanSound8k
+# ############################################################################
+# Pretrain folder (HuggingFace)
+pretrained_path: speechbrain/urbansound8k_ecapa
+# Feature parameters
+n_mels: 80
+# Output parameters
+out_n_neurons: 10 # Possible sounds in the dataset
+# Model params
+compute_features: !new:speechbrain.lobes.features.Fbank
+    n_mels: !ref <n_mels>
+mean_var_norm: !new:speechbrain.processing.features.InputNormalization
+    norm_type: sentence
+    std_norm: False
+embedding_model: !new:speechbrain.lobes.models.ECAPA_TDNN.ECAPA_TDNN
+    input_size: !ref <n_mels>
+    channels: [1024, 1024, 1024, 1024, 3072]
+    kernel_sizes: [5, 3, 3, 1, 1]
+    dilations: [1, 2, 3, 4, 1]
+    attention_channels: 128
+    lin_neurons: 192
+classifier: !new:speechbrain.lobes.models.ECAPA_TDNN.Classifier
+    input_shape: 192
+    out_neurons: !ref <out_n_neurons>
+modules:
+    compute_features: !ref <compute_features>
+    mean_var_norm: !ref <mean_var_norm>
+    embedding_model: !ref <embedding_model>
+    classifier: !ref <classifier>
+label_encoder: !new:speechbrain.dataio.encoder.CategoricalEncoder
+pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+    loadables:
+        embedding_model: !ref <embedding_model>
+        classifier: !ref <classifier>
+        label_encoder: !ref <label_encoder>
+    paths:
+        embedding_model: !ref <pretrained_path>/embedding_model.ckpt
+        classifier: !ref <pretrained_path>/classifier.ckpt
+        label_encoder: !ref <pretrained_path>/label_encoder.txt
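
The `paths` entries above combine a `!ref <pretrained_path>` placeholder with a file name. The sketch below illustrates how such placeholders expand into full paths by plain string substitution. It is a simplified stand-in written for this example, not the HyperPyYAML loader itself, and the helper name `resolve_refs` is hypothetical.

```python
import re
from typing import Dict


def resolve_refs(template: str, params: Dict[str, object]) -> str:
    """Replace every <name> placeholder with the matching entry of params."""
    return re.sub(r"<(\w+)>", lambda m: str(params[m.group(1)]), template)


# Values copied from hyperparams.yaml in this commit.
params = {"pretrained_path": "speechbrain/urbansound8k_ecapa", "n_mels": 80}

# The three checkpoint paths declared under `pretrainer.paths`.
templates = {
    "embedding_model": "<pretrained_path>/embedding_model.ckpt",
    "classifier": "<pretrained_path>/classifier.ckpt",
    "label_encoder": "<pretrained_path>/label_encoder.txt",
}

paths = {name: resolve_refs(tpl, params) for name, tpl in templates.items()}
for name, path in paths.items():
    print(f"{name}: {path}")
```

Note that every resolved path points into `speechbrain/urbansound8k_ecapa`, so changing `pretrained_path` alone redirects all three loadables.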

label_encoder.txt ADDED

	@@ -0,0 +1,12 @@

+'dog_bark' => 0
+'children_playing' => 1
+'air_conditioner' => 2
+'street_music' => 3
+'gun_shot' => 4
+'siren' => 5
+'engine_idling' => 6
+'jackhammer' => 7
+'drilling' => 8
+'car_horn' => 9
+================
+'starting_index' => 0
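
For readers who want to inspect the label mapping without loading the model, here is a minimal sketch that parses the text format shown above into a label-to-index dictionary and an index-to-label dictionary. It assumes only what is visible in this file: `'label' => index` lines, a separator line made of `=` characters, and extra entries after it. The function `parse_label_encoder` is hypothetical and is not part of SpeechBrain.

```python
import ast
from typing import Dict, Tuple

# Contents of label_encoder.txt as added in this commit.
LABEL_FILE = """\
'dog_bark' => 0
'children_playing' => 1
'air_conditioner' => 2
'street_music' => 3
'gun_shot' => 4
'siren' => 5
'engine_idling' => 6
'jackhammer' => 7
'drilling' => 8
'car_horn' => 9
================
'starting_index' => 0
"""


def parse_label_encoder(text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (label -> index, extra entries found after the separator)."""
    lab2ind: Dict[str, int] = {}
    extras: Dict[str, int] = {}
    target = lab2ind
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"="}:
            # Separator line: the remaining entries are metadata, not labels.
            target = extras
            continue
        key, _, value = line.partition(" => ")
        # Both sides are Python literals (a quoted string and an integer).
        target[ast.literal_eval(key)] = ast.literal_eval(value)
    return lab2ind, extras


lab2ind, extras = parse_label_encoder(LABEL_FILE)
ind2lab = {index: label for label, index in lab2ind.items()}
print(len(lab2ind), "classes; index 5 is", ind2lab[5])
```

The ten parsed classes agree with `out_n_neurons: 10` in `hyperparams.yaml`.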

normalizer.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b584182e3c34b18768121c3020cc94790e15fc502610df317c29c1be5325bcf8
+size 1153
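
The three `.ckpt` entries in this commit are Git LFS pointer files rather than the checkpoints themselves: each holds a `version` line, an `oid` with a hash algorithm and digest, and the `size` of the real file in bytes. The sketch below reads those fields from a pointer, using the `embedding_model.ckpt` pointer from this commit as input. The helper name `parse_lfs_pointer` and the returned dictionary layout are assumptions made for illustration.

```python
from typing import Dict, Union

# Pointer text for embedding_model.ckpt as added in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7adfcb7acca11914e9556e4390ac23b25296cbe8efc682a9eda43d1608e5f44a
size 83316686
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split a pointer file into its version, hash algorithm, digest and size."""
    # Each line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"], "bytes")
```

Comparing `size` and the digest against a downloaded file is a quick way to check that the actual checkpoint, not the pointer, was fetched.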