push our wav2vec2 model

git add *!'

Files changed (4)

README.md +105 -0
hyperparams.yaml +40 -0
latent_encoder.ckpt +3 -0
latent_extractor.ckpt +3 -0

README.md ADDED

	@@ -0,0 +1,105 @@

+---
+language: "en"
+thumbnail:
+tags:
+- pretraining
+- CTC
+- pytorch
+- speechbrain
+- speech
+license: "apache-2.0"
+datasets:
+- commonvoice
+---
+<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
+<br/><br/>
+# wav2vec 2.0 base model pretrained on LibriSpeech 960h
+This HuggingFace repository provides all the necessary tools to extract wav2vec2
+embeddings from a pretrained model. For a better experience, we encourage you to learn more about
+[SpeechBrain](https://speechbrain.github.io). The wav2vec2 model was pretrained
+entirely with SpeechBrain (not with fairseq or HuggingFace).
+The performance of the model is the following:
+| Release | Test WER | GPUs |
+|:-------------:|:--------------:|:--------------:|
+| 22-09-22 | 7.X | 1xV100 32GB |
+## Pipeline description
+This w2v2 system is composed of 2 different but linked blocks:
+- A convolutional front end that extracts latent features from the raw waveform.
+- A latent encoder made of a transformer network.
+The obtained embeddings are the output of the transformer after the input has
+gone through each block.
+## Install SpeechBrain
+First of all, please install SpeechBrain with the following command:
+```
+pip install speechbrain
+```
+Please note that we encourage you to read our tutorials and learn more about
+[SpeechBrain](https://speechbrain.github.io).
+### Extracting embeddings for your own audio files
+```python
+from speechbrain.pretrained import WaveformEncoder
+ssl_model = WaveformEncoder.from_hparams(source="speechbrain/ssl-wav2vec2-base-librispeech", savedir="speechbrain/ssl-wav2vec2-base-librispeech")
+# WaveformEncoder exposes encode_file (it has no transcribe_file method)
+embeddings = ssl_model.encode_file("example-fr.wav")
+```
+### Inference on GPU
+To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
+### Training
+The model was trained with SpeechBrain.
+To train it from scratch, follow these steps:
+1. Clone SpeechBrain:
+```bash
+git clone https://github.com/speechbrain/speechbrain/
+```
+2. Install it:
+```bash
+cd speechbrain
+pip install -r requirements.txt
+pip install -e .
+```
+3. Run training:
+```bash
+cd recipes/LibriSpeech/self-supervised-learning/wav2vec2
+python train_sb_wav2vec2.py hparams/wav2vec2_base.yaml --data_folder=your_data_folder
+```
+You can find our training results (models, logs, etc) [here](https://drive.google.com/drive/folders/1eXA6HQtiKfgrPejvvoKvRRfTEvOI3BQt?usp=sharing).
+### Limitations
+The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.
+#### Referencing SpeechBrain
+```
+@misc{SB2021,
+    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
+    title = {SpeechBrain},
+    year = {2021},
+    publisher = {GitHub},
+    journal = {GitHub repository},
+    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
+  }
+```
+#### About SpeechBrain
+SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
+Website: https://speechbrain.github.io/
+GitHub: https://github.com/speechbrain/speechbrain

hyperparams.yaml ADDED

	@@ -0,0 +1,40 @@

+# ################################
+# Model: wav2vec2
+# Authors: Rudolf A. Braun 2022, Titouan Parcollet 2022
+# ################################
+sample_rate: 16000
+# standard parameters for the BASE model
+latent_extractor: !new:speechbrain.lobes.models.wav2vec.W2VLatentExtractor
+   out_channels: [512, 512, 512, 512, 512, 512, 512]
+# standard parameters for the BASE model
+latent_encoder: !new:speechbrain.lobes.models.transformer.Transformer.TransformerEncoder
+   d_model: 768
+   num_layers: 12
+   nhead: 8
+   d_ffn: 3072
+   dropout: 0.1
+   layerdrop_prob: 0.0
+   normalize_before: True
+   activation: !name:torch.nn.GELU
+# standard parameters for the BASE model
+encoder_wrapper: !new:speechbrain.lobes.models.wav2vec.EncoderWrapper
+   in_dim: 512
+   embedding_dim: 768
+   latent_encoder: !ref <latent_encoder>
+   dropout_encoder_input: 0.1
+encoder: !new:speechbrain.nnet.containers.LengthsCapableSequential
+    latent_extractor: !ref <latent_extractor>
+    encoder_wrapper: !ref <encoder_wrapper>
+modules:
+   encoder: !ref <encoder>
+pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
+    loadables:
+        latent_encoder: !ref <encoder_wrapper>
+        latent_extractor: !ref <latent_extractor>
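
Note on the config above: `latent_extractor` lists seven `out_channels` entries, i.e. seven convolutional layers, and `sample_rate` is 16000. The sketch below shows how a stack of unpadded strided convolutions turns a number of samples into a number of latent frames. The kernel sizes and strides are not given in `hyperparams.yaml`; the values used here are the ones from the original wav2vec 2.0 paper and are an assumption, so the defaults of `W2VLatentExtractor` may differ. The helper name `num_frames` is hypothetical.

```python
def num_frames(num_samples, kernel_sizes, strides):
    """Output length of a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernel_sizes, strides):
        # Standard "valid" convolution length formula, applied layer by layer.
        length = (length - kernel) // stride + 1
    return length


# Assumption: values from the wav2vec 2.0 paper, not read from this repository.
PAPER_KERNELS = (10, 3, 3, 3, 3, 2, 2)
PAPER_STRIDES = (5, 2, 2, 2, 2, 2, 2)

# Overall hop between consecutive frames, in samples (product of the strides).
hop = 1
for stride in PAPER_STRIDES:
    hop *= stride

frames_per_second = num_frames(16000, PAPER_KERNELS, PAPER_STRIDES)
print(frames_per_second, hop)
```

Under these assumed values, one second of 16 kHz audio yields 49 frames with a hop of 320 samples (20 ms). Passing the kernel sizes and strides actually used by the model gives the corresponding figure for this checkpoint.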

latent_encoder.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:51e99c06d6669d41563e023b747d6187f87ca786a2007d83a3a74feea1e884f2
+size 349543637
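
As a rough sanity check, the pointer size above can be compared with a parameter count derived from the `latent_encoder` settings in `hyperparams.yaml` (`d_model: 768`, `d_ffn: 3072`, `num_layers: 12`). The sketch assumes a standard transformer encoder layer (four attention projections with biases, a two-layer feed-forward block, two layer norms) and float32 storage. Both are assumptions and not read from the SpeechBrain implementation, and the function name is hypothetical.

```python
def transformer_encoder_params(d_model, d_ffn, num_layers):
    """Approximate parameter count of a standard transformer encoder stack."""
    # Q, K, V and output projections, each d_model x d_model plus a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two linear layers: d_model -> d_ffn -> d_model, with biases.
    feed_forward = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    # Two layer norms per layer, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return num_layers * (attention + feed_forward + layer_norms)


params = transformer_encoder_params(d_model=768, d_ffn=3072, num_layers=12)
approx_bytes = 4 * params  # assumption: float32 weights
checkpoint_bytes = 349543637  # size field of the LFS pointer above

print(params, approx_bytes, round(approx_bytes / checkpoint_bytes, 3))
```

The estimate comes to about 85 million parameters, or roughly 340 MB, which accounts for most of the 349,543,637 bytes. The remainder would presumably be the extra layers in `encoder_wrapper` (the `pretrainer` loads this checkpoint into the wrapper) plus serialization overhead, but that breakdown is not confirmed by the commit.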

latent_extractor.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f2542aee415c988a0452322e1278f0ac045c5bcb294b54cf6a4ff7c0491bbdc7
+size 18939616
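
The three-line entries for the two `.ckpt` files are Git LFS pointers: they record the SHA-256 digest and byte size of the real checkpoint, not the weights themselves. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the Python standard library is used.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the latent_extractor.ckpt entry in this commit.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:f2542aee415c988a0452322e1278f0ac045c5bcb294b54cf6a4ff7c0491bbdc7
size 18939616
"""
pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("latent_extractor.ckpt", pointer)` on a local copy would then confirm that the download is complete and uncorrupted. The local path is an assumption about where the file was saved.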