Titouan committed
Commit c89c07f
1 parent: d95b3dc

push our wav2vec2 model

Files changed (4):
  1. README.md +105 -0
  2. hyperparams.yaml +40 -0
  3. latent_encoder.ckpt +3 -0
  4. latent_extractor.ckpt +3 -0
README.md ADDED
@@ -0,0 +1,105 @@
---
language: "en"
thumbnail:
tags:
- pretraining
- CTC
- pytorch
- speechbrain
- speech
license: "apache-2.0"
datasets:
- commonvoice
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# wav2vec 2.0 base model pretrained on LibriSpeech 960h

This HuggingFace repository provides all the necessary tools to extract wav2vec2
embeddings from a pretrained model. For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io). The wav2vec2 model has been entirely
pretrained with SpeechBrain (not with fairseq or HuggingFace).

The performance of the model is the following:

| Release | Test WER | GPUs |
|:-------:|:--------:|:----:|
| 22-09-22 | 7.X | 1xV100 32GB |

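The Test WER above is the word error rate: the word-level edit distance between the reference and hypothesis transcripts, divided by the number of reference words. A minimal, self-contained sketch of the metric (illustrative only; the `wer` helper below is not part of SpeechBrain's API, which ships its own utilities in `speechbrain.utils.edit_distance`):

```python
# Word Error Rate: Levenshtein distance over words / number of reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deletions only
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words ≈ 0.333
```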
## Pipeline description

This wav2vec2 system is composed of two different but linked blocks:
- A convolutional backend that extracts features from the raw waveform.
- A latent encoder made of a transformer network.

The obtained embeddings are the output of the transformer, after the signal has
passed through both blocks.

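As a rough illustration of the two blocks, here is a toy sketch in plain PyTorch. It is not the actual SpeechBrain `W2VLatentExtractor`/`TransformerEncoder` classes; the kernel sizes, strides, and layer count are arbitrary, and only the BASE dimensions (512 conv channels, 768-dim embeddings) are taken from the config:

```python
import torch
import torch.nn as nn

# Toy sketch of the two-block pipeline: a convolutional backend downsamples
# the raw waveform into frame-level features, then a transformer encoder
# turns those frames into contextual embeddings.
class ToyW2V2(nn.Module):
    def __init__(self, conv_dim=512, d_model=768, nhead=8, num_layers=2):
        super().__init__()
        # Convolutional backend (kernel/stride values are illustrative)
        self.extractor = nn.Sequential(
            nn.Conv1d(1, conv_dim, kernel_size=10, stride=5),
            nn.GELU(),
            nn.Conv1d(conv_dim, conv_dim, kernel_size=3, stride=2),
            nn.GELU(),
        )
        self.proj = nn.Linear(conv_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=3072, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, wav):                        # wav: (batch, samples)
        feats = self.extractor(wav.unsqueeze(1))   # (batch, conv_dim, frames)
        feats = self.proj(feats.transpose(1, 2))   # (batch, frames, d_model)
        return self.encoder(feats)                 # contextual embeddings

emb = ToyW2V2()(torch.randn(2, 16000))             # 1 s of 16 kHz audio
print(emb.shape)                                   # (batch, frames, 768)
```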
## Install SpeechBrain

First of all, please install SpeechBrain with the following command:

```bash
pip install speechbrain
```

Please note that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Extracting embeddings for your own audio files

```python
from speechbrain.pretrained import WaveformEncoder

ssl_model = WaveformEncoder.from_hparams(source="speechbrain/ssl-wav2vec2-base-librispeech", savedir="speechbrain/ssl-wav2vec2-base-librispeech")
# This model extracts embeddings; it does not transcribe
embeddings = ssl_model.encode_file("example-fr.wav")
```
### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

### Training
The model was trained with SpeechBrain.
To train it from scratch, follow these steps:
1. Clone SpeechBrain:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
2. Install it:
```bash
cd speechbrain
pip install -r requirements.txt
pip install -e .
```

3. Run training:
```bash
cd recipes/LibriSpeech/self-supervised-learning/wav2vec2
python train_sb_wav2vec2.py hparams/wav2vec2_base.yaml --data_folder=your_data_folder
```

You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1eXA6HQtiKfgrPejvvoKvRRfTEvOI3BQt?usp=sharing).

### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

#### Referencing SpeechBrain

```
@misc{SB2021,
    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
    title = {SpeechBrain},
    year = {2021},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
}
```

#### About SpeechBrain
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain
hyperparams.yaml ADDED
@@ -0,0 +1,40 @@
# ################################
# Model: wav2vec2
# Authors: Rudolf A. Braun 2022, Titouan Parcollet 2022
# ################################

sample_rate: 16000

# standard parameters for the BASE model
latent_extractor: !new:speechbrain.lobes.models.wav2vec.W2VLatentExtractor
    out_channels: [512, 512, 512, 512, 512, 512, 512]

# standard parameters for the BASE model
latent_encoder: !new:speechbrain.lobes.models.transformer.Transformer.TransformerEncoder
    d_model: 768
    num_layers: 12
    nhead: 8
    d_ffn: 3072
    dropout: 0.1
    layerdrop_prob: 0.0
    normalize_before: True
    activation: !name:torch.nn.GELU

# standard parameters for the BASE model
encoder_wrapper: !new:speechbrain.lobes.models.wav2vec.EncoderWrapper
    in_dim: 512
    embedding_dim: 768
    latent_encoder: !ref <latent_encoder>
    dropout_encoder_input: 0.1

encoder: !new:speechbrain.nnet.containers.LengthsCapableSequential
    latent_extractor: !ref <latent_extractor>
    encoder_wrapper: !ref <encoder_wrapper>

modules:
    encoder: !ref <encoder>

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    loadables:
        latent_encoder: !ref <encoder_wrapper>
        latent_extractor: !ref <latent_extractor>
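The YAML pins only the channel counts of the convolutional extractor. Assuming the standard BASE kernel/stride schedule from the wav2vec 2.0 paper (strides 5, 2, 2, 2, 2, 2, 2 — an assumption, since the recipe's defaults are not shown here), the effective frame rate of the embeddings can be computed directly:

```python
# Hop (total downsampling factor) of the convolutional extractor, assuming
# the standard BASE stride schedule from the wav2vec 2.0 paper; the YAML
# above only pins the channel counts, not the kernels/strides.
strides = [5, 2, 2, 2, 2, 2, 2]
hop = 1
for s in strides:
    hop *= s

sample_rate = 16000
frame_ms = 1000 * hop / sample_rate
print(hop, frame_ms)  # 320 samples -> 20.0 ms per embedding frame
```

So at the configured 16 kHz sample rate, one embedding frame covers 320 samples, i.e. one vector every 20 ms.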
latent_encoder.ckpt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51e99c06d6669d41563e023b747d6187f87ca786a2007d83a3a74feea1e884f2
size 349543637
latent_extractor.ckpt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f2542aee415c988a0452322e1278f0ac045c5bcb294b54cf6a4ff7c0491bbdc7
size 18939616