---
tags:
- espnet
- audio
- audio-to-audio
language: en
datasets:
- wsj0_2mix
license: cc-by-4.0
---

## ESPnet2 ENH model

### `lichenda/wsj0_2mix_skim_noncausal`

This model was trained by LiChenda using the wsj0_2mix recipe in [espnet](https://github.com/espnet/espnet/).

### Demo: How to use in ESPnet2

```bash
cd espnet
git checkout ac3c10cfe4faf82c0bb30f8b32d9e8692363e0a9
pip install -e .
cd egs2/wsj0_2mix/enh1
./run.sh --skip_data_prep false --skip_train true --download_model lichenda/wsj0_2mix_skim_noncausal
```
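You can also run inference directly from Python. Below is a minimal sketch using the `espnet_model_zoo` downloader and ESPnet2's `SeparateSpeech` interface; `mixture.wav` is a placeholder for your own two-speaker mixture, and argument names/order may differ slightly between ESPnet versions.

```python
# Hedged sketch of Python inference; not part of the recipe above.
import soundfile as sf
from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.enh_inference import SeparateSpeech

# Download and unpack this model; the returned dict exposes the paths
# listed in the package's meta.yaml (train_config, model_file).
d = ModelDownloader()
cfg = d.download_and_unpack("lichenda/wsj0_2mix_skim_noncausal")

separate_speech = SeparateSpeech(
    cfg["train_config"],        # enhancement training config (positional)
    cfg["model_file"],          # trained model parameters (positional)
    normalize_output_wav=True,  # rescale outputs to avoid clipping
    device="cpu",
)

# "mixture.wav" is a placeholder for an 8 kHz two-speaker mixture.
mix, fs = sf.read("mixture.wav")
# The interface expects a batch dimension: (batch, num_samples).
separated = separate_speech(mix[None, :], fs=fs)
for i, wav in enumerate(separated):
    sf.write(f"speaker{i + 1}.wav", wav[0], fs)
```

The model was trained on the 8 kHz "min" version of wsj0_2mix, so the input mixture should be sampled at 8 kHz.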
<!-- Generated by ./scripts/utils/show_enh_score.sh -->
# RESULTS
## Environments
- date: `Wed Feb 23 16:42:06 CST 2022`
- python version: `3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]`
- espnet version: `espnet 0.10.7a1`
- pytorch version: `pytorch 1.8.1`
- Git hash: `ac3c10cfe4faf82c0bb30f8b32d9e8692363e0a9`
- Commit date: `Fri Feb 11 16:22:52 2022 +0800`

## Evaluation on wsj0_2mix (min, 8 kHz)

config: conf/tuning/train_enh_skim_tasnet_noncausal.yaml

|dataset|STOI|SAR (dB)|SDR (dB)|SIR (dB)|
|---|---|---|---|---|
|enhanced_cv_min_8k|0.96|19.17|18.70|29.56|
|enhanced_tt_min_8k|0.97|18.96|18.45|29.31|
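These scores are produced by the recipe's `show_enh_score.sh`. STOI is a 0–1 intelligibility score, while SDR/SIR/SAR are in dB. As a rough, hedged illustration only, comparable per-utterance numbers could be computed with the third-party `mir_eval` and `pystoi` packages (the recipe's own scoring may differ in detail); the wav file names below are placeholders.

```python
# Hedged sketch: per-utterance BSS Eval + STOI scoring with mir_eval/pystoi.
import numpy as np
import soundfile as sf
from mir_eval.separation import bss_eval_sources
from pystoi import stoi

fs = 8000
# Placeholders: reference sources and model outputs for one mixture.
ref = np.stack([sf.read("spk1_ref.wav")[0], sf.read("spk2_ref.wav")[0]])
est = np.stack([sf.read("spk1_est.wav")[0], sf.read("spk2_est.wav")[0]])

# BSS Eval metrics (dB); bss_eval_sources also resolves the speaker permutation.
sdr, sir, sar, perm = bss_eval_sources(ref, est)

# STOI per speaker, using the permutation chosen above.
stoi_scores = [stoi(ref[i], est[perm[i]], fs) for i in range(ref.shape[0])]

print(f"SDR={sdr.mean():.2f} dB  SIR={sir.mean():.2f} dB  "
      f"SAR={sar.mean():.2f} dB  STOI={np.mean(stoi_scores):.2f}")
```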
## ENH config

<details><summary>expand</summary>

```
config: conf/tuning/train_enh_skim_tasnet_noncausal.yaml
print_config: false
log_level: INFO
dry_run: false
iterator_type: chunk
output_dir: exp/enh_train_enh_skim_tasnet_noncausal_raw
ngpu: 1
seed: 0
num_workers: 4
num_att_plot: 3
dist_backend: nccl
dist_init_method: env://
dist_world_size: null
dist_rank: null
local_rank: 0
dist_master_addr: null
dist_master_port: null
dist_launcher: null
multiprocessing_distributed: false
unused_parameters: false
sharded_ddp: false
cudnn_enabled: true
cudnn_benchmark: false
cudnn_deterministic: true
collect_stats: false
write_collected_feats: false
max_epoch: 150
patience: 20
val_scheduler_criterion:
- valid
- loss
early_stopping_criterion:
- valid
- loss
- min
best_model_criterion:
- - valid
  - si_snr
  - max
- - valid
  - loss
  - min
keep_nbest_models: 1
nbest_averaging_interval: 0
grad_clip: 5.0
grad_clip_type: 2.0
grad_noise: false
accum_grad: 1
no_forward_run: false
resume: true
train_dtype: float32
use_amp: false
log_interval: null
use_matplotlib: true
use_tensorboard: true
use_wandb: false
wandb_project: null
wandb_id: null
wandb_entity: null
wandb_name: null
wandb_model_log_interval: -1
detect_anomaly: false
pretrain_path: null
init_param: []
ignore_init_mismatch: false
freeze_param: []
num_iters_per_epoch: null
batch_size: 8
valid_batch_size: null
batch_bins: 1000000
valid_batch_bins: null
train_shape_file:
- exp/enh_stats_8k/train/speech_mix_shape
- exp/enh_stats_8k/train/speech_ref1_shape
- exp/enh_stats_8k/train/speech_ref2_shape
valid_shape_file:
- exp/enh_stats_8k/valid/speech_mix_shape
- exp/enh_stats_8k/valid/speech_ref1_shape
- exp/enh_stats_8k/valid/speech_ref2_shape
batch_type: folded
valid_batch_type: null
fold_length:
- 80000
- 80000
- 80000
sort_in_batch: descending
sort_batch: descending
multiple_iterator: false
chunk_length: 16000
chunk_shift_ratio: 0.5
num_cache_chunks: 1024
train_data_path_and_name_and_type:
- - dump/raw/tr_min_8k/wav.scp
  - speech_mix
  - sound
- - dump/raw/tr_min_8k/spk1.scp
  - speech_ref1
  - sound
- - dump/raw/tr_min_8k/spk2.scp
  - speech_ref2
  - sound
valid_data_path_and_name_and_type:
- - dump/raw/cv_min_8k/wav.scp
  - speech_mix
  - sound
- - dump/raw/cv_min_8k/spk1.scp
  - speech_ref1
  - sound
- - dump/raw/cv_min_8k/spk2.scp
  - speech_ref2
  - sound
allow_variable_data_keys: false
max_cache_size: 0.0
max_cache_fd: 32
valid_max_cache_size: null
optim: adam
optim_conf:
  lr: 0.001
  eps: 1.0e-08
  weight_decay: 0
scheduler: reducelronplateau
scheduler_conf:
  mode: min
  factor: 0.7
  patience: 1
init: xavier_uniform
model_conf:
  stft_consistency: false
  loss_type: mask_mse
  mask_type: null
criterions:
- name: si_snr
  conf:
    eps: 1.0e-07
  wrapper: pit
  wrapper_conf:
    weight: 1.0
    independent_perm: true
use_preprocessor: false
encoder: conv
encoder_conf:
  channel: 64
  kernel_size: 2
  stride: 1
separator: skim
separator_conf:
  causal: false
  num_spk: 2
  layer: 6
  nonlinear: relu
  unit: 128
  segment_size: 250
  dropout: 0.1
  mem_type: hc
  seg_overlap: true
decoder: conv
decoder_conf:
  channel: 64
  kernel_size: 2
  stride: 1
required:
- output_dir
version: 0.10.7a1
distributed: false
```

</details>
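The training criterion above is SI-SNR wrapped in permutation-invariant training (`wrapper: pit`) over the two speakers. As a plain-PyTorch illustration of what that objective computes (a sketch, not ESPnet's actual implementation; the `eps` default mirrors the criterion's `eps: 1.0e-07`):

```python
# Illustrative sketch of SI-SNR with 2-speaker PIT (not ESPnet's implementation).
import torch


def si_snr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Scale-invariant SNR in dB for (batch, samples) tensors."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to isolate the target component.
    target = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - target
    return 10 * torch.log10((target.pow(2).sum(-1) + eps) / (noise.pow(2).sum(-1) + eps))


def pit_si_snr_loss(est1, est2, ref1, ref2):
    """Negative SI-SNR, maximized over the two possible speaker assignments."""
    perm_a = si_snr(est1, ref1) + si_snr(est2, ref2)
    perm_b = si_snr(est1, ref2) + si_snr(est2, ref1)
    best = torch.maximum(perm_a, perm_b) / 2  # mean SI-SNR of the better permutation
    return -best.mean()
```

Since validation SI-SNR is maximized for model selection (`best_model_criterion: valid si_snr max`), the sketch returns its negative as a loss to minimize.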
### Citing ESPnet

```BibTex
@inproceedings{watanabe2018espnet,
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  title={{ESPnet}: End-to-End Speech Processing Toolkit},
  year={2018},
  booktitle={Proceedings of Interspeech},
  pages={2207--2211},
  doi={10.21437/Interspeech.2018-1456},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}

@inproceedings{ESPnet-SE,
  author = {Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and
            Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph B{\"{o}}ddeker and Zhuo Chen and Shinji Watanabe},
  title = {ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
  booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2021, Shenzhen, China, January 19-22, 2021},
  pages = {785--792},
  publisher = {{IEEE}},
  year = {2021},
  url = {https://doi.org/10.1109/SLT48900.2021.9383615},
  doi = {10.1109/SLT48900.2021.9383615},
  timestamp = {Mon, 12 Apr 2021 17:08:59 +0200},
  biburl = {https://dblp.org/rec/conf/slt/Li0ZSCKHHBC021.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

or arXiv:

```bibtex
@misc{watanabe2018espnet,
  title={ESPnet: End-to-End Speech Processing Toolkit},
  author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year={2018},
  eprint={1804.00015},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```