Emrys365 committed on
Commit
53643e6
1 Parent(s): 0de571b

Update model

Files changed (45)
  1. README.md +379 -3
  2. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/83epoch.pth +3 -0
  3. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/config.yaml +237 -0
  4. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/enhanced_test_16k/RESULTS.md +23 -0
  5. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/enhanced_test_48k/RESULTS.md +18 -0
  6. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/backward_time.png +0 -0
  7. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/clip.png +0 -0
  8. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/forward_time.png +0 -0
  9. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/grad_norm.png +0 -0
  11. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/iter_time.png +0 -0
  12. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_16k.png +0 -0
  13. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png +0 -0
  14. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_24k.png +0 -0
  15. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_48k.png +0 -0
  16. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_8k.png +0 -0
  17. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png +0 -0
  18. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_2ch_16k.png +0 -0
  19. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png +0 -0
  20. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_2ch_8k.png +0 -0
  21. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png +0 -0
  22. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_5ch_16k.png +0 -0
  23. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_5ch_8k.png +0 -0
  24. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png +0 -0
  25. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png +0 -0
  26. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/loss.png +0 -0
  27. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/loss_scale.png +0 -0
  28. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/optim0_lr0.png +0 -0
  29. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/optim_step_time.png +0 -0
  30. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_16k.png +0 -0
  31. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_16k_r.png +0 -0
  32. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_24k.png +0 -0
  33. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_48k.png +0 -0
  34. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_8k.png +0 -0
  35. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_8k_r.png +0 -0
  36. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_2ch_16k.png +0 -0
  37. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_2ch_16k_r.png +0 -0
  38. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_2ch_8k.png +0 -0
  39. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_2ch_8k_r.png +0 -0
  40. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_5ch_16k.png +0 -0
  41. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_5ch_8k.png +0 -0
  42. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_8ch_16k_r.png +0 -0
  43. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_8ch_8k_r.png +0 -0
  44. exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/train_time.png +0 -0
  45. meta.yaml +8 -0
README.md CHANGED
@@ -1,3 +1,379 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - audio-to-audio
6
+ language: en
7
+ datasets:
8
+ - universal_se
9
+ license: cc-by-4.0
10
+ ---
11
+
12
+ ## ESPnet2 ENH model
13
+
14
+ ### `wyz/vctk_dns2020_whamr_tfgridnet_xxtiny`
15
+
16
+ This model was trained by wyz using the universal_se recipe in [ESPnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ To use the model through the Python interface, run the following code:
24
+
25
+ ```python
26
+ import soundfile as sf
27
+ from espnet2.bin.enh_inference import SeparateSpeech
28
+
29
+ # For model downloading + loading
30
+ model = SeparateSpeech.from_pretrained(
31
+     model_tag="wyz/vctk_dns2020_whamr_tfgridnet_xxtiny",
32
+     normalize_output_wav=True,
33
+     device="cuda",
34
+ )
35
+ # For loading a downloaded model
36
+ # model = SeparateSpeech(
37
+ #     train_config="exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/config.yaml",
38
+ #     model_file="exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/xxxx.pth",
39
+ #     normalize_output_wav=True,
40
+ #     device="cuda",
41
+ # )
42
+
43
+ audio, fs = sf.read("/path/to/noisy/utt1.flac")
44
+ enhanced = model(audio[None, :], fs=fs)[0]
45
+ ```
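The `normalize_output_wav=True` flag in the README snippet suggests that the returned waveform is rescaled before it is handed back. A minimal sketch of what such a step could look like, assuming it means peak normalization to a fixed ceiling: the 0.9 ceiling and the helper name `peak_normalize` are assumptions for illustration, not something this commit states, so check the installed ESPnet version before relying on it.

```python
def peak_normalize(samples, peak=0.9):
    """Scale samples so the largest absolute value equals `peak` (assumed ceiling)."""
    max_abs = max((abs(x) for x in samples), default=0.0)
    if max_abs == 0.0:
        # Silent or empty input: there is nothing to scale.
        return list(samples)
    return [x / max_abs * peak for x in samples]


normalized = peak_normalize([0.5, -0.25, 0.1])
print(normalized)
```

If this assumption holds, absolute output level carries no information, so level-dependent comparisons should be made on un-normalized output.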
46
+
47
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
48
+ # RESULTS
49
+ ## Environments
50
+ - date: `Sun Mar 3 22:03:37 EST 2024`
51
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
52
+ - espnet version: `espnet 202304`
53
+ - pytorch version: `pytorch 2.0.1+cu118`
54
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
55
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
56
+
57
+
58
+ ## enhanced_test_16k
59
+
60
+
61
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
62
+ |---|---|---|---|---|---|---|---|---|---|---|
63
+ |chime4_et05_real_isolated_6ch_track|1.18|52.77|-3.19|-3.19|0.00|-31.84|2.75|3.06|3.75|3.47|
64
+ |chime4_et05_simu_isolated_6ch_track|1.42|81.92|8.23|8.23|0.00|1.62|2.63|3.00|3.64|3.13|
65
+ |dns20_tt_synthetic_no_reverb|2.82|96.64|17.99|17.99|0.00|17.85|3.24|3.52|4.01|3.93|
66
+ |reverb_et_real_8ch_multich|1.13|70.16|3.29|3.29|0.00|0.91|2.99|3.30|3.91|3.76|
67
+ |reverb_et_simu_8ch_multich|2.12|92.45|10.33|10.33|0.00|-8.65|3.12|3.42|3.96|3.84|
68
+ |whamr_tt_mix_single_reverb_max_16k|1.84|90.09|8.74|8.74|0.00|5.98|3.01|3.30|3.95|3.57|
69
+
70
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
71
+ # RESULTS
72
+ ## Environments
73
+ - date: `Sun Mar 3 16:44:15 EST 2024`
74
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
75
+ - espnet version: `espnet 202304`
76
+ - pytorch version: `pytorch 2.0.1+cu118`
77
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
78
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
79
+
80
+
81
+ ## enhanced_test_48k
82
+
83
+
84
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
85
+ |---|---|---|---|---|---|---|---|---|---|
86
+ |vctk_noisy_tt_2spk|94.58|19.73|19.73|0.00|18.14|3.09|3.41|3.93|3.50|
87
+
88
+ ## ENH config
89
+
90
+ <details><summary>expand</summary>
91
+
92
+ ```
93
+ config: conf/tuning/tfgridnet_xtiny.yaml
94
+ print_config: false
95
+ log_level: INFO
96
+ dry_run: false
97
+ iterator_type: chunk
98
+ output_dir: exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw
99
+ ngpu: 1
100
+ seed: 0
101
+ num_workers: 4
102
+ num_att_plot: 3
103
+ dist_backend: nccl
104
+ dist_init_method: env://
105
+ dist_world_size: null
106
+ dist_rank: null
107
+ local_rank: 0
108
+ dist_master_addr: null
109
+ dist_master_port: null
110
+ dist_launcher: null
111
+ multiprocessing_distributed: false
112
+ unused_parameters: true
113
+ sharded_ddp: false
114
+ cudnn_enabled: true
115
+ cudnn_benchmark: false
116
+ cudnn_deterministic: true
117
+ collect_stats: false
118
+ write_collected_feats: false
119
+ max_epoch: 100
120
+ patience: 40
121
+ val_scheduler_criterion:
122
+ - valid
123
+ - loss
124
+ early_stopping_criterion:
125
+ - valid
126
+ - loss
127
+ - min
128
+ best_model_criterion:
129
+ -   - valid
130
+     - loss
131
+     - min
132
+ keep_nbest_models: 1
133
+ nbest_averaging_interval: 0
134
+ grad_clip: 5.0
135
+ grad_clip_type: 2.0
136
+ grad_noise: false
137
+ accum_grad: 1
138
+ no_forward_run: false
139
+ resume: true
140
+ save_interval: 1000
141
+ train_dtype: float32
142
+ use_amp: false
143
+ log_interval: null
144
+ use_matplotlib: true
145
+ use_tensorboard: true
146
+ create_graph_in_tensorboard: false
147
+ use_wandb: false
148
+ wandb_project: null
149
+ wandb_id: null
150
+ wandb_entity: null
151
+ wandb_name: null
152
+ wandb_model_log_interval: -1
153
+ detect_anomaly: false
154
+ pretrain_path: null
155
+ init_param: []
156
+ ignore_init_mismatch: false
157
+ freeze_param: []
158
+ num_iters_per_epoch: 8000
159
+ num_iters_valid: null
160
+ batch_size: 4
161
+ valid_batch_size: null
162
+ batch_bins: 1000000
163
+ valid_batch_bins: null
164
+ train_shape_file:
165
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
166
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
167
+ valid_shape_file:
168
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
169
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
170
+ batch_type: folded
171
+ valid_batch_type: null
172
+ fold_length:
173
+ - 80000
174
+ - 80000
175
+ sort_in_batch: descending
176
+ sort_batch: descending
177
+ multiple_iterator: false
178
+ chunk_length: 32000
179
+ chunk_shift_ratio: 0.5
180
+ num_cache_chunks: 1024
181
+ chunk_excluded_key_prefixes: []
182
+ chunk_discard_short_samples: false
183
+ train_data_path_and_name_and_type:
184
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
185
+     - speech_mix
186
+     - sound
187
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
188
+     - speech_ref1
189
+     - sound
190
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
191
+     - category
192
+     - text
193
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
194
+     - fs
195
+     - text_int
196
+ valid_data_path_and_name_and_type:
197
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
198
+     - speech_mix
199
+     - sound
200
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
201
+     - speech_ref1
202
+     - sound
203
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
204
+     - category
205
+     - text
206
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
207
+     - fs
208
+     - text_int
209
+ allow_variable_data_keys: false
210
+ max_cache_size: 0.0
211
+ max_cache_fd: 32
212
+ allow_multi_rates: true
213
+ valid_max_cache_size: null
214
+ exclude_weight_decay: false
215
+ exclude_weight_decay_conf: {}
216
+ optim: adam
217
+ optim_conf:
218
+     lr: 0.001
219
+     eps: 1.0e-08
220
+     weight_decay: 1.0e-05
221
+ scheduler: steplr
222
+ scheduler_conf:
223
+     step_size: 2
224
+     gamma: 0.99
225
+ init: null
226
+ model_conf:
227
+     normalize_variance_per_ch: true
228
+     categories:
229
+     - 1ch_8k
230
+     - 1ch_8k_r
231
+     - 1ch_16k_r
232
+     - 1ch_48k
233
+     - 1ch_24k
234
+     - 1ch_16k
235
+     - 2ch_8k
236
+     - 2ch_8k_r
237
+     - 2ch_16k
238
+     - 2ch_16k_r
239
+     - 5ch_8k
240
+     - 5ch_16k
241
+     - 8ch_8k_r
242
+     - 8ch_16k_r
243
+ criterions:
244
+ -   name: mr_l1_tfd
245
+     conf:
246
+         window_sz:
247
+         - 256
248
+         - 512
249
+         - 768
250
+         - 1024
251
+         hop_sz: null
252
+         eps: 1.0e-08
253
+         time_domain_weight: 0.5
254
+         normalize_variance: true
255
+     wrapper: fixed_order
256
+     wrapper_conf:
257
+         weight: 1.0
258
+ -   name: si_snr
259
+     conf:
260
+         eps: 1.0e-07
261
+     wrapper: fixed_order
262
+     wrapper_conf:
263
+         weight: 0.0
264
+ speech_volume_normalize: null
265
+ rir_scp: null
266
+ rir_apply_prob: 1.0
267
+ noise_scp: null
268
+ noise_apply_prob: 1.0
269
+ noise_db_range: '13_15'
270
+ short_noise_thres: 0.5
271
+ use_reverberant_ref: false
272
+ num_spk: 1
273
+ num_noise_type: 1
274
+ sample_rate: 8000
275
+ force_single_channel: true
276
+ channel_reordering: true
277
+ categories:
278
+ - 1ch_8k
279
+ - 1ch_8k_r
280
+ - 1ch_16k_r
281
+ - 1ch_48k
282
+ - 1ch_24k
283
+ - 1ch_16k
284
+ - 2ch_8k
285
+ - 2ch_8k_r
286
+ - 2ch_16k
287
+ - 2ch_16k_r
288
+ - 5ch_8k
289
+ - 5ch_16k
290
+ - 8ch_8k_r
291
+ - 8ch_16k_r
292
+ speech_segment: null
293
+ avoid_allzero_segment: true
294
+ flexible_numspk: false
295
+ dynamic_mixing: false
296
+ utt2spk: null
297
+ dynamic_mixing_gain_db: 0.0
298
+ encoder: stft
299
+ encoder_conf:
300
+     n_fft: 960
301
+     hop_length: 480
302
+     use_builtin_complex: true
303
+     default_fs: 48000
304
+ separator: tfgridnetv3
305
+ separator_conf:
306
+     n_srcs: 1
307
+     n_imics: 1
308
+     n_layers: 2
309
+     lstm_hidden_units: 32
310
+     attn_n_head: 2
311
+     attn_qk_output_channel: 2
312
+     emb_dim: 16
313
+     emb_ks: 4
314
+     emb_hs: 1
315
+     activation: prelu
316
+     eps: 1.0e-05
317
+ decoder: stft
318
+ decoder_conf:
319
+     n_fft: 960
320
+     hop_length: 480
321
+     default_fs: 48000
322
+ mask_module: multi_mask
323
+ mask_module_conf: {}
324
+ preprocessor: enh
325
+ preprocessor_conf: {}
326
+ required:
327
+ - output_dir
328
+ version: '202304'
329
+ distributed: false
330
+ ```
331
+
332
+ </details>
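The config above pairs `optim_conf.lr: 0.001` with `scheduler: steplr`, `step_size: 2` and `gamma: 0.99`. A small sketch of the resulting learning-rate curve, using the standard StepLR rule (base rate times `gamma` raised to the number of completed steps divided by `step_size`, rounded down). It assumes the scheduler is stepped once per epoch; the helper name `steplr` is hypothetical.

```python
def steplr(base_lr, gamma, step_size, epochs_done):
    """Learning rate after `epochs_done` scheduler steps under the StepLR rule."""
    return base_lr * gamma ** (epochs_done // step_size)


# Values taken from the config: lr=0.001, gamma=0.99, step_size=2.
for epochs_done in (0, 1, 2, 3, 82):
    print(epochs_done, steplr(0.001, 0.99, 2, epochs_done))
```

Under that assumption the decay is gentle: after 82 completed epochs the rate is still roughly two thirds of its initial value.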
333
+
334
+
335
+
336
+ ### Citing ESPnet
337
+
338
+ ```BibTex
339
+ @inproceedings{watanabe2018espnet,
340
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
341
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
342
+ year={2018},
343
+ booktitle={Proceedings of Interspeech},
344
+ pages={2207--2211},
345
+ doi={10.21437/Interspeech.2018-1456},
346
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
347
+ }
348
+
349
+
350
+ @inproceedings{ESPnet-SE,
351
+ author = {Chenda Li and Jing Shi and Wangyou Zhang and Aswin Shanmugam Subramanian and Xuankai Chang and
352
+ Naoyuki Kamo and Moto Hira and Tomoki Hayashi and Christoph B{\"{o}}ddeker and Zhuo Chen and Shinji Watanabe},
353
+ title = {ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for {ASR} Integration},
354
+ booktitle = {{IEEE} Spoken Language Technology Workshop, {SLT} 2021, Shenzhen, China, January 19-22, 2021},
355
+ pages = {785--792},
356
+ publisher = {{IEEE}},
357
+ year = {2021},
358
+ url = {https://doi.org/10.1109/SLT48900.2021.9383615},
359
+ doi = {10.1109/SLT48900.2021.9383615},
360
+ timestamp = {Mon, 12 Apr 2021 17:08:59 +0200},
361
+ biburl = {https://dblp.org/rec/conf/slt/Li0ZSCKHHBC021.bib},
362
+ bibsource = {dblp computer science bibliography, https://dblp.org}
363
+ }
364
+
365
+
366
+ ```
367
+
368
+ or arXiv:
369
+
370
+ ```bibtex
371
+ @misc{watanabe2018espnet,
372
+ title={ESPnet: End-to-End Speech Processing Toolkit},
373
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
374
+ year={2018},
375
+ eprint={1804.00015},
376
+ archivePrefix={arXiv},
377
+ primaryClass={cs.CL}
378
+ }
379
+ ```
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/83epoch.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6d0e66630d6d8179b9a43271d700e04e9a5f6d04ed67e1f207c4b589160523e8
3
+ size 503327
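The three lines added for `83epoch.pth` are a Git LFS pointer, not the checkpoint itself: they record the SHA-256 digest and byte size of the real file. A minimal sketch of checking a downloaded file against such a pointer; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the demo payload is synthetic because the checkpoint bytes are not part of this diff.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6d0e66630d6d8179b9a43271d700e04e9a5f6d04ed67e1f207c4b589160523e8
size 503327
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of an LFS pointer into algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(pointer, payload):
    """Return True when the payload has the recorded size and SHA-256 digest."""
    if pointer["algo"] != "sha256":
        raise ValueError("unsupported hash algorithm: " + pointer["algo"])
    return (
        len(payload) == pointer["size"]
        and hashlib.sha256(payload).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER)

# Synthetic self-check: a pointer built from known bytes must match those bytes.
demo = b"not the real checkpoint"
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(demo).hexdigest(),
    "size": len(demo),
}
print(matches_pointer(demo_pointer, demo), matches_pointer(pointer, demo))
```

For a real check, pass the downloaded file's bytes (for example from `pathlib.Path(...).read_bytes()`) as `payload`.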
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/config.yaml ADDED
@@ -0,0 +1,237 @@
1
+ config: conf/tuning/tfgridnet_xtiny.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ dry_run: false
5
+ iterator_type: chunk
6
+ output_dir: exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw
7
+ ngpu: 1
8
+ seed: 0
9
+ num_workers: 4
10
+ num_att_plot: 3
11
+ dist_backend: nccl
12
+ dist_init_method: env://
13
+ dist_world_size: null
14
+ dist_rank: null
15
+ local_rank: 0
16
+ dist_master_addr: null
17
+ dist_master_port: null
18
+ dist_launcher: null
19
+ multiprocessing_distributed: false
20
+ unused_parameters: true
21
+ sharded_ddp: false
22
+ cudnn_enabled: true
23
+ cudnn_benchmark: false
24
+ cudnn_deterministic: true
25
+ collect_stats: false
26
+ write_collected_feats: false
27
+ max_epoch: 100
28
+ patience: 40
29
+ val_scheduler_criterion:
30
+ - valid
31
+ - loss
32
+ early_stopping_criterion:
33
+ - valid
34
+ - loss
35
+ - min
36
+ best_model_criterion:
37
+ -   - valid
38
+     - loss
39
+     - min
40
+ keep_nbest_models: 1
41
+ nbest_averaging_interval: 0
42
+ grad_clip: 5.0
43
+ grad_clip_type: 2.0
44
+ grad_noise: false
45
+ accum_grad: 1
46
+ no_forward_run: false
47
+ resume: true
48
+ save_interval: 1000
49
+ train_dtype: float32
50
+ use_amp: false
51
+ log_interval: null
52
+ use_matplotlib: true
53
+ use_tensorboard: true
54
+ create_graph_in_tensorboard: false
55
+ use_wandb: false
56
+ wandb_project: null
57
+ wandb_id: null
58
+ wandb_entity: null
59
+ wandb_name: null
60
+ wandb_model_log_interval: -1
61
+ detect_anomaly: false
62
+ pretrain_path: null
63
+ init_param: []
64
+ ignore_init_mismatch: false
65
+ freeze_param: []
66
+ num_iters_per_epoch: 8000
67
+ num_iters_valid: null
68
+ batch_size: 4
69
+ valid_batch_size: null
70
+ batch_bins: 1000000
71
+ valid_batch_bins: null
72
+ train_shape_file:
73
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_mix_shape
74
+ - exp_vctk_dns20_whamr/enh_stats_16k/train/speech_ref1_shape
75
+ valid_shape_file:
76
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_mix_shape
77
+ - exp_vctk_dns20_whamr/enh_stats_16k/valid/speech_ref1_shape
78
+ batch_type: folded
79
+ valid_batch_type: null
80
+ fold_length:
81
+ - 80000
82
+ - 80000
83
+ sort_in_batch: descending
84
+ sort_batch: descending
85
+ multiple_iterator: false
86
+ chunk_length: 32000
87
+ chunk_shift_ratio: 0.5
88
+ num_cache_chunks: 1024
89
+ chunk_excluded_key_prefixes: []
90
+ chunk_discard_short_samples: false
91
+ train_data_path_and_name_and_type:
92
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/wav.scp
93
+     - speech_mix
94
+     - sound
95
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/spk1.scp
96
+     - speech_ref1
97
+     - sound
98
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2category
99
+     - category
100
+     - text
101
+ -   - dump/raw/train_vctk_noisy_dns20_whamr/utt2fs
102
+     - fs
103
+     - text_int
104
+ valid_data_path_and_name_and_type:
105
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/wav.scp
106
+     - speech_mix
107
+     - sound
108
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/spk1.scp
109
+     - speech_ref1
110
+     - sound
111
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2category
112
+     - category
113
+     - text
114
+ -   - dump/raw/valid_vctk_noisy_dns20_whamr/utt2fs
115
+     - fs
116
+     - text_int
117
+ allow_variable_data_keys: false
118
+ max_cache_size: 0.0
119
+ max_cache_fd: 32
120
+ allow_multi_rates: true
121
+ valid_max_cache_size: null
122
+ exclude_weight_decay: false
123
+ exclude_weight_decay_conf: {}
124
+ optim: adam
125
+ optim_conf:
126
+     lr: 0.001
127
+     eps: 1.0e-08
128
+     weight_decay: 1.0e-05
129
+ scheduler: steplr
130
+ scheduler_conf:
131
+     step_size: 2
132
+     gamma: 0.99
133
+ init: null
134
+ model_conf:
135
+     normalize_variance_per_ch: true
136
+     categories:
137
+     - 1ch_8k
138
+     - 1ch_8k_r
139
+     - 1ch_16k_r
140
+     - 1ch_48k
141
+     - 1ch_24k
142
+     - 1ch_16k
143
+     - 2ch_8k
144
+     - 2ch_8k_r
145
+     - 2ch_16k
146
+     - 2ch_16k_r
147
+     - 5ch_8k
148
+     - 5ch_16k
149
+     - 8ch_8k_r
150
+     - 8ch_16k_r
151
+ criterions:
152
+ -   name: mr_l1_tfd
153
+     conf:
154
+         window_sz:
155
+         - 256
156
+         - 512
157
+         - 768
158
+         - 1024
159
+         hop_sz: null
160
+         eps: 1.0e-08
161
+         time_domain_weight: 0.5
162
+         normalize_variance: true
163
+     wrapper: fixed_order
164
+     wrapper_conf:
165
+         weight: 1.0
166
+ -   name: si_snr
167
+     conf:
168
+         eps: 1.0e-07
169
+     wrapper: fixed_order
170
+     wrapper_conf:
171
+         weight: 0.0
172
+ speech_volume_normalize: null
173
+ rir_scp: null
174
+ rir_apply_prob: 1.0
175
+ noise_scp: null
176
+ noise_apply_prob: 1.0
177
+ noise_db_range: '13_15'
178
+ short_noise_thres: 0.5
179
+ use_reverberant_ref: false
180
+ num_spk: 1
181
+ num_noise_type: 1
182
+ sample_rate: 8000
183
+ force_single_channel: true
184
+ channel_reordering: true
185
+ categories:
186
+ - 1ch_8k
187
+ - 1ch_8k_r
188
+ - 1ch_16k_r
189
+ - 1ch_48k
190
+ - 1ch_24k
191
+ - 1ch_16k
192
+ - 2ch_8k
193
+ - 2ch_8k_r
194
+ - 2ch_16k
195
+ - 2ch_16k_r
196
+ - 5ch_8k
197
+ - 5ch_16k
198
+ - 8ch_8k_r
199
+ - 8ch_16k_r
200
+ speech_segment: null
201
+ avoid_allzero_segment: true
202
+ flexible_numspk: false
203
+ dynamic_mixing: false
204
+ utt2spk: null
205
+ dynamic_mixing_gain_db: 0.0
206
+ encoder: stft
207
+ encoder_conf:
208
+     n_fft: 960
209
+     hop_length: 480
210
+     use_builtin_complex: true
211
+     default_fs: 48000
212
+ separator: tfgridnetv3
213
+ separator_conf:
214
+     n_srcs: 1
215
+     n_imics: 1
216
+     n_layers: 2
217
+     lstm_hidden_units: 32
218
+     attn_n_head: 2
219
+     attn_qk_output_channel: 2
220
+     emb_dim: 16
221
+     emb_ks: 4
222
+     emb_hs: 1
223
+     activation: prelu
224
+     eps: 1.0e-05
225
+ decoder: stft
226
+ decoder_conf:
227
+     n_fft: 960
228
+     hop_length: 480
229
+     default_fs: 48000
230
+ mask_module: multi_mask
231
+ mask_module_conf: {}
232
+ preprocessor: enh
233
+ preprocessor_conf: {}
234
+ required:
235
+ - output_dir
236
+ version: '202304'
237
+ distributed: false
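The STFT encoder and decoder in `config.yaml` use `n_fft: 960` and `hop_length: 480` with `default_fs: 48000`, which is a 20 ms window and a 10 ms hop at 48 kHz. Since the model is trained on 8, 16, 24 and 48 kHz categories, a plausible reading is that the window is rescaled in proportion to the input sampling rate so that its duration stays fixed. The sketch below works through that arithmetic; the proportional-scaling rule and the helper name `stft_params_for_fs` are assumptions, not something the config confirms.

```python
def stft_params_for_fs(fs, n_fft=960, hop_length=480, default_fs=48000):
    """Scale the STFT window and hop (in samples) from default_fs to fs."""
    return {
        "n_fft": n_fft * fs // default_fs,
        "hop_length": hop_length * fs // default_fs,
    }


for fs in (8000, 16000, 24000, 48000):
    params = stft_params_for_fs(fs)
    window_ms = 1000 * params["n_fft"] / fs
    hop_ms = 1000 * params["hop_length"] / fs
    print(fs, params, window_ms, hop_ms)
```

Under this assumption every supported rate sees the same 20 ms / 10 ms framing, and only the number of frequency bins changes with the sampling rate.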
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/enhanced_test_16k/RESULTS.md ADDED
@@ -0,0 +1,23 @@
1
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Sun Mar 3 22:03:37 EST 2024`
5
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
6
+ - espnet version: `espnet 202304`
7
+ - pytorch version: `pytorch 2.0.1+cu118`
8
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
9
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
10
+
11
+
12
+ ## enhanced_test_16k
13
+
14
+
15
+ |dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
16
+ |---|---|---|---|---|---|---|---|---|---|---|
17
+ |chime4_et05_real_isolated_6ch_track|1.18|52.77|-3.19|-3.19|0.00|-31.84|2.75|3.06|3.75|3.47|
18
+ |chime4_et05_simu_isolated_6ch_track|1.42|81.92|8.23|8.23|0.00|1.62|2.63|3.00|3.64|3.13|
19
+ |dns20_tt_synthetic_no_reverb|2.82|96.64|17.99|17.99|0.00|17.85|3.24|3.52|4.01|3.93|
20
+ |reverb_et_real_8ch_multich|1.13|70.16|3.29|3.29|0.00|0.91|2.99|3.30|3.91|3.76|
21
+ |reverb_et_simu_8ch_multich|2.12|92.45|10.33|10.33|0.00|-8.65|3.12|3.42|3.96|3.84|
22
+ |whamr_tt_mix_single_reverb_max_16k|1.84|90.09|8.74|8.74|0.00|5.98|3.01|3.30|3.95|3.57|
23
+
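The 16 kHz results table is plain pipe-delimited Markdown, so it can be loaded for comparison without extra dependencies. A minimal sketch, with the rows copied from the table above; `parse_markdown_table` is a hypothetical helper written for this illustration.

```python
TABLE = """\
|dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
|---|---|---|---|---|---|---|---|---|---|---|
|chime4_et05_real_isolated_6ch_track|1.18|52.77|-3.19|-3.19|0.00|-31.84|2.75|3.06|3.75|3.47|
|chime4_et05_simu_isolated_6ch_track|1.42|81.92|8.23|8.23|0.00|1.62|2.63|3.00|3.64|3.13|
|dns20_tt_synthetic_no_reverb|2.82|96.64|17.99|17.99|0.00|17.85|3.24|3.52|4.01|3.93|
|reverb_et_real_8ch_multich|1.13|70.16|3.29|3.29|0.00|0.91|2.99|3.30|3.91|3.76|
|reverb_et_simu_8ch_multich|2.12|92.45|10.33|10.33|0.00|-8.65|3.12|3.42|3.96|3.84|
|whamr_tt_mix_single_reverb_max_16k|1.84|90.09|8.74|8.74|0.00|5.98|3.01|3.30|3.95|3.57|
"""


def parse_markdown_table(text):
    """Turn a pipe-delimited table into a list of dicts with float metrics."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header = lines[0].strip("|").split("|")
    rows = []
    for line in lines[2:]:  # skip the header and the |---| separator
        cells = line.strip("|").split("|")
        row = dict(zip(header, cells))
        rows.append(
            {key: (val if key == "dataset" else float(val)) for key, val in row.items()}
        )
    return rows


rows = parse_markdown_table(TABLE)
best_pesq = max(rows, key=lambda r: r["PESQ_WB"])
worst_si_snr = min(rows, key=lambda r: r["SI_SNR"])
print(best_pesq["dataset"], worst_si_snr["dataset"])
```

In this table SAR equals SDR and SIR is 0.00 on every row, and the two real-recording sets (`chime4_et05_real`, `reverb_et_real`) score lowest on PESQ_WB.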
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/enhanced_test_48k/RESULTS.md ADDED
@@ -0,0 +1,18 @@
1
+ <!-- Generated by ./scripts/utils/show_enh_score.sh -->
2
+ # RESULTS
3
+ ## Environments
4
+ - date: `Sun Mar 3 16:44:15 EST 2024`
5
+ - python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
6
+ - espnet version: `espnet 202304`
7
+ - pytorch version: `pytorch 2.0.1+cu118`
8
+ - Git hash: `443028662106472c60fe8bd892cb277e5b488651`
9
+ - Commit date: `Thu May 11 03:32:59 2023 +0000`
10
+
11
+
12
+ ## enhanced_test_48k
13
+
14
+
15
+ |dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
16
+ |---|---|---|---|---|---|---|---|---|---|
17
+ |vctk_noisy_tt_2spk|94.58|19.73|19.73|0.00|18.14|3.09|3.41|3.93|3.50|
18
+
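The `SI_SNR` column reports scale-invariant signal-to-noise ratio in dB. A minimal pure-Python sketch of the usual definition (zero-mean signals, projection of the estimate onto the reference, energy ratio in dB). This is an illustration of the metric, not the evaluation script used for the tables; the `eps` placement is an assumption made only to keep the arithmetic finite.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    # Whatever is left after removing the target counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


reference = [1.0, -1.0, 1.0, -1.0]
print(si_snr([2.0 * x for x in reference], reference))  # rescaled copy: very high
print(si_snr([2.0, 0.0, 0.0, -2.0], reference))         # equal target and noise energy
```

Because the estimate is projected onto the reference first, rescaling the output does not change the score, which is why negative values in the tables indicate residual distortion rather than a level mismatch.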
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/backward_time.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/clip.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/forward_time.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/gpu_max_cached_mem_GB.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/grad_norm.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/iter_time.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/l1_timedomain+magspec_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/loss.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/loss_scale.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/optim0_lr0.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/optim_step_time.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_24k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_48k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_1ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_2ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_2ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_2ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_2ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_5ch_16k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_5ch_8k.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_8ch_16k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/si_snr_loss_8ch_8k_r.png ADDED
exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/images/train_time.png ADDED
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: '202304'
2
+ files:
3
+   model_file: exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/83epoch.pth
4
+ python: "3.8.16 (default, Mar 2 2023, 03:21:46) \n[GCC 11.2.0]"
5
+ timestamp: 1723014046.160602
6
+ torch: 2.0.1+cu118
7
+ yaml_files:
8
+   train_config: exp_vctk_dns20_whamr/enh_tfgridnet_xtiny_raw/config.yaml
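The `timestamp` field in `meta.yaml` is a bare float. Assuming it is a Unix epoch time in seconds (the file itself does not say so), it can be decoded with the standard library:

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; interpreting it as Unix epoch seconds is an assumption.
packed_at = datetime.fromtimestamp(1723014046.160602, tz=timezone.utc)
print(packed_at.isoformat())
```

Read this way, the model was packed on 2024-08-07 (UTC), several months after the March 2024 evaluation dates recorded in the RESULTS files.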