“siddhu001” committed on
Commit 3666434
1 Parent(s): 1633748

Update model

Files changed (20)
  1. README.md +353 -0
  2. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/RESULTS.md +46 -0
  3. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/config.yaml +233 -0
  4. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/acc.png +0 -0
  5. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/backward_time.png +0 -0
  6. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/cer.png +0 -0
  7. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/clip.png +0 -0
  8. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/forward_time.png +0 -0
  9. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/gpu_max_cached_mem_GB.png +0 -0
  10. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/grad_norm.png +0 -0
  11. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/iter_time.png +0 -0
  12. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/loss.png +0 -0
  13. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/loss_att.png +0 -0
  14. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/loss_scale.png +0 -0
  15. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/optim0_lr0.png +0 -0
  16. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/optim_step_time.png +0 -0
  17. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/train_time.png +0 -0
  18. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/wer.png +0 -0
  19. exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/valid.loss.ave_10best.pth +3 -0
  20. meta.yaml +8 -0
README.md ADDED
@@ -0,0 +1,353 @@
+ ---
+ tags:
+ - espnet
+ - audio
+ - automatic-speech-recognition
+ language: en
+ datasets:
+ - slue-voxceleb
+ license: cc-by-4.0
+ ---
+
+ ## ESPnet2 ASR model
+
+ ### `espnet/sluevoxceleb_owsm_finetune_sa`
+
+ This model was trained by “siddhu001” using slue-voxceleb recipe in [espnet](https://github.com/espnet/espnet/).
+
+ ### Demo: How to use in ESPnet2
+
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
+ if you haven't done that already.
+
+ ```bash
+ cd espnet
+ git checkout e23ef85f0b3116ad5c60d0833f186da0deec0734
+ pip install -e .
+ cd egs2/slue-voxceleb/slu1_superb_correct
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/sluevoxceleb_owsm_finetune_sa
+ ```
+
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Wed Feb 7 23:48:24 CST 2024`
+ - python version: `3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]`
+ - espnet version: `espnet 202310`
+ - pytorch version: `pytorch 2.1.0+cu121`
+ - Git hash: `21d2105784e4da98397bf487b2550d4c6e16d40d`
+ - Commit date: `Wed Jan 31 13:40:37 2024 -0600`
+
+ ## exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_slu_model_valid.loss.ave/devel|1436|1436|79.5|20.5|0.0|0.0|20.5|20.5|
+ |decode_asr_slu_model_valid.loss.ave/test|3426|3426|79.3|20.7|0.0|0.0|20.7|20.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_slu_model_valid.loss.ave/devel|1436|10365|81.9|16.1|2.0|0.8|18.9|20.5|
+ |decode_asr_slu_model_valid.loss.ave/test|3426|24887|82.1|15.8|2.2|0.6|18.6|20.7|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ ## exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/decode_asr_slu_model_valid.loss.ave
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/devel|1437|1437|79.5|20.5|0.0|0.0|20.5|20.5|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/devel|1437|10372|81.9|16.1|2.0|0.8|18.9|20.5|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+
+ ## ASR config
+
+ <details><summary>expand</summary>
+
+ ```
+ config: conf/train_asr_own3.1_weighted_finetune_0.000001.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 42653
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - loss
+   - min
+ - - train
+   - loss
+   - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 2
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_lora: false
+ save_lora_only: true
+ lora_conf: {}
+ pretrain_path: null
+ init_param:
+ - /scratch/bbjs/arora1/new_download_espnet_egs2/harpervalley/slu1_superb_onlyda/owsm_v3.1_ebf/exp/s2t_train_s2t_ebf_conv2d_size1024_e18_d18_piecewise_lr2e-4_warmup60k_flashattn_raw_bpe50000/valid.total_count.ave_5best.till45epoch.pth:encoder:encoder
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/slu_stats_raw_en_word_sp/train/speech_shape
+ - exp/slu_stats_raw_en_word_sp/train/text_shape.word
+ valid_shape_file:
+ - exp/slu_stats_raw_en_word_sp/valid/speech_shape
+ - exp/slu_stats_raw_en_word_sp/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ train_data_path_and_name_and_type:
+ - - dump/raw/train_sp/wav.scp
+   - speech
+   - sound
+ - - dump/raw/train_sp/text
+   - text
+   - text
+ valid_data_path_and_name_and_type:
+ - - dump/raw/devel/wav.scp
+   - speech
+   - sound
+ - - dump/raw/devel/text
+   - text
+   - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+   lr: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+   warmup_steps: 1000
+ token_list:
+ - <blank>
+ - <unk>
+ - Neutral
+ - Positive
+ - Negative
+ - <sos/eos>
+ transcript_token_list: null
+ two_pass: false
+ pre_postencoder_norm: false
+ init: null
+ input_size: null
+ ctc_conf:
+   dropout_rate: 0.0
+   ctc_type: builtin
+   reduce: true
+   ignore_nan_grad: null
+   zero_infinity: true
+   brctc_risk_strategy: exp
+   brctc_group_strategy: end
+   brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ frontend: default
+ frontend_conf:
+   n_fft: 512
+   win_length: 400
+   hop_length: 160
+   fs: 16k
+ specaug: specaug
+ specaug_conf:
+   apply_time_warp: false
+   time_warp_window: 5
+   time_warp_mode: bicubic
+   apply_freq_mask: true
+   freq_mask_width_range:
+   - 0
+   - 27
+   num_freq_mask: 2
+   apply_time_mask: true
+   time_mask_width_ratio_range:
+   - 0.0
+   - 0.05
+   num_time_mask: 10
+ normalize: global_mvn
+ normalize_conf:
+   stats_file: /scratch/bbjs/arora1/new_download_espnet_egs2/harpervalley/slu1_superb_onlyda/owsm_v3.1_ebf/exp/s2t_stats_raw_bpe50000/train/feats_stats.npz
+ model: espnet
+ model_conf:
+   ctc_weight: 0.0
+   lsm_weight: 0.1
+   length_normalized_loss: false
+   superb_setup_encoder: true
+   num_class: 3
+   ssl_input_size: 1024
+   weighted_sum: true
+   extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+   output_size: 1024
+   attention_heads: 16
+   attention_layer_type: selfattn
+   pos_enc_layer_type: abs_pos
+   rel_pos_type: latest
+   cgmlp_linear_units: 4096
+   cgmlp_conv_kernel: 31
+   use_linear_after_conv: false
+   gate_activation: identity
+   num_blocks: 18
+   dropout_rate: 0.1
+   positional_dropout_rate: 0.1
+   attention_dropout_rate: 0.1
+   input_layer: conv2d
+   layer_drop_rate: 0.0
+   linear_units: 4096
+   positionwise_layer_type: linear
+   use_ffn: true
+   macaron_ffn: true
+   merge_conv_kernel: 31
+ prepostencoder: null
+ prepostencoder_conf: {}
+ postencoder: null
+ postencoder_conf: {}
+ deliberationencoder: null
+ deliberationencoder_conf: {}
+ decoder: rnn
+ decoder_conf: {}
+ postdecoder: null
+ postdecoder_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202310'
+ distributed: true
+ ```
+
+ </details>
+
+
+
+ ### Citing ESPnet
+
+ ```BibTex
+ @inproceedings{watanabe2018espnet,
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   title={{ESPnet}: End-to-End Speech Processing Toolkit},
+   year={2018},
+   booktitle={Proceedings of Interspeech},
+   pages={2207--2211},
+   doi={10.21437/Interspeech.2018-1456},
+   url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ }
+
+
+
+
+
+
+ ```
+
+ or arXiv:
+
+ ```bibtex
+ @misc{watanabe2018espnet,
+   title={ESPnet: End-to-End Speech Processing Toolkit},
+   author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
+   year={2018},
+   eprint={1804.00015},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
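The WER tables in the README report the same count for `Snt` and `Wrd` (1436 and 3426), and the config's `token_list` holds only three sentiment labels. That suggests each utterance is scored against a single label word, in which case WER is effectively the classification error rate. The sketch below rests on that assumption (it is not stated in the README) and derives accuracy from the reported `Err` values; `accuracy_from_wer` is a hypothetical helper name.

```python
# WER rows copied from the README tables: (split, Snt, Wrd, Err in percent).
wer_rows = [
    ("devel", 1436, 1436, 20.5),
    ("test", 3426, 3426, 20.7),
]


def accuracy_from_wer(n_sentences, n_words, err_percent):
    """Return accuracy in percent, assuming one label word per utterance."""
    if n_sentences != n_words:
        # The one-label-per-utterance reading only holds when Snt == Wrd.
        raise ValueError("expected exactly one reference word per utterance")
    return round(100.0 - err_percent, 1)


accuracies = {
    split: accuracy_from_wer(snt, wrd, err) for split, snt, wrd, err in wer_rows
}
for split, acc in accuracies.items():
    print(f"{split}: {acc}% accuracy")
```

Under this reading the derived values agree with the `Corr` column of the WER tables.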
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/RESULTS.md ADDED
@@ -0,0 +1,46 @@
+ <!-- Generated by scripts/utils/show_asr_result.sh -->
+ # RESULTS
+ ## Environments
+ - date: `Wed Feb 7 23:48:24 CST 2024`
+ - python version: `3.9.13 (main, Aug 25 2022, 23:26:10) [GCC 11.2.0]`
+ - espnet version: `espnet 202310`
+ - pytorch version: `pytorch 2.1.0+cu121`
+ - Git hash: `21d2105784e4da98397bf487b2550d4c6e16d40d`
+ - Commit date: `Wed Jan 31 13:40:37 2024 -0600`
+
+ ## exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_slu_model_valid.loss.ave/devel|1436|1436|79.5|20.5|0.0|0.0|20.5|20.5|
+ |decode_asr_slu_model_valid.loss.ave/test|3426|3426|79.3|20.7|0.0|0.0|20.7|20.7|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |decode_asr_slu_model_valid.loss.ave/devel|1436|10365|81.9|16.1|2.0|0.8|18.9|20.5|
+ |decode_asr_slu_model_valid.loss.ave/test|3426|24887|82.1|15.8|2.2|0.6|18.6|20.7|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ ## exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/decode_asr_slu_model_valid.loss.ave
+ ### WER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/devel|1437|1437|79.5|20.5|0.0|0.0|20.5|20.5|
+
+ ### CER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
+ |org/devel|1437|10372|81.9|16.1|2.0|0.8|18.9|20.5|
+
+ ### TER
+
+ |dataset|Snt|Wrd|Corr|Sub|Del|Ins|Err|S.Err|
+ |---|---|---|---|---|---|---|---|---|
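In sclite-style score tables like the ones in RESULTS.md, `Err` is the sum of the substitution, deletion, and insertion rates. The sketch below parses the two CER rows and checks that relation within a rounding tolerance. The helper names `parse_row` and `err_is_consistent` and the tolerance value are illustrative assumptions, not part of ESPnet.

```python
# CER rows copied verbatim from RESULTS.md (without the leading diff marker).
CER_ROWS = [
    "|decode_asr_slu_model_valid.loss.ave/devel|1436|10365|81.9|16.1|2.0|0.8|18.9|20.5|",
    "|decode_asr_slu_model_valid.loss.ave/test|3426|24887|82.1|15.8|2.2|0.6|18.6|20.7|",
]

COLUMNS = ["Corr", "Sub", "Del", "Ins", "Err", "S.Err"]


def parse_row(line):
    """Split one markdown table row into a dict of typed fields."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    row = {"dataset": cells[0], "Snt": int(cells[1]), "Wrd": int(cells[2])}
    row.update(zip(COLUMNS, map(float, cells[3:9])))
    return row


def err_is_consistent(row, tol=0.11):
    """Check Err == Sub + Del + Ins, allowing for one-decimal rounding."""
    return abs(row["Sub"] + row["Del"] + row["Ins"] - row["Err"]) <= tol


rows = [parse_row(line) for line in CER_ROWS]
for row in rows:
    print(row["dataset"], "consistent:", err_is_consistent(row))
```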
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/config.yaml ADDED
@@ -0,0 +1,233 @@
+ config: conf/train_asr_own3.1_weighted_finetune_0.000001.yaml
+ print_config: false
+ log_level: INFO
+ drop_last_iter: false
+ dry_run: false
+ iterator_type: sequence
+ valid_iterator_type: null
+ output_dir: exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp
+ ngpu: 1
+ seed: 0
+ num_workers: 1
+ num_att_plot: 3
+ dist_backend: nccl
+ dist_init_method: env://
+ dist_world_size: 4
+ dist_rank: 0
+ local_rank: 0
+ dist_master_addr: localhost
+ dist_master_port: 42653
+ dist_launcher: null
+ multiprocessing_distributed: true
+ unused_parameters: true
+ sharded_ddp: false
+ cudnn_enabled: true
+ cudnn_benchmark: false
+ cudnn_deterministic: true
+ collect_stats: false
+ write_collected_feats: false
+ max_epoch: 50
+ patience: null
+ val_scheduler_criterion:
+ - valid
+ - loss
+ early_stopping_criterion:
+ - valid
+ - loss
+ - min
+ best_model_criterion:
+ - - valid
+   - loss
+   - min
+ - - train
+   - loss
+   - min
+ keep_nbest_models: 10
+ nbest_averaging_interval: 0
+ grad_clip: 5.0
+ grad_clip_type: 2.0
+ grad_noise: false
+ accum_grad: 2
+ no_forward_run: false
+ resume: true
+ train_dtype: float32
+ use_amp: false
+ log_interval: null
+ use_matplotlib: true
+ use_tensorboard: true
+ create_graph_in_tensorboard: false
+ use_wandb: false
+ wandb_project: null
+ wandb_id: null
+ wandb_entity: null
+ wandb_name: null
+ wandb_model_log_interval: -1
+ detect_anomaly: false
+ use_lora: false
+ save_lora_only: true
+ lora_conf: {}
+ pretrain_path: null
+ init_param:
+ - /scratch/bbjs/arora1/new_download_espnet_egs2/harpervalley/slu1_superb_onlyda/owsm_v3.1_ebf/exp/s2t_train_s2t_ebf_conv2d_size1024_e18_d18_piecewise_lr2e-4_warmup60k_flashattn_raw_bpe50000/valid.total_count.ave_5best.till45epoch.pth:encoder:encoder
+ ignore_init_mismatch: false
+ freeze_param: []
+ num_iters_per_epoch: null
+ batch_size: 64
+ valid_batch_size: null
+ batch_bins: 1000000
+ valid_batch_bins: null
+ train_shape_file:
+ - exp/slu_stats_raw_en_word_sp/train/speech_shape
+ - exp/slu_stats_raw_en_word_sp/train/text_shape.word
+ valid_shape_file:
+ - exp/slu_stats_raw_en_word_sp/valid/speech_shape
+ - exp/slu_stats_raw_en_word_sp/valid/text_shape.word
+ batch_type: folded
+ valid_batch_type: null
+ fold_length:
+ - 80000
+ - 150
+ sort_in_batch: descending
+ shuffle_within_batch: false
+ sort_batch: descending
+ multiple_iterator: false
+ chunk_length: 500
+ chunk_shift_ratio: 0.5
+ num_cache_chunks: 1024
+ chunk_excluded_key_prefixes: []
+ chunk_default_fs: null
+ train_data_path_and_name_and_type:
+ - - dump/raw/train_sp/wav.scp
+   - speech
+   - sound
+ - - dump/raw/train_sp/text
+   - text
+   - text
+ valid_data_path_and_name_and_type:
+ - - dump/raw/devel/wav.scp
+   - speech
+   - sound
+ - - dump/raw/devel/text
+   - text
+   - text
+ allow_variable_data_keys: false
+ max_cache_size: 0.0
+ max_cache_fd: 32
+ allow_multi_rates: false
+ valid_max_cache_size: null
+ exclude_weight_decay: false
+ exclude_weight_decay_conf: {}
+ optim: adam
+ optim_conf:
+   lr: 1.0e-06
+ scheduler: warmuplr
+ scheduler_conf:
+   warmup_steps: 1000
+ token_list:
+ - <blank>
+ - <unk>
+ - Neutral
+ - Positive
+ - Negative
+ - <sos/eos>
+ transcript_token_list: null
+ two_pass: false
+ pre_postencoder_norm: false
+ init: null
+ input_size: null
+ ctc_conf:
+   dropout_rate: 0.0
+   ctc_type: builtin
+   reduce: true
+   ignore_nan_grad: null
+   zero_infinity: true
+   brctc_risk_strategy: exp
+   brctc_group_strategy: end
+   brctc_risk_factor: 0.0
+ joint_net_conf: null
+ use_preprocessor: true
+ token_type: word
+ bpemodel: null
+ non_linguistic_symbols: null
+ cleaner: null
+ g2p: null
+ speech_volume_normalize: null
+ rir_scp: null
+ rir_apply_prob: 1.0
+ noise_scp: null
+ noise_apply_prob: 1.0
+ noise_db_range: '13_15'
+ short_noise_thres: 0.5
+ frontend: default
+ frontend_conf:
+   n_fft: 512
+   win_length: 400
+   hop_length: 160
+   fs: 16k
+ specaug: specaug
+ specaug_conf:
+   apply_time_warp: false
+   time_warp_window: 5
+   time_warp_mode: bicubic
+   apply_freq_mask: true
+   freq_mask_width_range:
+   - 0
+   - 27
+   num_freq_mask: 2
+   apply_time_mask: true
+   time_mask_width_ratio_range:
+   - 0.0
+   - 0.05
+   num_time_mask: 10
+ normalize: global_mvn
+ normalize_conf:
+   stats_file: /scratch/bbjs/arora1/new_download_espnet_egs2/harpervalley/slu1_superb_onlyda/owsm_v3.1_ebf/exp/s2t_stats_raw_bpe50000/train/feats_stats.npz
+ model: espnet
+ model_conf:
+   ctc_weight: 0.0
+   lsm_weight: 0.1
+   length_normalized_loss: false
+   superb_setup_encoder: true
+   num_class: 3
+   ssl_input_size: 1024
+   weighted_sum: true
+   extract_feats_in_collect_stats: false
+ preencoder: null
+ preencoder_conf: {}
+ encoder: e_branchformer
+ encoder_conf:
+   output_size: 1024
+   attention_heads: 16
+   attention_layer_type: selfattn
+   pos_enc_layer_type: abs_pos
+   rel_pos_type: latest
+   cgmlp_linear_units: 4096
+   cgmlp_conv_kernel: 31
+   use_linear_after_conv: false
+   gate_activation: identity
+   num_blocks: 18
+   dropout_rate: 0.1
+   positional_dropout_rate: 0.1
+   attention_dropout_rate: 0.1
+   input_layer: conv2d
+   layer_drop_rate: 0.0
+   linear_units: 4096
+   positionwise_layer_type: linear
+   use_ffn: true
+   macaron_ffn: true
+   merge_conv_kernel: 31
+ prepostencoder: null
+ prepostencoder_conf: {}
+ postencoder: null
+ postencoder_conf: {}
+ deliberationencoder: null
+ deliberationencoder_conf: {}
+ decoder: rnn
+ decoder_conf: {}
+ postdecoder: null
+ postdecoder_conf: {}
+ required:
+ - output_dir
+ - token_list
+ version: '202310'
+ distributed: true
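The config pairs `optim_conf.lr: 1.0e-06` with `scheduler: warmuplr` and `warmup_steps: 1000`. As a rough illustration, the sketch below evaluates the Noam-style warmup rule that ESPnet documents for its `WarmupLR` scheduler: the rate rises linearly to the base value over the warmup steps, then decays with the inverse square root of the step. The function name `warmup_lr` is hypothetical, and the step is assumed to count optimizer updates starting at 1.

```python
def warmup_lr(step, base_lr=1.0e-06, warmup_steps=1000):
    """Learning rate at a given optimizer step under a Noam-style warmup.

    lr = base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)
    """
    if step < 1:
        raise ValueError("step is assumed to start at 1")
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)


# Print a few points on the schedule: mid-warmup, peak, and later decay.
for step in (1, 500, 1000, 4000, 16000):
    print(f"step {step:>6}: lr = {warmup_lr(step):.3e}")
```

With these settings the peak rate equals the configured `lr` and is reached at step 1000. Half that value occurs both at step 500 (during warmup) and at step 4000 (during decay).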
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/acc.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/backward_time.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/cer.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/clip.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/forward_time.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/gpu_max_cached_mem_GB.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/grad_norm.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/iter_time.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/loss.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/loss_att.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/loss_scale.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/optim0_lr0.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/optim_step_time.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/train_time.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/images/wer.png ADDED
exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/valid.loss.ave_10best.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ef335c0e836757aa8506cb2e39131e2278df829ae646bb40994294a08e505e2
+ size 2247934138
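The three lines above are a Git LFS pointer rather than the checkpoint itself: they record the spec version, the SHA-256 digest, and the byte size of the real file. A minimal sketch of how a downloaded checkpoint could be checked against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is whatever the reader has downloaded.

```python
import hashlib

# Pointer text copied from the diff above (without the leading diff markers).
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:4ef335c0e836757aa8506cb2e39131e2278df829ae646bb40994294a08e505e2
size 2247934138
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte checkpoint is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(f"{pointer['size'] / 1e9:.2f} GB, {pointer['algo']} digest {pointer['digest'][:12]}...")
```

Calling `matches_pointer("valid.loss.ave_10best.pth", pointer)` on a local copy would then confirm that the download is complete and unmodified.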
meta.yaml ADDED
@@ -0,0 +1,8 @@
+ espnet: '202310'
+ files:
+   slu_model_file: exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/valid.loss.ave_10best.pth
+ python: "3.9.13 (main, Aug 25 2022, 23:26:10) \n[GCC 11.2.0]"
+ timestamp: 1715350239.56977
+ torch: 2.1.0+cu121
+ yaml_files:
+   slu_train_config: exp/slu_train_asr_own3.1_weighted_finetune_0.000001_raw_en_word_sp/config.yaml
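The `timestamp` field in meta.yaml looks like a Unix epoch value in seconds. That is an assumption, since the file does not state its unit. Under that reading, the sketch below converts it to a UTC date, which helps when comparing the packaging time with the results date recorded in RESULTS.md.

```python
from datetime import datetime, timezone

# Value copied from meta.yaml; assumed to be seconds since the Unix epoch.
META_TIMESTAMP = 1715350239.56977

packed_at = datetime.fromtimestamp(META_TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat())
```

Read this way, the model was packaged in May 2024, about three months after the February 2024 date in the results header.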