2024-09-22 10:50:57,231 INFO [train.py:1266] (0/4) Training started 2024-09-22 10:50:57,241 INFO [train.py:1276] (0/4) Device: cuda:0 2024-09-22 10:50:57,244 INFO [train.py:1307] (0/4) Using dtype=torch.float16 2024-09-22 10:50:57,244 INFO [train.py:1308] (0/4) Use AMP=True 2024-09-22 10:50:57,244 INFO [train.py:1310] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'ignore_id': -1, 'label_smoothing': 0.1, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '44a9d5682af9fd3ef77074777e15278ec6d390eb', 'k2-git-date': 'Wed Sep 27 11:22:55 2023', 'lhotse-version': '1.17.0.dev+git.ccfc5b2c.dirty', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'cr-ctc', 'icefall-git-sha1': 'a6eead6c-clean', 'icefall-git-date': 'Mon Sep 9 10:10:08 2024', 'icefall-path': '/star-zw/workspace/zipformer/icefall_cr_ctc', 'k2-path': '/star-zw/workspace/k2/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-zw/workspace/lhotse/lhotse/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-7-0905180047-6d6678bc6f-8cwvw', 'IP address': '10.30.5.48'}, 'world_size': 4, 'master_port': 12347, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 1.0, 'cr_loss_scale': 0.2, 'time_mask_ratio': 2.5, 'cr_loss_masked_scale': 1.0, 'attention_decoder_loss_scale': 0.8, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': False, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'attention_decoder_dim': 512, 'attention_decoder_num_layers': 6, 'attention_decoder_attention_dim': 512, 'attention_decoder_num_heads': 8, 'attention_decoder_feedforward_dim': 2048, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': False, 'use_ctc': True, 'use_attention_decoder': False, 'use_cr_ctc': True, 'full_libri': True, 'mini_libri': False, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 700, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'sos_id': 1, 'eos_id': 1, 'vocab_size': 500, 'dtype': torch.float16, 'use_autocast': True} 2024-09-22 10:50:57,245 INFO [train.py:1312] (0/4) About to create model 2024-09-22 10:50:57,890 INFO [train.py:1316] (0/4) Number of 
model parameters: 64250603 2024-09-22 10:50:57,891 INFO [train.py:752] (0/4) num_frame_masks: 25, max_frames_mask_fraction: 0.375 2024-09-22 10:51:03,103 INFO [train.py:1338] (0/4) Using DDP 2024-09-22 10:51:03,358 INFO [asr_datamodule.py:436] (0/4) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts 2024-09-22 10:51:03,601 INFO [asr_datamodule.py:232] (0/4) Enable MUSAN 2024-09-22 10:51:03,601 INFO [asr_datamodule.py:233] (0/4) About to get Musan cuts 2024-09-22 10:51:05,468 INFO [asr_datamodule.py:279] (0/4) Disable SpecAugment 2024-09-22 10:51:05,468 INFO [asr_datamodule.py:281] (0/4) About to create train dataset 2024-09-22 10:51:05,468 INFO [asr_datamodule.py:308] (0/4) Using DynamicBucketingSampler. 2024-09-22 10:51:27,854 INFO [asr_datamodule.py:325] (0/4) About to create train dataloader 2024-09-22 10:51:27,855 INFO [asr_datamodule.py:453] (0/4) About to get dev-clean cuts 2024-09-22 10:51:27,858 INFO [asr_datamodule.py:460] (0/4) About to get dev-other cuts 2024-09-22 10:51:27,859 INFO [asr_datamodule.py:356] (0/4) About to create dev dataset 2024-09-22 10:51:28,059 INFO [asr_datamodule.py:373] (0/4) About to create dev dataloader 2024-09-22 10:51:28,060 INFO [train.py:1545] (0/4) Sanity check -- see if any of the batches in epoch 1 would cause OOM. 2024-09-22 10:55:04,879 INFO [train.py:1576] (0/4) Maximum memory allocated so far is 18729MB 2024-09-22 10:55:06,893 INFO [train.py:1576] (0/4) Maximum memory allocated so far is 18729MB 2024-09-22 10:55:09,107 INFO [train.py:1576] (0/4) Maximum memory allocated so far is 18729MB 2024-09-22 10:55:10,930 INFO [train.py:1576] (0/4) Maximum memory allocated so far is 19067MB 2024-09-22 10:55:13,009 INFO [train.py:1576] (0/4) Maximum memory allocated so far is 19067MB 2024-09-22 10:55:15,403 INFO [train.py:1576] (0/4) Maximum memory allocated so far is 19067MB 2024-09-22 10:56:01,486 INFO [train.py:1198] (0/4) Epoch 1, batch 0, loss[loss=4.899, ctc_loss=4.766, cr_loss=0.6668, over 17106.00 frames. ], tot_loss[loss=4.899, ctc_loss=4.766, cr_loss=0.6668, over 17106.00 frames. ], batch size: 40, lr: 2.25e-02, grad_scale: 2.0 2024-09-22 10:56:01,487 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 10:56:18,234 INFO [train.py:1230] (0/4) Epoch 1, validation: loss=4.756, ctc_loss=4.756, cr_loss=2.853e-15, over 944034.00 frames. 2024-09-22 10:56:18,235 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 19067MB 2024-09-22 10:56:18,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=7.5 2024-09-22 10:56:30,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=0.0, ans=0.5 2024-09-22 10:56:33,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.64 vs. limit=5.0 2024-09-22 10:56:35,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.92 vs. 
limit=7.535 2024-09-22 10:56:39,774 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.680e+03 4.888e+03 5.131e+03 6.578e+03 8.849e+03, threshold=2.053e+04, percent-clipped=0.0 2024-09-22 10:56:40,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=46.666666666666664, ans=0.8983666666666666 2024-09-22 10:56:59,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=4.037333333333334 2024-09-22 10:57:00,472 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.227e+03 2.818e+03 4.846e+03 6.578e+03 1.124e+04, threshold=1.938e+04, percent-clipped=0.0 2024-09-22 10:57:07,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.66 vs. limit=7.57 2024-09-22 10:57:26,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=7.605 2024-09-22 10:57:28,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=140.0, ans=5.0875 2024-09-22 10:57:36,832 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.493e+02 2.434e+03 3.497e+03 4.961e+03 1.124e+04, threshold=1.399e+04, percent-clipped=0.0 2024-09-22 10:57:48,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=7.64 2024-09-22 10:57:50,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.85 vs. limit=7.5875 2024-09-22 10:57:51,514 INFO [train.py:1198] (0/4) Epoch 1, batch 50, loss[loss=1.499, ctc_loss=1.413, cr_loss=0.4322, over 16900.00 frames. ], tot_loss[loss=2.28, ctc_loss=2.218, cr_loss=0.3103, over 761837.63 frames. ], batch size: 58, lr: 2.48e-02, grad_scale: 0.5 2024-09-22 10:57:56,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=158.97 vs. limit=7.5875 2024-09-22 10:57:57,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.04 vs. limit=7.675 2024-09-22 10:57:57,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=21.98 vs. limit=5.116666666666666 2024-09-22 10:57:59,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=233.33333333333334, ans=0.09854166666666667 2024-09-22 10:58:03,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=22.78 vs. limit=7.5875 2024-09-22 10:58:03,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=43.50 vs. limit=7.5875 2024-09-22 10:58:18,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. 
limit=7.605 2024-09-22 10:58:26,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=99.24 vs. limit=5.14 2024-09-22 10:58:27,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=31.23 vs. limit=7.745 2024-09-22 10:58:30,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=326.6666666666667, ans=0.4846875 2024-09-22 10:58:31,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=7.6225 2024-09-22 10:58:45,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=373.3333333333333, ans=0.4825 2024-09-22 10:58:46,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=7.64 2024-09-22 10:58:47,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=373.3333333333333, ans=0.4825 2024-09-22 10:58:55,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.80 vs. limit=7.78 2024-09-22 10:59:08,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.72 vs. limit=5.21 2024-09-22 10:59:13,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=420.0, ans=0.18425000000000002 2024-09-22 10:59:15,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=22.79 vs. limit=7.6575 2024-09-22 10:59:21,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=19.15 vs. limit=7.6575 2024-09-22 10:59:23,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=7.675 2024-09-22 10:59:24,438 INFO [train.py:1198] (0/4) Epoch 1, batch 100, loss[loss=1.249, ctc_loss=1.223, cr_loss=0.1343, over 17298.00 frames. ], tot_loss[loss=1.72, ctc_loss=1.67, cr_loss=0.2467, over 1328306.58 frames. ], batch size: 49, lr: 2.70e-02, grad_scale: 1.0 2024-09-22 10:59:28,077 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 5.797e+02 1.227e+03 2.964e+03 1.124e+04, threshold=2.454e+03, percent-clipped=0.0 2024-09-22 10:59:28,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=19.39 vs. limit=5.233333333333333 2024-09-22 10:59:48,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=165.72 vs. limit=7.6925 2024-09-22 10:59:58,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=25.13 vs. 
limit=7.885 2024-09-22 10:59:58,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=513.3333333333334, ans=0.4759375 2024-09-22 11:00:03,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=560.0, ans=7.92 2024-09-22 11:00:03,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=7.92 2024-09-22 11:00:18,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=5.14 2024-09-22 11:00:22,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=51.43 vs. limit=7.955 2024-09-22 11:00:23,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=606.6666666666666, ans=0.4715625 2024-09-22 11:00:25,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=14.22 vs. limit=5.151666666666666 2024-09-22 11:00:33,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=54.37 vs. limit=7.7275 2024-09-22 11:00:33,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=157.55 vs. limit=5.303333333333334 2024-09-22 11:00:35,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=39.41 vs. limit=7.7275 2024-09-22 11:00:38,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=35.40 vs. limit=7.7275 2024-09-22 11:00:54,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=300.45 vs. limit=7.745 2024-09-22 11:01:01,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.23 vs. limit=7.99 2024-09-22 11:01:04,607 INFO [train.py:1198] (0/4) Epoch 1, batch 150, loss[loss=1.247, ctc_loss=1.223, cr_loss=0.121, over 17294.00 frames. ], tot_loss[loss=1.51, ctc_loss=1.47, cr_loss=0.1975, over 1773567.27 frames. ], batch size: 51, lr: 2.93e-02, grad_scale: 1.0 2024-09-22 11:01:06,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=700.0, ans=0.095625 2024-09-22 11:01:30,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=746.6666666666666, ans=0.46499999999999997 2024-09-22 11:01:34,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=746.6666666666666, ans=0.04766666666666667 2024-09-22 11:01:58,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.71 vs. 
limit=5.21 2024-09-22 11:02:02,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=840.0, ans=0.09475 2024-09-22 11:02:06,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=64.68 vs. limit=7.815 2024-09-22 11:02:12,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.24 vs. limit=8.13 2024-09-22 11:02:18,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=886.6666666666666, ans=8.165 2024-09-22 11:02:36,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=30.59 vs. limit=7.8325 2024-09-22 11:02:38,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=74.07 vs. limit=7.85 2024-09-22 11:02:39,841 INFO [train.py:1198] (0/4) Epoch 1, batch 200, loss[loss=1.231, ctc_loss=1.211, cr_loss=0.09847, over 17310.00 frames. ], tot_loss[loss=1.403, ctc_loss=1.369, cr_loss=0.17, over 2118177.34 frames. ], batch size: 51, lr: 3.15e-02, grad_scale: 2.0 2024-09-22 11:02:43,594 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.311e+02 2.245e+02 3.033e+02 4.033e+02 1.104e+03, threshold=6.066e+02, percent-clipped=0.0 2024-09-22 11:02:46,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=39.13 vs. limit=7.85 2024-09-22 11:02:48,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=23.72 vs. limit=7.85 2024-09-22 11:02:54,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=34.77 vs. limit=7.85 2024-09-22 11:02:57,400 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=48.80 vs. limit=7.8675 2024-09-22 11:03:02,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=980.0, ans=0.8657 2024-09-22 11:03:13,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=101.09 vs. limit=7.8675 2024-09-22 11:03:15,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=7.885 2024-09-22 11:03:23,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=41.21 vs. limit=7.885 2024-09-22 11:03:24,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=32.74 vs. 
limit=7.885 2024-09-22 11:03:25,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1026.6666666666667, ans=7.885 2024-09-22 11:03:45,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1073.3333333333333, ans=0.4496875 2024-09-22 11:03:50,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1073.3333333333333, ans=0.36583333333333334 2024-09-22 11:03:51,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=52.21 vs. limit=7.9025 2024-09-22 11:03:54,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.99 vs. limit=7.92 2024-09-22 11:04:00,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=37.01 vs. limit=7.92 2024-09-22 11:04:12,401 INFO [train.py:1198] (0/4) Epoch 1, batch 250, loss[loss=1.077, ctc_loss=1.05, cr_loss=0.1336, over 17007.00 frames. ], tot_loss[loss=1.339, ctc_loss=1.308, cr_loss=0.1536, over 2392969.52 frames. ], batch size: 39, lr: 3.38e-02, grad_scale: 2.0 2024-09-22 11:04:13,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=8.375 2024-09-22 11:04:17,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=1166.6666666666667, ans=7.9375 2024-09-22 11:04:23,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=1166.6666666666667, ans=0.184375 2024-09-22 11:04:25,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1166.6666666666667, ans=0.21750000000000003 2024-09-22 11:04:29,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1213.3333333333333, ans=0.04620833333333334 2024-09-22 11:04:35,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=4.485333333333333 2024-09-22 11:04:41,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1213.3333333333333, ans=0.443125 2024-09-22 11:04:53,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1260.0, ans=0.15275 2024-09-22 11:04:54,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=27.85 vs. limit=7.9725 2024-09-22 11:05:01,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=24.21 vs. 
limit=7.9725 2024-09-22 11:05:05,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=1260.0, ans=8.445 2024-09-22 11:05:08,194 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.854e+00 2024-09-22 11:05:10,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=8.48 2024-09-22 11:05:35,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=91.99 vs. limit=8.0075 2024-09-22 11:05:37,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=8.0075 2024-09-22 11:05:42,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1353.3333333333333, ans=0.4365625 2024-09-22 11:05:47,709 INFO [train.py:1198] (0/4) Epoch 1, batch 300, loss[loss=1.223, ctc_loss=1.191, cr_loss=0.161, over 16745.00 frames. ], tot_loss[loss=1.301, ctc_loss=1.271, cr_loss=0.1504, over 2604923.20 frames. ], batch size: 61, lr: 3.60e-02, grad_scale: 4.0 2024-09-22 11:05:49,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1400.0, ans=0.851 2024-09-22 11:05:50,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=51.79 vs. limit=8.025 2024-09-22 11:05:51,295 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.872e+02 2.621e+02 3.440e+02 6.626e+02, threshold=5.242e+02, percent-clipped=4.0 2024-09-22 11:05:52,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=44.35 vs. limit=8.025 2024-09-22 11:05:53,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1400.0, ans=0.434375 2024-09-22 11:06:00,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1400.0, ans=0.286 2024-09-22 11:06:02,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=101.47 vs. limit=8.55 2024-09-22 11:06:07,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.94 vs. limit=8.0425 2024-09-22 11:06:08,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=87.02 vs. limit=8.0425 2024-09-22 11:06:52,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.24 vs. limit=5.77 2024-09-22 11:06:55,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1540.0, ans=0.28459999999999996 2024-09-22 11:07:02,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.59 vs. 
limit=8.655 2024-09-22 11:07:03,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.16 vs. limit=5.77 2024-09-22 11:07:25,017 INFO [train.py:1198] (0/4) Epoch 1, batch 350, loss[loss=1.19, ctc_loss=1.143, cr_loss=0.2323, over 17023.00 frames. ], tot_loss[loss=1.27, ctc_loss=1.237, cr_loss=0.1617, over 2770932.63 frames. ], batch size: 56, lr: 3.83e-02, grad_scale: 4.0 2024-09-22 11:07:30,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1633.3333333333333, ans=0.08979166666666667 2024-09-22 11:07:38,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.08 vs. limit=5.408333333333333 2024-09-22 11:08:06,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=8.1475 2024-09-22 11:08:08,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1726.6666666666667, ans=0.8395666666666667 2024-09-22 11:08:20,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=3.259 2024-09-22 11:08:44,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=8.1825 2024-09-22 11:08:52,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1820.0, ans=0.2273 2024-09-22 11:08:56,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.80 vs. limit=8.865 2024-09-22 11:09:00,231 INFO [train.py:1198] (0/4) Epoch 1, batch 400, loss[loss=1.139, ctc_loss=1.089, cr_loss=0.2501, over 17203.00 frames. ], tot_loss[loss=1.242, ctc_loss=1.206, cr_loss=0.1815, over 2911132.81 frames. ], batch size: 47, lr: 4.05e-02, grad_scale: 8.0 2024-09-22 11:09:03,789 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.305e+02 2.539e+02 3.204e+02 4.584e+02 1.114e+03, threshold=6.407e+02, percent-clipped=17.0 2024-09-22 11:09:04,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=29.71 vs. limit=8.2 2024-09-22 11:09:11,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1866.6666666666667, ans=0.13 2024-09-22 11:09:29,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1913.3333333333333, ans=0.41031249999999997 2024-09-22 11:09:30,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=8.2175 2024-09-22 11:09:40,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1960.0, ans=0.408125 2024-09-22 11:09:43,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. 
limit=8.235 2024-09-22 11:09:47,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1960.0, ans=0.0559 2024-09-22 11:09:54,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2006.6666666666667, ans=8.2525 2024-09-22 11:09:55,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2006.6666666666667, ans=0.12475 2024-09-22 11:09:59,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=9.004999999999999 2024-09-22 11:10:02,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2006.6666666666667, ans=0.4059375 2024-09-22 11:10:11,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2053.3333333333335, ans=0.2433333333333333 2024-09-22 11:10:19,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=4.8213333333333335 2024-09-22 11:10:31,079 INFO [train.py:1198] (0/4) Epoch 1, batch 450, loss[loss=1.073, ctc_loss=1.018, cr_loss=0.2753, over 16572.00 frames. ], tot_loss[loss=1.213, ctc_loss=1.172, cr_loss=0.205, over 3011713.51 frames. ], batch size: 66, lr: 4.28e-02, grad_scale: 4.0 2024-09-22 11:10:31,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2100.0, ans=0.8265 2024-09-22 11:10:43,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.55 vs. limit=5.525 2024-09-22 11:10:54,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=5.536666666666667 2024-09-22 11:11:03,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=4.858666666666666 2024-09-22 11:11:13,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=8.3225 2024-09-22 11:11:32,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=9.18 2024-09-22 11:11:35,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2240.0, ans=0.2776 2024-09-22 11:11:46,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=8.34 2024-09-22 11:12:01,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2286.6666666666665, ans=0.11425 2024-09-22 11:12:01,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.59 vs. 
limit=9.215 2024-09-22 11:12:03,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2286.6666666666665, ans=0.27713333333333334 2024-09-22 11:12:10,286 INFO [train.py:1198] (0/4) Epoch 1, batch 500, loss[loss=0.9221, ctc_loss=0.858, cr_loss=0.3203, over 16637.00 frames. ], tot_loss[loss=1.179, ctc_loss=1.132, cr_loss=0.2332, over 3090999.52 frames. ], batch size: 37, lr: 4.49e-02, grad_scale: 8.0 2024-09-22 11:12:15,704 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.330e+02 2.678e+02 3.368e+02 4.496e+02 8.489e+02, threshold=6.737e+02, percent-clipped=3.0 2024-09-22 11:12:18,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=8.375 2024-09-22 11:12:20,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=8.375 2024-09-22 11:12:57,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2426.6666666666665, ans=0.38625 2024-09-22 11:12:59,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2426.6666666666665, ans=0.38625 2024-09-22 11:13:14,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=2473.3333333333335, ans=4.494666666666666 2024-09-22 11:13:15,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2473.3333333333335, ans=0.3840625 2024-09-22 11:13:18,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.71 vs. limit=6.236666666666666 2024-09-22 11:13:42,939 INFO [train.py:1198] (0/4) Epoch 1, batch 550, loss[loss=1.017, ctc_loss=0.9459, cr_loss=0.3564, over 16898.00 frames. ], tot_loss[loss=1.138, ctc_loss=1.085, cr_loss=0.2632, over 3158811.89 frames. ], batch size: 58, lr: 4.49e-02, grad_scale: 8.0 2024-09-22 11:13:45,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=8.4625 2024-09-22 11:13:47,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=5.026666666666666 2024-09-22 11:14:10,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2613.3333333333335, ans=0.041833333333333333 2024-09-22 11:14:11,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2613.3333333333335, ans=0.8085333333333333 2024-09-22 11:14:16,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2660.0, ans=0.10024999999999999 2024-09-22 11:14:28,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.60 vs. limit=6.33 2024-09-22 11:14:49,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.32 vs. 
limit=9.53 2024-09-22 11:14:59,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=8.5325 2024-09-22 11:15:02,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2753.3333333333335, ans=0.37093750000000003 2024-09-22 11:15:11,742 INFO [train.py:1198] (0/4) Epoch 1, batch 600, loss[loss=0.954, ctc_loss=0.8635, cr_loss=0.4524, over 17058.00 frames. ], tot_loss[loss=1.096, ctc_loss=1.037, cr_loss=0.295, over 3196993.09 frames. ], batch size: 46, lr: 4.49e-02, grad_scale: 8.0 2024-09-22 11:15:13,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=2800.0, ans=0.0925 2024-09-22 11:15:17,227 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.480e+02 2.420e+02 3.324e+02 4.180e+02 8.567e+02, threshold=6.647e+02, percent-clipped=1.0 2024-09-22 11:15:31,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=9.635 2024-09-22 11:15:47,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2893.3333333333335, ans=0.364375 2024-09-22 11:16:16,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=9.705 2024-09-22 11:16:20,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.16 vs. limit=5.735 2024-09-22 11:16:26,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=9.74 2024-09-22 11:16:30,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=8.620000000000001 2024-09-22 11:16:42,202 INFO [train.py:1198] (0/4) Epoch 1, batch 650, loss[loss=0.8358, ctc_loss=0.7601, cr_loss=0.3787, over 17340.00 frames. ], tot_loss[loss=1.05, ctc_loss=0.9846, cr_loss=0.3249, over 3240842.68 frames. ], batch size: 48, lr: 4.49e-02, grad_scale: 8.0 2024-09-22 11:16:52,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3033.3333333333335, ans=0.3578125 2024-09-22 11:16:55,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=8.6375 2024-09-22 11:17:42,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3173.3333333333335, ans=0.35125 2024-09-22 11:17:49,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3173.3333333333335, ans=0.0286 2024-09-22 11:18:14,947 INFO [train.py:1198] (0/4) Epoch 1, batch 700, loss[loss=0.7901, ctc_loss=0.6982, cr_loss=0.4594, over 17130.00 frames. ], tot_loss[loss=1, ctc_loss=0.9297, cr_loss=0.3522, over 3265187.97 frames. ], batch size: 48, lr: 4.49e-02, grad_scale: 8.0 2024-09-22 11:18:17,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.31 vs. 
limit=8.725 2024-09-22 11:18:20,265 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.450e+02 2.357e+02 3.031e+02 4.358e+02 1.002e+03, threshold=6.062e+02, percent-clipped=7.0 2024-09-22 11:18:20,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3266.6666666666665, ans=0.346875 2024-09-22 11:18:40,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=8.7425 2024-09-22 11:18:42,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=9.985 2024-09-22 11:18:46,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=9.985 2024-09-22 11:18:47,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3313.3333333333335, ans=0.34468750000000004 2024-09-22 11:19:05,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.12 vs. limit=6.68 2024-09-22 11:19:23,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.93 vs. limit=6.703333333333333 2024-09-22 11:19:25,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=5.851666666666667 2024-09-22 11:19:28,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3453.3333333333335, ans=0.26546666666666663 2024-09-22 11:19:36,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=10.09 2024-09-22 11:19:44,683 INFO [train.py:1198] (0/4) Epoch 1, batch 750, loss[loss=0.6772, ctc_loss=0.5986, cr_loss=0.3931, over 17121.00 frames. ], tot_loss[loss=0.9538, ctc_loss=0.8788, cr_loss=0.3752, over 3289780.92 frames. ], batch size: 40, lr: 4.49e-02, grad_scale: 8.0 2024-09-22 11:20:28,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3593.3333333333335, ans=0.01915 2024-09-22 11:20:28,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=5.437333333333333 2024-09-22 11:20:39,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=10.23 2024-09-22 11:21:13,174 INFO [train.py:1198] (0/4) Epoch 1, batch 800, loss[loss=0.7315, ctc_loss=0.6596, cr_loss=0.3594, over 17029.00 frames. ], tot_loss[loss=0.9003, ctc_loss=0.8226, cr_loss=0.3885, over 3310793.12 frames. 
], batch size: 56, lr: 4.49e-02, grad_scale: 16.0 2024-09-22 11:21:18,259 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+02 2.614e+02 4.071e+02 6.304e+02 1.473e+03, threshold=8.142e+02, percent-clipped=26.0 2024-09-22 11:21:27,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3733.3333333333335, ans=0.325 2024-09-22 11:21:38,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3780.0, ans=3.567 2024-09-22 11:21:53,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=3826.6666666666665, ans=0.03475 2024-09-22 11:21:59,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.19 vs. limit=5.956666666666667 2024-09-22 11:22:31,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=5.568 2024-09-22 11:22:44,174 INFO [train.py:1198] (0/4) Epoch 1, batch 850, loss[loss=0.6462, ctc_loss=0.5646, cr_loss=0.4079, over 17157.00 frames. ], tot_loss[loss=0.8538, ctc_loss=0.7747, cr_loss=0.3955, over 3311564.96 frames. ], batch size: 45, lr: 4.49e-02, grad_scale: 16.0 2024-09-22 11:22:46,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3966.6666666666665, ans=0.01075000000000001 2024-09-22 11:23:08,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4013.3333333333335, ans=0.7595333333333334 2024-09-22 11:23:27,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4060.0, ans=0.04975 2024-09-22 11:23:30,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4060.0, ans=0.3096875 2024-09-22 11:23:35,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=4106.666666666667, ans=0.025 2024-09-22 11:23:51,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=4106.666666666667, ans=0.025 2024-09-22 11:24:07,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4153.333333333333, ans=0.3053125 2024-09-22 11:24:10,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4200.0, ans=0.303125 2024-09-22 11:24:11,782 INFO [train.py:1198] (0/4) Epoch 1, batch 900, loss[loss=0.5617, ctc_loss=0.4841, cr_loss=0.388, over 17041.00 frames. ], tot_loss[loss=0.8079, ctc_loss=0.7283, cr_loss=0.3983, over 3310152.68 frames. 
], batch size: 39, lr: 4.48e-02, grad_scale: 16.0 2024-09-22 11:24:16,914 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+02 2.836e+02 3.761e+02 6.423e+02 1.326e+03, threshold=7.521e+02, percent-clipped=10.0 2024-09-22 11:24:19,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4200.0, ans=0.303125 2024-09-22 11:24:27,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4246.666666666667, ans=0.009946376811594203 2024-09-22 11:24:43,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=10.685 2024-09-22 11:25:07,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4340.0, ans=0.2965625 2024-09-22 11:25:12,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4340.0, ans=7.7125 2024-09-22 11:25:15,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4340.0, ans=0.04858333333333334 2024-09-22 11:25:29,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4386.666666666667, ans=0.04838888888888889 2024-09-22 11:25:37,368 INFO [train.py:1198] (0/4) Epoch 1, batch 950, loss[loss=0.5504, ctc_loss=0.4737, cr_loss=0.3835, over 17170.00 frames. ], tot_loss[loss=0.7666, ctc_loss=0.6861, cr_loss=0.4023, over 3317727.45 frames. ], batch size: 41, lr: 4.48e-02, grad_scale: 16.0 2024-09-22 11:25:47,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4433.333333333333, ans=0.7943333333333333 2024-09-22 11:25:49,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4433.333333333333, ans=0.09899494936611666 2024-09-22 11:25:57,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4480.0, ans=0.2552 2024-09-22 11:26:01,240 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.696e-03 2024-09-22 11:26:04,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4480.0, ans=0.7432000000000001 2024-09-22 11:26:13,189 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.938e-03 2024-09-22 11:26:19,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4526.666666666667, ans=0.2547333333333333 2024-09-22 11:26:41,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4573.333333333333, ans=0.7399333333333333 2024-09-22 11:27:05,077 INFO [train.py:1198] (0/4) Epoch 1, batch 1000, loss[loss=0.6057, ctc_loss=0.5151, cr_loss=0.4532, over 16673.00 frames. ], tot_loss[loss=0.7301, ctc_loss=0.6486, cr_loss=0.4072, over 3321595.92 frames. 
], batch size: 61, lr: 4.48e-02, grad_scale: 16.0 2024-09-22 11:27:07,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=5.866666666666667 2024-09-22 11:27:09,935 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.795e+02 3.933e+02 5.449e+02 1.373e+03, threshold=7.866e+02, percent-clipped=13.0 2024-09-22 11:27:30,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4713.333333333333, ans=0.2790625 2024-09-22 11:28:34,217 INFO [train.py:1198] (0/4) Epoch 1, batch 1050, loss[loss=0.5547, ctc_loss=0.4641, cr_loss=0.453, over 17098.00 frames. ], tot_loss[loss=0.6971, ctc_loss=0.6145, cr_loss=0.4126, over 3334152.10 frames. ], batch size: 49, lr: 4.48e-02, grad_scale: 16.0 2024-09-22 11:28:51,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4946.666666666667, ans=0.04605555555555556 2024-09-22 11:28:56,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4946.666666666667, ans=0.7268666666666667 2024-09-22 11:29:08,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.19 vs. limit=6.236666666666666 2024-09-22 11:29:39,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5040.0, ans=0.009773913043478261 2024-09-22 11:29:48,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=5086.666666666667, ans=0.03410416666666667 2024-09-22 11:30:01,212 INFO [train.py:1198] (0/4) Epoch 1, batch 1100, loss[loss=0.6306, ctc_loss=0.5271, cr_loss=0.5172, over 17217.00 frames. ], tot_loss[loss=0.6662, ctc_loss=0.5829, cr_loss=0.4166, over 3348662.14 frames. ], batch size: 50, lr: 4.48e-02, grad_scale: 16.0 2024-09-22 11:30:06,284 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+02 2.354e+02 3.241e+02 4.881e+02 1.077e+03, threshold=6.482e+02, percent-clipped=7.0 2024-09-22 11:30:16,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=5180.0, ans=0.02 2024-09-22 11:30:41,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5226.666666666667, ans=0.24773333333333333 2024-09-22 11:31:01,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=11.455 2024-09-22 11:31:12,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=5320.0, ans=0.033375 2024-09-22 11:31:24,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=5366.666666666667, ans=0.7121666666666667 2024-09-22 11:31:25,797 INFO [train.py:1198] (0/4) Epoch 1, batch 1150, loss[loss=0.5433, ctc_loss=0.4578, cr_loss=0.4277, over 17027.00 frames. ], tot_loss[loss=0.6414, ctc_loss=0.5574, cr_loss=0.4199, over 3342975.55 frames. 
], batch size: 44, lr: 4.47e-02, grad_scale: 16.0 2024-09-22 11:31:29,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=5366.666666666667, ans=0.24843749999999998 2024-09-22 11:31:52,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5413.333333333333, ans=0.24625000000000002 2024-09-22 11:32:04,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=9.5475 2024-09-22 11:32:05,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=5460.0, ans=0.009682608695652174 2024-09-22 11:32:06,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.04 vs. limit=6.184 2024-09-22 11:32:14,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=5460.0, ans=0.025 2024-09-22 11:32:22,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=5506.666666666667, ans=0.241875 2024-09-22 11:32:22,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=5506.666666666667, ans=0.241875 2024-09-22 11:32:51,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5553.333333333333, ans=0.7056333333333333 2024-09-22 11:32:51,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=11.665 2024-09-22 11:32:58,123 INFO [train.py:1198] (0/4) Epoch 1, batch 1200, loss[loss=0.5988, ctc_loss=0.5074, cr_loss=0.4567, over 17235.00 frames. ], tot_loss[loss=0.6184, ctc_loss=0.5337, cr_loss=0.4234, over 3351748.56 frames. ], batch size: 55, lr: 4.47e-02, grad_scale: 32.0 2024-09-22 11:33:02,961 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.688e+02 2.423e+02 3.283e+02 4.569e+02 8.108e+02, threshold=6.566e+02, percent-clipped=6.0 2024-09-22 11:33:54,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5740.0, ans=0.23093750000000002 2024-09-22 11:34:11,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5786.666666666667, ans=0.22875 2024-09-22 11:34:23,533 INFO [train.py:1198] (0/4) Epoch 1, batch 1250, loss[loss=0.4692, ctc_loss=0.391, cr_loss=0.3909, over 16950.00 frames. ], tot_loss[loss=0.6004, ctc_loss=0.5151, cr_loss=0.4265, over 3338112.83 frames. ], batch size: 42, lr: 4.47e-02, grad_scale: 32.0 2024-09-22 11:34:48,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=5880.0, ans=0.04216666666666667 2024-09-22 11:34:49,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=9.705 2024-09-22 11:35:02,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.71 vs. 
2024-09-22 11:35:02,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=11.945
2024-09-22 11:35:13,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5973.333333333333, ans=0.24026666666666666
2024-09-22 11:35:14,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=9.74
2024-09-22 11:35:17,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=5973.333333333333, ans=0.04177777777777778
2024-09-22 11:35:23,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5973.333333333333, ans=0.0
2024-09-22 11:35:36,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=12.015
2024-09-22 11:35:47,098 INFO [train.py:1198] (0/4) Epoch 1, batch 1300, loss[loss=0.4147, ctc_loss=0.3362, cr_loss=0.3921, over 17039.00 frames. ], tot_loss[loss=0.5821, ctc_loss=0.4966, cr_loss=0.4277, over 3336939.75 frames. ], batch size: 39, lr: 4.47e-02, grad_scale: 32.0
2024-09-22 11:35:52,195 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.553e+02 2.150e+02 2.604e+02 3.500e+02 8.408e+02, threshold=5.208e+02, percent-clipped=5.0
2024-09-22 11:36:26,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=9.81
2024-09-22 11:36:42,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=6206.666666666667, ans=0.04080555555555555
2024-09-22 11:37:08,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=6253.333333333333, ans=0.20687499999999998
2024-09-22 11:37:12,779 INFO [train.py:1198] (0/4) Epoch 1, batch 1350, loss[loss=0.5137, ctc_loss=0.427, cr_loss=0.4332, over 17225.00 frames. ], tot_loss[loss=0.5648, ctc_loss=0.479, cr_loss=0.4294, over 3339484.73 frames. ], batch size: 50, lr: 4.46e-02, grad_scale: 32.0
2024-09-22 11:37:21,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6300.0, ans=0.237
2024-09-22 11:37:42,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=6346.666666666667, ans=0.07
2024-09-22 11:37:58,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=6393.333333333333, ans=0.2003125
2024-09-22 11:38:04,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=6393.333333333333, ans=0.6762333333333334
2024-09-22 11:38:11,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.69 vs. limit=6.61
2024-09-22 11:38:17,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=6440.0, ans=0.198125
2024-09-22 11:38:40,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=12.4
2024-09-22 11:38:41,180 INFO [train.py:1198] (0/4) Epoch 1, batch 1400, loss[loss=0.4137, ctc_loss=0.3341, cr_loss=0.3977, over 17256.00 frames. ], tot_loss[loss=0.5483, ctc_loss=0.4622, cr_loss=0.4305, over 3354862.25 frames. ], batch size: 44, lr: 4.46e-02, grad_scale: 32.0
2024-09-22 11:38:46,073 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+02 2.453e+02 3.278e+02 5.014e+02 1.044e+03, threshold=6.556e+02, percent-clipped=21.0
2024-09-22 11:39:06,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=6580.0, ans=0.19156250000000002
2024-09-22 11:39:09,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=6580.0, ans=0.19156250000000002
2024-09-22 11:39:14,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=9.985
2024-09-22 11:39:46,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=6673.333333333333, ans=0.03886111111111112
2024-09-22 11:40:05,930 INFO [train.py:1198] (0/4) Epoch 1, batch 1450, loss[loss=0.483, ctc_loss=0.3905, cr_loss=0.4627, over 17159.00 frames. ], tot_loss[loss=0.5373, ctc_loss=0.4508, cr_loss=0.4322, over 3359406.28 frames. ], batch size: 45, lr: 4.46e-02, grad_scale: 32.0
2024-09-22 11:40:09,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=6766.666666666667, ans=0.009398550724637682
2024-09-22 11:40:45,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=6860.0, ans=0.17843750000000003
2024-09-22 11:40:50,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=6860.0, ans=0.17843750000000003
2024-09-22 11:40:57,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=6906.666666666667, ans=0.17625000000000002
2024-09-22 11:41:02,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6906.666666666667, ans=0.23093333333333332
2024-09-22 11:41:18,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6953.333333333333, ans=0.1740625
2024-09-22 11:41:24,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=6953.333333333333, ans=0.03769444444444445
2024-09-22 11:41:27,762 INFO [train.py:1198] (0/4) Epoch 1, batch 1500, loss[loss=0.4363, ctc_loss=0.347, cr_loss=0.4466, over 17249.00 frames. ], tot_loss[loss=0.5242, ctc_loss=0.4377, cr_loss=0.4329, over 3367432.73 frames. ], batch size: 42, lr: 4.46e-02, grad_scale: 32.0
2024-09-22 11:41:32,712 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+02 2.217e+02 3.113e+02 4.719e+02 9.117e+02, threshold=6.226e+02, percent-clipped=8.0
2024-09-22 11:41:33,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=7000.0, ans=0.171875
2024-09-22 11:41:33,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=10.125
2024-09-22 11:41:53,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=7046.666666666667, ans=0.03730555555555556
2024-09-22 11:42:07,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=10.16
2024-09-22 11:42:10,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=12.82
2024-09-22 11:42:13,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=7093.333333333333, ans=0.0
2024-09-22 11:42:20,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=10.1775
2024-09-22 11:42:37,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=7186.666666666667, ans=0.16312500000000002
2024-09-22 11:42:56,584 INFO [train.py:1198] (0/4) Epoch 1, batch 1550, loss[loss=0.4352, ctc_loss=0.3481, cr_loss=0.4354, over 17294.00 frames. ], tot_loss[loss=0.5132, ctc_loss=0.4266, cr_loss=0.4332, over 3368003.01 frames. ], batch size: 46, lr: 4.45e-02, grad_scale: 32.0
2024-09-22 11:43:26,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7280.0, ans=0.2272
2024-09-22 11:43:52,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7373.333333333333, ans=0.22626666666666667
2024-09-22 11:44:10,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=7420.0, ans=0.04949747468305833
2024-09-22 11:44:18,842 INFO [train.py:1198] (0/4) Epoch 1, batch 1600, loss[loss=0.4258, ctc_loss=0.3476, cr_loss=0.3912, over 17179.00 frames. ], tot_loss[loss=0.5034, ctc_loss=0.4169, cr_loss=0.4327, over 3364724.40 frames. ], batch size: 41, lr: 4.45e-02, grad_scale: 32.0
2024-09-22 11:44:23,627 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.570e+02 2.041e+02 2.519e+02 3.425e+02 6.490e+02, threshold=5.038e+02, percent-clipped=3.0
2024-09-22 11:44:28,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=7466.666666666667, ans=0.035555555555555556
2024-09-22 11:44:47,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=7513.333333333333, ans=0.6370333333333333
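The scaling.py:214 ScheduledFloat lines report named hyperparameters (dropout probabilities, skip rates, scale_min floors) whose values are piecewise-linear functions of batch_count, which is why the logged ans values drift smoothly from batch to batch. The feed_forward1.out_proj.dropout_p values, for instance, fit a line from 0.3 at batch_count 0 to 0.1 at batch_count 20000 (0.2272 at 7280, 0.2263 at 7373). A minimal sketch of such a schedule (an illustration of the idea, not icefall's ScheduledFloat class):

    class PiecewiseLinear:
        # Linear interpolation between (batch_count, value) breakpoints,
        # clamped to the endpoint values outside the given range.
        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(7280.0))  # 0.2272, as logged at batch_count=7280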
2024-09-22 11:44:47,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=10.317499999999999
2024-09-22 11:45:09,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=7606.666666666667, ans=0.034972222222222224
2024-09-22 11:45:31,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=7653.333333333333, ans=10.370000000000001
2024-09-22 11:45:42,063 INFO [train.py:1198] (0/4) Epoch 1, batch 1650, loss[loss=0.4881, ctc_loss=0.3969, cr_loss=0.456, over 17355.00 frames. ], tot_loss[loss=0.4971, ctc_loss=0.4104, cr_loss=0.4337, over 3360216.94 frames. ], batch size: 48, lr: 4.45e-02, grad_scale: 32.0
2024-09-22 11:45:51,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=7700.0, ans=0.034583333333333334
2024-09-22 11:46:00,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=7746.666666666667, ans=0.034388888888888886
2024-09-22 11:46:20,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=7793.333333333333, ans=0.13468750000000002
2024-09-22 11:46:31,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=7840.0, ans=0.009165217391304348
2024-09-22 11:46:41,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=7840.0, ans=0.1325
2024-09-22 11:46:51,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=7886.666666666667, ans=0.033805555555555554
2024-09-22 11:46:56,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=7886.666666666667, ans=0.04949747468305833
2024-09-22 11:47:00,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=7886.666666666667, ans=0.1303125
2024-09-22 11:47:05,466 INFO [train.py:1198] (0/4) Epoch 1, batch 1700, loss[loss=0.4554, ctc_loss=0.3629, cr_loss=0.4622, over 17303.00 frames. ], tot_loss[loss=0.4895, ctc_loss=0.4025, cr_loss=0.4346, over 3365143.93 frames. ], batch size: 51, lr: 4.44e-02, grad_scale: 32.0
2024-09-22 11:47:10,189 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.533e+02 2.014e+02 2.718e+02 3.727e+02 5.677e+02, threshold=5.436e+02, percent-clipped=4.0
2024-09-22 11:47:24,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7980.0, ans=0.2202
2024-09-22 11:47:31,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=7980.0, ans=0.6207
2024-09-22 11:47:40,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=8026.666666666667, ans=0.125
2024-09-22 11:47:46,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=8026.666666666667, ans=0.03322222222222222
2024-09-22 11:48:17,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=8120.0, ans=0.125
2024-09-22 11:48:30,597 INFO [train.py:1198] (0/4) Epoch 1, batch 1750, loss[loss=0.4221, ctc_loss=0.3375, cr_loss=0.4226, over 17177.00 frames. ], tot_loss[loss=0.4817, ctc_loss=0.3949, cr_loss=0.434, over 3361031.47 frames. ], batch size: 41, lr: 4.44e-02, grad_scale: 32.0
2024-09-22 11:48:48,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=10.58
2024-09-22 11:49:20,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=8306.666666666666, ans=0.6092666666666667
2024-09-22 11:49:48,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=8353.333333333334, ans=0.125
2024-09-22 11:49:50,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=8353.333333333334, ans=0.125
2024-09-22 11:49:53,988 INFO [train.py:1198] (0/4) Epoch 1, batch 1800, loss[loss=0.4653, ctc_loss=0.3784, cr_loss=0.4342, over 17295.00 frames. ], tot_loss[loss=0.4761, ctc_loss=0.3892, cr_loss=0.4347, over 3359743.10 frames. ], batch size: 51, lr: 4.44e-02, grad_scale: 32.0
2024-09-22 11:49:58,879 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.382e+02 1.919e+02 2.429e+02 3.208e+02 6.110e+02, threshold=4.858e+02, percent-clipped=5.0
2024-09-22 11:50:04,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8400.0, ans=0.216
2024-09-22 11:50:11,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=13.834999999999999
2024-09-22 11:50:13,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=8446.666666666666, ans=0.125
2024-09-22 11:50:25,052 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.808e-03
2024-09-22 11:50:34,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8493.333333333334, ans=0.21506666666666666
2024-09-22 11:50:50,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8540.0, ans=0.2146
2024-09-22 11:51:08,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=8586.666666666666, ans=0.04949747468305833
2024-09-22 11:51:08,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=8586.666666666666, ans=0.0
2024-09-22 11:51:14,595 INFO [train.py:1198] (0/4) Epoch 1, batch 1850, loss[loss=0.4395, ctc_loss=0.3612, cr_loss=0.3917, over 17022.00 frames. ], tot_loss[loss=0.472, ctc_loss=0.3848, cr_loss=0.4357, over 3354000.24 frames. ], batch size: 51, lr: 4.43e-02, grad_scale: 32.0
2024-09-22 11:51:14,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=8633.333333333334, ans=0.5978333333333334
2024-09-22 11:51:29,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.94 vs. limit=7.17
2024-09-22 11:51:37,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8680.0, ans=0.2132
2024-09-22 11:51:58,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=14.044999999999998
2024-09-22 11:52:00,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=8726.666666666666, ans=0.008972463768115942
2024-09-22 11:52:00,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=8726.666666666666, ans=0.008972463768115942
2024-09-22 11:52:07,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=8773.333333333334, ans=0.2
2024-09-22 11:52:40,378 INFO [train.py:1198] (0/4) Epoch 1, batch 1900, loss[loss=0.5117, ctc_loss=0.4264, cr_loss=0.4263, over 15022.00 frames. ], tot_loss[loss=0.4673, ctc_loss=0.3801, cr_loss=0.436, over 3357965.60 frames. ], batch size: 89, lr: 4.43e-02, grad_scale: 32.0
2024-09-22 11:52:44,917 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.532e+02 1.950e+02 2.731e+02 3.550e+02 1.054e+03, threshold=5.462e+02, percent-clipped=8.0
2024-09-22 11:53:15,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8960.0, ans=0.2104
2024-09-22 11:53:16,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=5.792
2024-09-22 11:53:23,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=10.86
2024-09-22 11:53:37,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=9006.666666666666, ans=0.20993333333333333
2024-09-22 11:53:43,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.52 vs. limit=7.602666666666666
2024-09-22 11:54:02,943 INFO [train.py:1198] (0/4) Epoch 1, batch 1950, loss[loss=0.4341, ctc_loss=0.3489, cr_loss=0.4261, over 17299.00 frames. ], tot_loss[loss=0.463, ctc_loss=0.3757, cr_loss=0.4365, over 3360380.76 frames. ], batch size: 49, lr: 4.43e-02, grad_scale: 32.0
2024-09-22 11:54:31,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=9146.666666666666, ans=0.008881159420289855
2024-09-22 11:55:26,420 INFO [train.py:1198] (0/4) Epoch 1, batch 2000, loss[loss=0.4566, ctc_loss=0.3627, cr_loss=0.4693, over 17244.00 frames. ], tot_loss[loss=0.4562, ctc_loss=0.3693, cr_loss=0.4348, over 3361906.82 frames. ], batch size: 55, lr: 4.42e-02, grad_scale: 32.0
2024-09-22 11:55:31,236 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.560e+02 1.918e+02 2.442e+02 3.354e+02 7.763e+02, threshold=4.883e+02, percent-clipped=5.0
2024-09-22 11:55:33,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=9333.333333333334, ans=0.5733333333333334
2024-09-22 11:56:11,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=9426.666666666666, ans=0.008820289855072464
2024-09-22 11:56:24,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=9473.333333333334, ans=0.3421
2024-09-22 11:56:31,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=9520.0, ans=0.5668
2024-09-22 11:56:37,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9520.0, ans=0.125
2024-09-22 11:56:46,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=9520.0, ans=0.025
2024-09-22 11:56:49,512 INFO [train.py:1198] (0/4) Epoch 1, batch 2050, loss[loss=0.416, ctc_loss=0.335, cr_loss=0.4048, over 17263.00 frames. ], tot_loss[loss=0.4528, ctc_loss=0.3657, cr_loss=0.4355, over 3362848.54 frames. ], batch size: 44, lr: 4.42e-02, grad_scale: 32.0
2024-09-22 11:56:56,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=9566.666666666666, ans=0.008789855072463769
2024-09-22 11:57:08,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.47 vs. limit=14.71
2024-09-22 11:57:39,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=9706.666666666666, ans=0.125
2024-09-22 11:57:40,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=9706.666666666666, ans=0.025
2024-09-22 11:57:42,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9706.666666666666, ans=0.125
2024-09-22 11:58:11,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=9753.333333333334, ans=0.026027777777777778
2024-09-22 11:58:13,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=14.85
2024-09-22 11:58:14,744 INFO [train.py:1198] (0/4) Epoch 1, batch 2100, loss[loss=0.4992, ctc_loss=0.4046, cr_loss=0.4728, over 17219.00 frames. ], tot_loss[loss=0.4508, ctc_loss=0.3634, cr_loss=0.4372, over 3366111.13 frames. ], batch size: 55, lr: 4.42e-02, grad_scale: 32.0
2024-09-22 11:58:19,529 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.503e+02 1.883e+02 2.281e+02 3.077e+02 7.464e+02, threshold=4.562e+02, percent-clipped=6.0
2024-09-22 11:58:50,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9893.333333333334, ans=0.20106666666666667
2024-09-22 11:59:12,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=9940.0, ans=0.125
2024-09-22 11:59:15,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9940.0, ans=0.125
2024-09-22 11:59:22,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=9986.666666666666, ans=0.04949747468305833
2024-09-22 11:59:37,508 INFO [train.py:1198] (0/4) Epoch 1, batch 2150, loss[loss=0.4414, ctc_loss=0.348, cr_loss=0.4669, over 17006.00 frames. ], tot_loss[loss=0.4487, ctc_loss=0.361, cr_loss=0.4384, over 3361324.01 frames. ], batch size: 56, lr: 4.41e-02, grad_scale: 32.0
2024-09-22 11:59:41,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=11.2625
2024-09-22 11:59:53,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=10080.0, ans=0.02466666666666667
2024-09-22 12:00:22,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=11.2975
2024-09-22 12:00:30,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10173.333333333334, ans=0.19826666666666665
2024-09-22 12:00:30,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10173.333333333334, ans=0.0
2024-09-22 12:00:43,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10220.0, ans=0.1978
2024-09-22 12:00:58,121 INFO [train.py:1198] (0/4) Epoch 1, batch 2200, loss[loss=0.4445, ctc_loss=0.3541, cr_loss=0.4519, over 17108.00 frames. ], tot_loss[loss=0.4452, ctc_loss=0.3576, cr_loss=0.438, over 3364958.94 frames. ], batch size: 49, lr: 4.41e-02, grad_scale: 32.0
2024-09-22 12:01:00,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=10266.666666666666, ans=0.125
2024-09-22 12:01:02,862 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+02 2.024e+02 2.529e+02 3.777e+02 5.736e+02, threshold=5.059e+02, percent-clipped=14.0
2024-09-22 12:01:18,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=10313.333333333334, ans=0.02369444444444444
2024-09-22 12:01:29,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=10360.0, ans=0.008617391304347827
2024-09-22 12:02:16,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=10453.333333333334, ans=0.5341333333333333
2024-09-22 12:02:21,446 INFO [train.py:1198] (0/4) Epoch 1, batch 2250, loss[loss=0.3749, ctc_loss=0.297, cr_loss=0.3896, over 16936.00 frames. ], tot_loss[loss=0.443, ctc_loss=0.3554, cr_loss=0.4378, over 3358968.89 frames. ], batch size: 42, lr: 4.40e-02, grad_scale: 32.0
2024-09-22 12:03:00,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=10593.333333333334, ans=0.125
2024-09-22 12:03:01,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=10593.333333333334, ans=0.008566666666666667
2024-09-22 12:03:07,046 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 12:03:18,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.69 vs. limit=15.48
2024-09-22 12:03:38,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10686.666666666666, ans=0.19313333333333332
2024-09-22 12:03:46,762 INFO [train.py:1198] (0/4) Epoch 1, batch 2300, loss[loss=0.4375, ctc_loss=0.3462, cr_loss=0.4566, over 17003.00 frames. ], tot_loss[loss=0.4401, ctc_loss=0.3526, cr_loss=0.4377, over 3364083.14 frames. ], batch size: 51, lr: 4.40e-02, grad_scale: 32.0
2024-09-22 12:03:51,577 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.433e+02 1.850e+02 2.386e+02 2.971e+02 5.038e+02, threshold=4.772e+02, percent-clipped=0.0
2024-09-22 12:04:24,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=11.559999999999999
2024-09-22 12:04:25,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=10826.666666666666, ans=0.125
2024-09-22 12:05:08,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10966.666666666666, ans=0.125
2024-09-22 12:05:09,820 INFO [train.py:1198] (0/4) Epoch 1, batch 2350, loss[loss=0.393, ctc_loss=0.3118, cr_loss=0.406, over 17167.00 frames. ], tot_loss[loss=0.4366, ctc_loss=0.3492, cr_loss=0.4373, over 3369549.44 frames. ], batch size: 45, lr: 4.40e-02, grad_scale: 32.0
2024-09-22 12:05:14,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10966.666666666666, ans=0.0
2024-09-22 12:05:33,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=11013.333333333334, ans=0.125
2024-09-22 12:05:35,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=11013.333333333334, ans=0.020777777777777773
2024-09-22 12:05:53,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=11060.0, ans=0.125
2024-09-22 12:06:06,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11106.666666666666, ans=0.125
2024-09-22 12:06:09,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=11106.666666666666, ans=0.008455072463768117
2024-09-22 12:06:16,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.48 vs. limit=10.576666666666668
2024-09-22 12:06:19,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=11.682500000000001
2024-09-22 12:06:30,102 INFO [train.py:1198] (0/4) Epoch 1, batch 2400, loss[loss=0.4576, ctc_loss=0.3649, cr_loss=0.4638, over 16397.00 frames. ], tot_loss[loss=0.4332, ctc_loss=0.3455, cr_loss=0.4381, over 3374843.45 frames. ], batch size: 66, lr: 4.39e-02, grad_scale: 32.0
2024-09-22 12:06:32,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=11200.0, ans=0.125
2024-09-22 12:06:37,259 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.394e+02 1.777e+02 2.025e+02 2.571e+02 5.493e+02, threshold=4.051e+02, percent-clipped=2.0
2024-09-22 12:06:39,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.08 vs. limit=15.9
2024-09-22 12:06:53,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=11246.666666666666, ans=0.01980555555555556
2024-09-22 12:06:53,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=11246.666666666666, ans=0.125
2024-09-22 12:06:53,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=11246.666666666666, ans=0.125
2024-09-22 12:06:54,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=11246.666666666666, ans=0.01980555555555556
2024-09-22 12:07:15,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=11293.333333333334, ans=0.008414492753623189
2024-09-22 12:07:26,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=11340.0, ans=0.035
2024-09-22 12:07:57,472 INFO [train.py:1198] (0/4) Epoch 1, batch 2450, loss[loss=0.4771, ctc_loss=0.3766, cr_loss=0.5026, over 17015.00 frames. ], tot_loss[loss=0.431, ctc_loss=0.3434, cr_loss=0.438, over 3371262.62 frames. ], batch size: 52, lr: 4.39e-02, grad_scale: 64.0
2024-09-22 12:08:51,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=11573.333333333334, ans=0.125
2024-09-22 12:09:09,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=11.8575
2024-09-22 12:09:18,266 INFO [train.py:1198] (0/4) Epoch 1, batch 2500, loss[loss=0.3709, ctc_loss=0.2853, cr_loss=0.428, over 17306.00 frames. ], tot_loss[loss=0.4273, ctc_loss=0.3401, cr_loss=0.436, over 3360308.79 frames. ], batch size: 46, lr: 4.38e-02, grad_scale: 64.0
2024-09-22 12:09:22,935 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.438e+02 2.074e+02 2.928e+02 4.593e+02 9.871e+02, threshold=5.856e+02, percent-clipped=30.0
2024-09-22 12:09:23,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=11666.666666666666, ans=0.125
2024-09-22 12:09:24,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=11666.666666666666, ans=0.008333333333333333
2024-09-22 12:09:51,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=11760.0, ans=0.01766666666666667
2024-09-22 12:10:04,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=11760.0, ans=0.01766666666666667
2024-09-22 12:10:17,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=11806.666666666666, ans=0.07
2024-09-22 12:10:40,846 INFO [train.py:1198] (0/4) Epoch 1, batch 2550, loss[loss=0.3686, ctc_loss=0.2856, cr_loss=0.4147, over 17026.00 frames. ], tot_loss[loss=0.4253, ctc_loss=0.3381, cr_loss=0.4358, over 3365285.14 frames. ], batch size: 39, lr: 4.38e-02, grad_scale: 64.0
2024-09-22 12:11:27,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=12040.0, ans=0.4786
2024-09-22 12:11:36,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=12040.0, ans=0.00825217391304348
2024-09-22 12:11:41,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=12040.0, ans=0.00825217391304348
2024-09-22 12:12:04,477 INFO [train.py:1198] (0/4) Epoch 1, batch 2600, loss[loss=0.3472, ctc_loss=0.2764, cr_loss=0.3542, over 17034.00 frames. ], tot_loss[loss=0.4204, ctc_loss=0.3338, cr_loss=0.433, over 3364563.63 frames. ], batch size: 39, lr: 4.37e-02, grad_scale: 64.0
2024-09-22 12:12:04,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=12133.333333333334, ans=0.07
2024-09-22 12:12:09,353 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.462e+02 1.984e+02 2.668e+02 3.339e+02 5.918e+02, threshold=5.335e+02, percent-clipped=1.0
2024-09-22 12:12:13,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=12.05
2024-09-22 12:12:21,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=12180.0, ans=0.125
2024-09-22 12:12:36,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=12180.0, ans=0.125
2024-09-22 12:12:42,098 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 12:12:43,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=12226.666666666666, ans=0.015722222222222228
2024-09-22 12:13:18,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=12320.0, ans=0.008191304347826087
2024-09-22 12:13:29,614 INFO [train.py:1198] (0/4) Epoch 1, batch 2650, loss[loss=0.4564, ctc_loss=0.3596, cr_loss=0.4838, over 16765.00 frames. ], tot_loss[loss=0.4191, ctc_loss=0.3324, cr_loss=0.4331, over 3353623.04 frames. ], batch size: 61, lr: 4.37e-02, grad_scale: 64.0
2024-09-22 12:13:44,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=12413.333333333334, ans=0.125
2024-09-22 12:13:52,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.47 vs. limit=16.810000000000002
2024-09-22 12:14:00,613 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 12:14:19,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=12506.666666666666, ans=0.014555555555555558
2024-09-22 12:14:25,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=9.002666666666666
2024-09-22 12:14:48,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12553.333333333334, ans=0.17446666666666666
2024-09-22 12:14:52,684 INFO [train.py:1198] (0/4) Epoch 1, batch 2700, loss[loss=0.3967, ctc_loss=0.3114, cr_loss=0.4263, over 17224.00 frames. ], tot_loss[loss=0.4161, ctc_loss=0.3295, cr_loss=0.4331, over 3355988.44 frames. ], batch size: 47, lr: 4.36e-02, grad_scale: 64.0
2024-09-22 12:14:53,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=12600.0, ans=0.04949747468305833
2024-09-22 12:14:57,515 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.460e+02 1.915e+02 2.536e+02 3.410e+02 5.700e+02, threshold=5.072e+02, percent-clipped=2.0
2024-09-22 12:15:02,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=12600.0, ans=0.459
2024-09-22 12:15:18,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=12646.666666666666, ans=0.125
2024-09-22 12:15:28,278 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.154e-03
2024-09-22 12:15:45,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12740.0, ans=0.17259999999999998
2024-09-22 12:15:51,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=12740.0, ans=0.125
2024-09-22 12:16:12,421 INFO [train.py:1198] (0/4) Epoch 1, batch 2750, loss[loss=0.3866, ctc_loss=0.3056, cr_loss=0.4051, over 17078.00 frames. ], tot_loss[loss=0.4159, ctc_loss=0.3292, cr_loss=0.4337, over 3359799.22 frames. ], batch size: 46, lr: 4.36e-02, grad_scale: 64.0
2024-09-22 12:16:47,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12926.666666666666, ans=0.125
2024-09-22 12:17:20,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=13020.0, ans=0.012416666666666666
2024-09-22 12:17:28,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=13020.0, ans=0.125
2024-09-22 12:17:40,889 INFO [train.py:1198] (0/4) Epoch 1, batch 2800, loss[loss=0.4381, ctc_loss=0.3456, cr_loss=0.4622, over 17142.00 frames. ], tot_loss[loss=0.4153, ctc_loss=0.3286, cr_loss=0.4334, over 3347739.41 frames. ], batch size: 48, lr: 4.36e-02, grad_scale: 64.0
2024-09-22 12:17:45,584 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.462e+02 1.867e+02 2.104e+02 2.772e+02 6.258e+02, threshold=4.209e+02, percent-clipped=2.0
2024-09-22 12:17:57,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=13113.333333333334, ans=0.008018840579710146
2024-09-22 12:18:43,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.54 vs. limit=17.439999999999998
2024-09-22 12:18:46,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=13253.333333333334, ans=0.125
2024-09-22 12:19:00,771 INFO [train.py:1198] (0/4) Epoch 1, batch 2850, loss[loss=0.4273, ctc_loss=0.3391, cr_loss=0.4412, over 17030.00 frames. ], tot_loss[loss=0.4128, ctc_loss=0.3263, cr_loss=0.4325, over 3352069.58 frames. ], batch size: 52, lr: 4.35e-02, grad_scale: 32.0
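grad_scale in these lines is the mixed-precision loss scale: it climbs 16 -> 32 -> 64 (batches 1150 through 2450) after sustained runs of finite gradients, then drops back to 32 by batch 2850 after an overflow forces a skipped step. That double-on-success, halve-on-overflow policy matches torch.cuda.amp.GradScaler; a sketch with illustrative hyperparameters (the actual values are whatever train.py configures):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0,        # illustrative starting scale
        growth_factor=2.0,     # double after growth_interval consecutive finite steps
        backoff_factor=0.5,    # halve whenever non-finite gradients appear
        growth_interval=2000,  # illustrative
    )
    # Typical step:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()  # adjusts the scale using the policy above
    print(scaler.get_scale())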
2024-09-22 12:19:09,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.74 vs. limit=12.4875
2024-09-22 12:19:12,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=13300.0, ans=0.125
2024-09-22 12:19:27,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=13346.666666666666, ans=0.125
2024-09-22 12:20:04,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=12.54
2024-09-22 12:20:13,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=13486.666666666666, ans=0.125
2024-09-22 12:20:24,536 INFO [train.py:1198] (0/4) Epoch 1, batch 2900, loss[loss=0.3983, ctc_loss=0.3082, cr_loss=0.4509, over 16990.00 frames. ], tot_loss[loss=0.4116, ctc_loss=0.3251, cr_loss=0.4327, over 3357405.50 frames. ], batch size: 53, lr: 4.35e-02, grad_scale: 32.0
2024-09-22 12:20:26,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=13533.333333333334, ans=0.125
2024-09-22 12:20:31,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.490e+02 1.863e+02 2.342e+02 3.249e+02 5.939e+02, threshold=4.685e+02, percent-clipped=7.0
2024-09-22 12:20:56,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=13626.666666666666, ans=12.61
2024-09-22 12:21:13,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13673.333333333334, ans=0.125
2024-09-22 12:21:48,334 INFO [train.py:1198] (0/4) Epoch 1, batch 2950, loss[loss=0.4499, ctc_loss=0.3566, cr_loss=0.4661, over 15065.00 frames. ], tot_loss[loss=0.4097, ctc_loss=0.3232, cr_loss=0.4325, over 3358093.13 frames. ], batch size: 89, lr: 4.34e-02, grad_scale: 32.0
2024-09-22 12:21:52,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=12.6625
2024-09-22 12:22:32,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=13860.0, ans=0.125
2024-09-22 12:22:38,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=13860.0, ans=0.007856521739130436
2024-09-22 12:22:47,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.71 vs. limit=5.086
2024-09-22 12:22:50,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=13906.666666666666, ans=0.007846376811594204
2024-09-22 12:23:13,841 INFO [train.py:1198] (0/4) Epoch 1, batch 3000, loss[loss=0.4918, ctc_loss=0.3995, cr_loss=0.4615, over 11981.00 frames. ], tot_loss[loss=0.4071, ctc_loss=0.3207, cr_loss=0.4316, over 3351866.15 frames. ], batch size: 123, lr: 4.34e-02, grad_scale: 32.0
2024-09-22 12:23:13,842 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-22 12:23:29,188 INFO [train.py:1230] (0/4) Epoch 1, validation: loss=0.1235, ctc_loss=0.1235, cr_loss=7.044e-15, over 944034.00 frames.
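The validation entry shows cr_loss=7.044e-15 even though training batches run at cr_loss around 0.43: the consistency-regularization term compares CTC posteriors computed from two differently time-masked copies of each utterance, and with no masking applied at validation the two views are identical, leaving only floating-point noise. A hedged sketch of a symmetric consistency term of this kind (not necessarily the exact icefall formulation):

    import torch
    import torch.nn.functional as F

    def consistency_loss(logp_a: torch.Tensor, logp_b: torch.Tensor) -> torch.Tensor:
        # Symmetric KL between frame-level posteriors of two augmented views,
        # each side treating the other (detached) as its target.
        kl_ab = F.kl_div(logp_a, logp_b.detach(), log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(logp_b, logp_a.detach(), log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)

    x = torch.randn(4, 100, 500).log_softmax(-1)
    print(consistency_loss(x, x).item())  # ~0: identical views, as at validation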
2024-09-22 12:23:29,189 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-22 12:23:35,639 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.419e+02 1.886e+02 2.276e+02 2.833e+02 5.148e+02, threshold=4.553e+02, percent-clipped=2.0
2024-09-22 12:24:12,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=14093.333333333334, ans=0.40673333333333334
2024-09-22 12:24:30,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14186.666666666666, ans=0.15813333333333335
2024-09-22 12:24:36,271 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.29 vs. limit=18.14
2024-09-22 12:24:47,936 INFO [train.py:1198] (0/4) Epoch 1, batch 3050, loss[loss=0.4099, ctc_loss=0.319, cr_loss=0.4549, over 17013.00 frames. ], tot_loss[loss=0.404, ctc_loss=0.318, cr_loss=0.4298, over 3355414.85 frames. ], batch size: 51, lr: 4.33e-02, grad_scale: 32.0
2024-09-22 12:24:49,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=14233.333333333334, ans=0.125
2024-09-22 12:24:54,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14233.333333333334, ans=0.125
2024-09-22 12:25:15,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=14280.0, ans=0.125
2024-09-22 12:25:15,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=14280.0, ans=0.0
2024-09-22 12:25:19,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=18.244999999999997
2024-09-22 12:26:09,730 INFO [train.py:1198] (0/4) Epoch 1, batch 3100, loss[loss=0.3897, ctc_loss=0.306, cr_loss=0.4182, over 16943.00 frames. ], tot_loss[loss=0.4012, ctc_loss=0.3153, cr_loss=0.4299, over 3365160.23 frames. ], batch size: 42, lr: 4.33e-02, grad_scale: 32.0
2024-09-22 12:26:12,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=14466.666666666666, ans=0.025
2024-09-22 12:26:15,852 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.374e+02 1.870e+02 2.371e+02 3.057e+02 5.717e+02, threshold=4.743e+02, percent-clipped=5.0
2024-09-22 12:26:22,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=14466.666666666666, ans=0.09899494936611666
2024-09-22 12:26:33,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=14513.333333333334, ans=0.00619444444444444
2024-09-22 12:26:36,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=14513.333333333334, ans=0.125
2024-09-22 12:26:38,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=14513.333333333334, ans=0.007714492753623188
2024-09-22 12:26:50,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=14560.0, ans=0.125
2024-09-22 12:26:57,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=14606.666666666666, ans=0.125
2024-09-22 12:26:59,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=9.842666666666666
2024-09-22 12:27:07,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=14606.666666666666, ans=0.125
2024-09-22 12:27:28,659 INFO [train.py:1198] (0/4) Epoch 1, batch 3150, loss[loss=0.3929, ctc_loss=0.3077, cr_loss=0.4258, over 17205.00 frames. ], tot_loss[loss=0.4016, ctc_loss=0.3155, cr_loss=0.4304, over 3355858.15 frames. ], batch size: 47, lr: 4.32e-02, grad_scale: 32.0
2024-09-22 12:27:37,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=14700.0, ans=0.125
2024-09-22 12:28:47,525 INFO [train.py:1198] (0/4) Epoch 1, batch 3200, loss[loss=0.382, ctc_loss=0.2956, cr_loss=0.4321, over 17293.00 frames. ], tot_loss[loss=0.4021, ctc_loss=0.3158, cr_loss=0.4314, over 3353869.49 frames. ], batch size: 46, lr: 4.32e-02, grad_scale: 32.0
2024-09-22 12:28:53,531 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.398e+02 1.832e+02 2.280e+02 2.861e+02 6.877e+02, threshold=4.560e+02, percent-clipped=3.0
2024-09-22 12:28:53,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=14933.333333333334, ans=0.125
2024-09-22 12:28:59,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=14933.333333333334, ans=0.004444444444444438
2024-09-22 12:29:03,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=13.1175
2024-09-22 12:30:06,102 INFO [train.py:1198] (0/4) Epoch 1, batch 3250, loss[loss=0.4198, ctc_loss=0.3294, cr_loss=0.452, over 16888.00 frames. ], tot_loss[loss=0.4006, ctc_loss=0.3144, cr_loss=0.431, over 3355547.09 frames. ], batch size: 58, lr: 4.31e-02, grad_scale: 32.0
2024-09-22 12:30:30,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=15213.333333333334, ans=0.3675333333333334
2024-09-22 12:30:32,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.86 vs. limit=12.606666666666667
2024-09-22 12:30:35,115 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 12:30:41,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15260.0, ans=0.1474
2024-09-22 12:30:52,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.79 vs. limit=12.629999999999999
2024-09-22 12:30:58,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=15306.666666666666, ans=0.125
2024-09-22 12:31:08,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.56 vs. limit=18.98
2024-09-22 12:31:09,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15353.333333333334, ans=0.0
2024-09-22 12:31:11,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=15353.333333333334, ans=0.125
2024-09-22 12:31:16,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=13.2575
2024-09-22 12:31:17,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=15353.333333333334, ans=0.007531884057971014
2024-09-22 12:31:26,914 INFO [train.py:1198] (0/4) Epoch 1, batch 3300, loss[loss=0.3427, ctc_loss=0.264, cr_loss=0.3937, over 17186.00 frames. ], tot_loss[loss=0.3972, ctc_loss=0.3113, cr_loss=0.4296, over 3362337.51 frames. ], batch size: 41, lr: 4.31e-02, grad_scale: 32.0
2024-09-22 12:31:33,367 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.515e+02 1.835e+02 2.401e+02 3.313e+02 5.174e+02, threshold=4.802e+02, percent-clipped=5.0
2024-09-22 12:32:06,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=13.309999999999999
2024-09-22 12:32:10,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=15493.333333333334, ans=0.125
2024-09-22 12:32:11,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=10.197333333333333
2024-09-22 12:32:35,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=15586.666666666666, ans=0.09413333333333332
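The scaling.py:1024 Whitening messages fire when a module's whitening metric exceeds its scheduled limit; the metric equals 1.0 when the feature covariance is a multiple of the identity and grows as the eigenvalues spread, so these lines flag activations drifting away from whiteness. One way to compute such a metric, stated as an assumption about what is being measured rather than a copy of scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); returns a value >= 1.0, where 1.0
        # means the covariance is a multiple of the identity.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        # d * trace(C^2) / trace(C)^2 == mean(eig^2) / mean(eig)^2
        return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

    print(whitening_metric(torch.randn(10000, 256)).item())  # close to 1 for white noise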
2024-09-22 12:32:50,477 INFO [train.py:1198] (0/4) Epoch 1, batch 3350, loss[loss=0.4039, ctc_loss=0.3176, cr_loss=0.4313, over 16874.00 frames. ], tot_loss[loss=0.3983, ctc_loss=0.3122, cr_loss=0.4305, over 3342656.23 frames. ], batch size: 58, lr: 4.30e-02, grad_scale: 32.0
2024-09-22 12:33:11,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=10.272
2024-09-22 12:33:16,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15680.0, ans=0.0
2024-09-22 12:33:42,607 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 12:33:45,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=15773.333333333334, ans=0.125
2024-09-22 12:34:09,474 INFO [train.py:1198] (0/4) Epoch 1, batch 3400, loss[loss=0.398, ctc_loss=0.313, cr_loss=0.4253, over 16713.00 frames. ], tot_loss[loss=0.3953, ctc_loss=0.3095, cr_loss=0.4291, over 3342082.03 frames. ], batch size: 61, lr: 4.29e-02, grad_scale: 32.0
2024-09-22 12:34:15,798 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.414e+02 1.769e+02 2.096e+02 2.628e+02 4.837e+02, threshold=4.193e+02, percent-clipped=1.0
2024-09-22 12:34:19,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=15866.666666666666, ans=0.125
2024-09-22 12:34:20,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=15866.666666666666, ans=0.125
2024-09-22 12:34:20,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=15866.666666666666, ans=0.025
2024-09-22 12:34:29,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=15913.333333333334, ans=0.125
2024-09-22 12:34:52,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=15960.0, ans=0.00016666666666666913
2024-09-22 12:35:00,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=16006.666666666666, ans=0.125
2024-09-22 12:35:12,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=16053.333333333334, ans=0.007379710144927536
2024-09-22 12:35:28,156 INFO [train.py:1198] (0/4) Epoch 1, batch 3450, loss[loss=0.4013, ctc_loss=0.3155, cr_loss=0.4286, over 17012.00 frames. ], tot_loss[loss=0.3936, ctc_loss=0.3082, cr_loss=0.4273, over 3341087.21 frames. ], batch size: 44, lr: 4.29e-02, grad_scale: 32.0
2024-09-22 12:35:34,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16100.0, ans=0.139
2024-09-22 12:36:01,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=16193.333333333334, ans=0.125
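The lr field decays slowly with the batch index (4.48e-02 at batch 1050 down to 4.27e-02 by batch 3650). The logged values agree with an Eden-style schedule, lr = base_lr * ((batch^2 + lr_batches^2) / lr_batches^2) ** -0.25, using the configured base_lr = 0.045 and lr_batches = 7500, times an analogous epoch factor that is still ~1 this early in training; the formula is an assumption checked against the log, not quoted from optim.py:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Smooth decay in both the batch index and the (fractional) epoch.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(round(eden_lr(0.045, 1050, 0), 4))  # 0.0448, the lr logged at batch 1050
    print(round(eden_lr(0.045, 3650, 0), 4))  # 0.0427, the lr logged at batch 3650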
limit=13.5725 2024-09-22 12:36:20,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=16240.0, ans=0.125 2024-09-22 12:36:26,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=16240.0, ans=0.125 2024-09-22 12:36:33,377 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:36:34,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=16286.666666666666, ans=0.0 2024-09-22 12:36:42,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=16286.666666666666, ans=0.025 2024-09-22 12:36:48,697 INFO [train.py:1198] (0/4) Epoch 1, batch 3500, loss[loss=0.4072, ctc_loss=0.3207, cr_loss=0.4325, over 16938.00 frames. ], tot_loss[loss=0.3931, ctc_loss=0.3075, cr_loss=0.4276, over 3338355.81 frames. ], batch size: 58, lr: 4.28e-02, grad_scale: 32.0 2024-09-22 12:36:54,800 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.402e+02 1.781e+02 2.318e+02 3.100e+02 5.527e+02, threshold=4.636e+02, percent-clipped=9.0 2024-09-22 12:38:06,639 INFO [train.py:1198] (0/4) Epoch 1, batch 3550, loss[loss=0.4178, ctc_loss=0.3299, cr_loss=0.4397, over 16658.00 frames. ], tot_loss[loss=0.3942, ctc_loss=0.3087, cr_loss=0.4276, over 3329407.49 frames. ], batch size: 66, lr: 4.28e-02, grad_scale: 32.0 2024-09-22 12:38:11,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=16566.666666666668, ans=0.0 2024-09-22 12:38:25,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=16613.333333333332, ans=0.125 2024-09-22 12:39:09,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=16753.333333333332, ans=0.125 2024-09-22 12:39:13,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16753.333333333332, ans=0.13246666666666668 2024-09-22 12:39:22,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.66 vs. limit=9.188333333333333 2024-09-22 12:39:24,818 INFO [train.py:1198] (0/4) Epoch 1, batch 3600, loss[loss=0.3647, ctc_loss=0.2856, cr_loss=0.3954, over 17100.00 frames. ], tot_loss[loss=0.3926, ctc_loss=0.3071, cr_loss=0.4274, over 3335480.44 frames. 
], batch size: 49, lr: 4.27e-02, grad_scale: 32.0 2024-09-22 12:39:25,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=16800.0, ans=0.0 2024-09-22 12:39:25,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=16800.0, ans=0.31200000000000006 2024-09-22 12:39:29,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=16800.0, ans=0.125 2024-09-22 12:39:30,801 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.429e+02 1.842e+02 2.057e+02 2.677e+02 5.057e+02, threshold=4.115e+02, percent-clipped=2.0 2024-09-22 12:39:31,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=13.8 2024-09-22 12:40:10,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=10.757333333333332 2024-09-22 12:40:11,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=16940.0, ans=0.4541 2024-09-22 12:40:16,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=16940.0, ans=0.0 2024-09-22 12:40:22,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16940.0, ans=0.13060000000000002 2024-09-22 12:40:43,994 INFO [train.py:1198] (0/4) Epoch 1, batch 3650, loss[loss=0.3541, ctc_loss=0.2667, cr_loss=0.4368, over 17010.00 frames. ], tot_loss[loss=0.3923, ctc_loss=0.3066, cr_loss=0.4285, over 3341795.75 frames. ], batch size: 44, lr: 4.27e-02, grad_scale: 32.0 2024-09-22 12:40:50,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17033.333333333332, ans=0.125 2024-09-22 12:41:15,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.82 vs. limit=13.54 2024-09-22 12:41:20,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=17126.666666666668, ans=0.125 2024-09-22 12:41:25,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=17126.666666666668, ans=0.04949747468305833 2024-09-22 12:41:44,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=17173.333333333332, ans=0.007136231884057972 2024-09-22 12:42:04,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=17266.666666666668, ans=0.0 2024-09-22 12:42:05,822 INFO [train.py:1198] (0/4) Epoch 1, batch 3700, loss[loss=0.3394, ctc_loss=0.2591, cr_loss=0.4016, over 17055.00 frames. ], tot_loss[loss=0.3892, ctc_loss=0.3038, cr_loss=0.427, over 3352595.98 frames. 
], batch size: 39, lr: 4.26e-02, grad_scale: 32.0 2024-09-22 12:42:12,057 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.410e+02 1.892e+02 2.660e+02 3.633e+02 5.715e+02, threshold=5.320e+02, percent-clipped=15.0 2024-09-22 12:43:16,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=17453.333333333332, ans=0.0 2024-09-22 12:43:19,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=17453.333333333332, ans=0.125 2024-09-22 12:43:22,576 INFO [train.py:1198] (0/4) Epoch 1, batch 3750, loss[loss=0.4298, ctc_loss=0.3363, cr_loss=0.4674, over 14942.00 frames. ], tot_loss[loss=0.3877, ctc_loss=0.3024, cr_loss=0.4261, over 3345784.76 frames. ], batch size: 89, lr: 4.26e-02, grad_scale: 32.0 2024-09-22 12:43:29,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.38 vs. limit=13.75 2024-09-22 12:43:42,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=17546.666666666668, ans=0.2858666666666667 2024-09-22 12:43:56,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=17593.333333333332, ans=0.007044927536231885 2024-09-22 12:43:56,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=17593.333333333332, ans=0.28423333333333345 2024-09-22 12:44:01,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.09 vs. limit=5.638999999999999 2024-09-22 12:44:27,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=17686.666666666668, ans=0.007024637681159421 2024-09-22 12:44:37,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=17686.666666666668, ans=0.125 2024-09-22 12:44:40,335 INFO [train.py:1198] (0/4) Epoch 1, batch 3800, loss[loss=0.367, ctc_loss=0.2857, cr_loss=0.4068, over 17208.00 frames. ], tot_loss[loss=0.3877, ctc_loss=0.3026, cr_loss=0.4257, over 3334035.42 frames. ], batch size: 47, lr: 4.25e-02, grad_scale: 32.0 2024-09-22 12:44:46,416 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.363e+02 1.780e+02 2.356e+02 3.200e+02 5.376e+02, threshold=4.713e+02, percent-clipped=1.0 2024-09-22 12:45:04,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=14.1675 2024-09-22 12:45:07,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=17780.0, ans=0.007004347826086957 2024-09-22 12:45:24,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.20 vs. limit=14.184999999999999 2024-09-22 12:45:45,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=17920.0, ans=0.125 2024-09-22 12:45:59,544 INFO [train.py:1198] (0/4) Epoch 1, batch 3850, loss[loss=0.3842, ctc_loss=0.2971, cr_loss=0.4356, over 16893.00 frames. 
], tot_loss[loss=0.3925, ctc_loss=0.307, cr_loss=0.4277, over 3283650.17 frames. ], batch size: 58, lr: 4.24e-02, grad_scale: 32.0 2024-09-22 12:46:02,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=11.186666666666667 2024-09-22 12:46:12,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17966.666666666668, ans=0.12033333333333332 2024-09-22 12:47:10,827 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-1.pt 2024-09-22 12:48:02,949 INFO [train.py:1198] (0/4) Epoch 2, batch 0, loss[loss=0.427, ctc_loss=0.3384, cr_loss=0.4433, over 16686.00 frames. ], tot_loss[loss=0.427, ctc_loss=0.3384, cr_loss=0.4433, over 16686.00 frames. ], batch size: 61, lr: 4.16e-02, grad_scale: 32.0 2024-09-22 12:48:02,950 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 12:48:18,079 INFO [train.py:1230] (0/4) Epoch 2, validation: loss=0.1169, ctc_loss=0.1169, cr_loss=1.034e-14, over 944034.00 frames. 2024-09-22 12:48:18,079 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 12:48:30,953 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.378e+02 1.944e+02 2.365e+02 3.007e+02 5.794e+02, threshold=4.731e+02, percent-clipped=1.0 2024-09-22 12:48:40,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18228.0, ans=0.11771999999999999 2024-09-22 12:49:02,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=21.206 2024-09-22 12:49:33,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=18368.0, ans=0.07 2024-09-22 12:49:38,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=18414.666666666668, ans=0.125 2024-09-22 12:49:39,680 INFO [train.py:1198] (0/4) Epoch 2, batch 50, loss[loss=0.3412, ctc_loss=0.2676, cr_loss=0.3679, over 17301.00 frames. ], tot_loss[loss=0.3815, ctc_loss=0.2964, cr_loss=0.4255, over 761320.24 frames. ], batch size: 46, lr: 4.15e-02, grad_scale: 32.0 2024-09-22 12:49:51,266 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:49:51,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=18414.666666666668, ans=0.0 2024-09-22 12:49:51,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.36 vs. limit=14.207333333333334 2024-09-22 12:49:54,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=18461.333333333332, ans=0.0 2024-09-22 12:50:02,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. 
limit=5.7692 2024-09-22 12:50:03,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=18461.333333333332, ans=0.11538666666666669 2024-09-22 12:50:07,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=18461.333333333332, ans=0.125 2024-09-22 12:50:21,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18508.0, ans=0.11492 2024-09-22 12:50:39,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=18554.666666666668, ans=0.125 2024-09-22 12:50:46,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=18601.333333333332, ans=0.006825797101449276 2024-09-22 12:50:56,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=18601.333333333332, ans=0.125 2024-09-22 12:50:58,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=18648.0, ans=0.125 2024-09-22 12:50:59,412 INFO [train.py:1198] (0/4) Epoch 2, batch 100, loss[loss=0.389, ctc_loss=0.2998, cr_loss=0.4458, over 17147.00 frames. ], tot_loss[loss=0.3765, ctc_loss=0.2918, cr_loss=0.4232, over 1337849.10 frames. ], batch size: 48, lr: 4.15e-02, grad_scale: 32.0 2024-09-22 12:51:06,367 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-4000.pt 2024-09-22 12:51:19,088 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.460e+02 1.796e+02 2.128e+02 2.839e+02 5.119e+02, threshold=4.256e+02, percent-clipped=1.0 2024-09-22 12:51:32,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=18694.666666666668, ans=0.125 2024-09-22 12:51:56,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=18788.0, ans=0.125 2024-09-22 12:51:59,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=18788.0, ans=0.125 2024-09-22 12:52:26,282 INFO [train.py:1198] (0/4) Epoch 2, batch 150, loss[loss=0.4089, ctc_loss=0.3182, cr_loss=0.4535, over 17008.00 frames. ], tot_loss[loss=0.378, ctc_loss=0.2929, cr_loss=0.4252, over 1784428.49 frames. ], batch size: 53, lr: 4.14e-02, grad_scale: 32.0 2024-09-22 12:52:29,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=18881.333333333332, ans=0.125 2024-09-22 12:52:40,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=18881.333333333332, ans=0.125 2024-09-22 12:52:41,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=18881.333333333332, ans=0.23915333333333344 2024-09-22 12:53:01,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18974.666666666668, ans=0.11025333333333331 2024-09-22 12:53:51,162 INFO [train.py:1198] (0/4) Epoch 2, batch 200, loss[loss=0.354, ctc_loss=0.2747, cr_loss=0.3967, over 17237.00 frames. 
], tot_loss[loss=0.3776, ctc_loss=0.2927, cr_loss=0.4242, over 2129470.37 frames. ], batch size: 44, lr: 4.13e-02, grad_scale: 32.0 2024-09-22 12:54:00,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=19114.666666666668, ans=0.006714202898550724 2024-09-22 12:54:03,746 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.386e+02 1.786e+02 2.214e+02 2.969e+02 6.338e+02, threshold=4.427e+02, percent-clipped=7.0 2024-09-22 12:54:40,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=19254.666666666668, ans=0.04949747468305833 2024-09-22 12:54:46,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=19254.666666666668, ans=0.125 2024-09-22 12:54:48,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=19254.666666666668, ans=0.125 2024-09-22 12:55:10,188 INFO [train.py:1198] (0/4) Epoch 2, batch 250, loss[loss=0.4016, ctc_loss=0.3184, cr_loss=0.416, over 15897.00 frames. ], tot_loss[loss=0.3782, ctc_loss=0.293, cr_loss=0.4263, over 2409395.43 frames. ], batch size: 74, lr: 4.13e-02, grad_scale: 32.0 2024-09-22 12:55:16,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=19348.0, ans=0.125 2024-09-22 12:55:43,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=19441.333333333332, ans=0.0 2024-09-22 12:56:02,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=19488.0, ans=0.00663304347826087 2024-09-22 12:56:30,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=19534.666666666668, ans=0.025 2024-09-22 12:56:35,318 INFO [train.py:1198] (0/4) Epoch 2, batch 300, loss[loss=0.3782, ctc_loss=0.2955, cr_loss=0.4133, over 16523.00 frames. ], tot_loss[loss=0.377, ctc_loss=0.2918, cr_loss=0.426, over 2621368.22 frames. ], batch size: 66, lr: 4.12e-02, grad_scale: 32.0 2024-09-22 12:56:48,374 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.373e+02 1.759e+02 2.124e+02 2.853e+02 4.892e+02, threshold=4.248e+02, percent-clipped=3.0 2024-09-22 12:56:51,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.46 vs. limit=22.221 2024-09-22 12:56:58,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=14.8605 2024-09-22 12:57:15,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=14.878 2024-09-22 12:57:31,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=11.888533333333333 2024-09-22 12:57:42,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.69 vs. 
limit=14.913 2024-09-22 12:57:50,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=19768.0, ans=0.0 2024-09-22 12:57:57,759 INFO [train.py:1198] (0/4) Epoch 2, batch 350, loss[loss=0.3625, ctc_loss=0.2772, cr_loss=0.4266, over 17237.00 frames. ], tot_loss[loss=0.3769, ctc_loss=0.2918, cr_loss=0.4254, over 2776746.59 frames. ], batch size: 50, lr: 4.12e-02, grad_scale: 32.0 2024-09-22 12:58:04,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=14.9305 2024-09-22 12:58:15,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=19861.333333333332, ans=0.125 2024-09-22 12:58:22,024 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-22 12:58:23,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=19861.333333333332, ans=0.125 2024-09-22 12:59:04,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=20001.333333333332, ans=0.0 2024-09-22 12:59:04,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=20001.333333333332, ans=0.2 2024-09-22 12:59:17,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=20001.333333333332, ans=0.125 2024-09-22 12:59:20,248 INFO [train.py:1198] (0/4) Epoch 2, batch 400, loss[loss=0.3353, ctc_loss=0.2598, cr_loss=0.3777, over 17080.00 frames. ], tot_loss[loss=0.374, ctc_loss=0.2892, cr_loss=0.4241, over 2906252.78 frames. ], batch size: 43, lr: 4.11e-02, grad_scale: 32.0 2024-09-22 12:59:21,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.96 vs. limit=15.0 2024-09-22 12:59:24,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. 
limit=15.0 2024-09-22 12:59:27,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=20048.0, ans=0.0 2024-09-22 12:59:27,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=20048.0, ans=0.125 2024-09-22 12:59:32,899 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.308e+02 1.825e+02 2.194e+02 2.995e+02 5.365e+02, threshold=4.388e+02, percent-clipped=4.0 2024-09-22 12:59:37,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20094.666666666668, ans=0.1 2024-09-22 12:59:42,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=20094.666666666668, ans=0.025 2024-09-22 12:59:56,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=20141.333333333332, ans=0.125 2024-09-22 13:00:03,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=20141.333333333332, ans=0.125 2024-09-22 13:00:04,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=20141.333333333332, ans=0.125 2024-09-22 13:00:20,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=20188.0, ans=0.006480869565217392 2024-09-22 13:00:39,703 INFO [train.py:1198] (0/4) Epoch 2, batch 450, loss[loss=0.3346, ctc_loss=0.2537, cr_loss=0.4046, over 17022.00 frames. ], tot_loss[loss=0.3709, ctc_loss=0.2866, cr_loss=0.4216, over 3004629.07 frames. ], batch size: 44, lr: 4.10e-02, grad_scale: 32.0 2024-09-22 13:00:41,700 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:00:43,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=20281.333333333332, ans=0.006460579710144928 2024-09-22 13:01:04,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.02 vs. limit=15.0 2024-09-22 13:01:09,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20328.0, ans=0.1 2024-09-22 13:01:16,016 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:01:20,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=20374.666666666668, ans=0.0 2024-09-22 13:01:34,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=20421.333333333332, ans=0.025 2024-09-22 13:01:49,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=10.0 2024-09-22 13:01:54,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=20468.0, ans=0.00642 2024-09-22 13:02:04,848 INFO [train.py:1198] (0/4) Epoch 2, batch 500, loss[loss=0.3706, ctc_loss=0.2829, cr_loss=0.4383, over 17295.00 frames. 
], tot_loss[loss=0.3723, ctc_loss=0.2876, cr_loss=0.4236, over 3090722.90 frames. ], batch size: 51, lr: 4.10e-02, grad_scale: 32.0 2024-09-22 13:02:17,797 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.414e+02 1.804e+02 2.205e+02 2.881e+02 5.655e+02, threshold=4.410e+02, percent-clipped=3.0 2024-09-22 13:02:21,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=20561.333333333332, ans=0.0 2024-09-22 13:02:52,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=20608.0, ans=0.125 2024-09-22 13:03:09,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=20701.333333333332, ans=0.0 2024-09-22 13:03:29,950 INFO [train.py:1198] (0/4) Epoch 2, batch 550, loss[loss=0.3836, ctc_loss=0.2955, cr_loss=0.4402, over 17320.00 frames. ], tot_loss[loss=0.3724, ctc_loss=0.2875, cr_loss=0.4243, over 3161759.17 frames. ], batch size: 51, lr: 4.09e-02, grad_scale: 32.0 2024-09-22 13:03:42,113 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.05 vs. limit=10.0 2024-09-22 13:03:42,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=20748.0, ans=0.125 2024-09-22 13:04:06,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=20841.333333333332, ans=0.0 2024-09-22 13:04:24,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=20888.0, ans=0.125 2024-09-22 13:04:24,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=20888.0, ans=0.125 2024-09-22 13:04:37,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=20934.666666666668, ans=0.0 2024-09-22 13:04:46,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=20934.666666666668, ans=0.125 2024-09-22 13:04:49,742 INFO [train.py:1198] (0/4) Epoch 2, batch 600, loss[loss=0.3676, ctc_loss=0.2887, cr_loss=0.3946, over 17103.00 frames. ], tot_loss[loss=0.3733, ctc_loss=0.2884, cr_loss=0.4242, over 3200973.34 frames. ], batch size: 49, lr: 4.09e-02, grad_scale: 32.0 2024-09-22 13:04:50,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=20981.333333333332, ans=0.0 2024-09-22 13:04:53,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=20981.333333333332, ans=0.125 2024-09-22 13:05:02,738 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.397e+02 1.714e+02 2.171e+02 2.480e+02 4.403e+02, threshold=4.342e+02, percent-clipped=0.0 2024-09-22 13:05:03,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.48 vs. 
limit=12.0 2024-09-22 13:05:06,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=21028.0, ans=0.2 2024-09-22 13:05:10,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=21028.0, ans=0.2 2024-09-22 13:05:28,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=21074.666666666668, ans=0.006288115942028986 2024-09-22 13:05:30,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.97 vs. limit=15.0 2024-09-22 13:05:41,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21121.333333333332, ans=0.0 2024-09-22 13:06:08,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=21168.0, ans=0.006267826086956522 2024-09-22 13:06:15,352 INFO [train.py:1198] (0/4) Epoch 2, batch 650, loss[loss=0.4206, ctc_loss=0.3234, cr_loss=0.4857, over 16786.00 frames. ], tot_loss[loss=0.3745, ctc_loss=0.2893, cr_loss=0.426, over 3234458.76 frames. ], batch size: 61, lr: 4.08e-02, grad_scale: 32.0 2024-09-22 13:07:01,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=15.0 2024-09-22 13:07:09,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=21354.666666666668, ans=0.125 2024-09-22 13:07:16,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=21354.666666666668, ans=0.2 2024-09-22 13:07:37,762 INFO [train.py:1198] (0/4) Epoch 2, batch 700, loss[loss=0.3596, ctc_loss=0.2737, cr_loss=0.4298, over 17224.00 frames. ], tot_loss[loss=0.3717, ctc_loss=0.2868, cr_loss=0.4244, over 3265288.07 frames. ], batch size: 55, lr: 4.07e-02, grad_scale: 32.0 2024-09-22 13:07:50,782 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.427e+02 1.762e+02 2.307e+02 2.806e+02 6.388e+02, threshold=4.614e+02, percent-clipped=12.0 2024-09-22 13:08:11,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=21541.333333333332, ans=0.125 2024-09-22 13:08:13,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21541.333333333332, ans=0.1 2024-09-22 13:08:16,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=21541.333333333332, ans=0.125 2024-09-22 13:08:19,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=21541.333333333332, ans=0.125 2024-09-22 13:08:59,998 INFO [train.py:1198] (0/4) Epoch 2, batch 750, loss[loss=0.2932, ctc_loss=0.2213, cr_loss=0.3591, over 17120.00 frames. ], tot_loss[loss=0.3722, ctc_loss=0.2872, cr_loss=0.4254, over 3283144.49 frames. 
], batch size: 40, lr: 4.07e-02, grad_scale: 32.0 2024-09-22 13:09:27,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=21728.0, ans=0.125 2024-09-22 13:09:30,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=21774.666666666668, ans=0.0 2024-09-22 13:09:47,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=21821.333333333332, ans=0.125 2024-09-22 13:10:11,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21868.0, ans=0.1 2024-09-22 13:10:18,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=21914.666666666668, ans=0.125 2024-09-22 13:10:19,247 INFO [train.py:1198] (0/4) Epoch 2, batch 800, loss[loss=0.4793, ctc_loss=0.3895, cr_loss=0.4492, over 12081.00 frames. ], tot_loss[loss=0.3726, ctc_loss=0.2875, cr_loss=0.4253, over 3290814.09 frames. ], batch size: 124, lr: 4.06e-02, grad_scale: 32.0 2024-09-22 13:10:25,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=21914.666666666668, ans=0.125 2024-09-22 13:10:32,122 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.379e+02 1.875e+02 2.284e+02 2.792e+02 4.521e+02, threshold=4.569e+02, percent-clipped=0.0 2024-09-22 13:10:46,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=21961.333333333332, ans=0.125 2024-09-22 13:11:00,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=22008.0, ans=0.125 2024-09-22 13:11:05,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.80 vs. limit=22.5 2024-09-22 13:11:42,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=22148.0, ans=0.2 2024-09-22 13:11:44,000 INFO [train.py:1198] (0/4) Epoch 2, batch 850, loss[loss=0.3059, ctc_loss=0.2301, cr_loss=0.379, over 17226.00 frames. ], tot_loss[loss=0.3717, ctc_loss=0.2867, cr_loss=0.4251, over 3314486.71 frames. ], batch size: 41, lr: 4.06e-02, grad_scale: 32.0 2024-09-22 13:11:58,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=22194.666666666668, ans=0.0 2024-09-22 13:12:14,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=22241.333333333332, ans=0.05 2024-09-22 13:12:37,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=22288.0, ans=0.0 2024-09-22 13:12:41,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2024-09-22 13:12:44,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=12.0 2024-09-22 13:13:05,749 INFO [train.py:1198] (0/4) Epoch 2, batch 900, loss[loss=0.3717, ctc_loss=0.2903, cr_loss=0.4071, over 17341.00 frames. 
], tot_loss[loss=0.3708, ctc_loss=0.2859, cr_loss=0.4244, over 3320219.67 frames. ], batch size: 48, lr: 4.05e-02, grad_scale: 32.0 2024-09-22 13:13:21,507 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.391e+02 1.677e+02 2.003e+02 2.906e+02 6.404e+02, threshold=4.006e+02, percent-clipped=3.0 2024-09-22 13:13:21,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=22381.333333333332, ans=0.125 2024-09-22 13:13:24,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=22428.0, ans=0.1 2024-09-22 13:14:28,988 INFO [train.py:1198] (0/4) Epoch 2, batch 950, loss[loss=0.3529, ctc_loss=0.2737, cr_loss=0.396, over 16984.00 frames. ], tot_loss[loss=0.3694, ctc_loss=0.2848, cr_loss=0.423, over 3331745.08 frames. ], batch size: 39, lr: 4.04e-02, grad_scale: 64.0 2024-09-22 13:15:09,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=15.0 2024-09-22 13:15:13,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=22708.0, ans=0.0 2024-09-22 13:15:35,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=22801.333333333332, ans=0.125 2024-09-22 13:15:47,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=22848.0, ans=0.09899494936611666 2024-09-22 13:15:48,456 INFO [train.py:1198] (0/4) Epoch 2, batch 1000, loss[loss=0.3651, ctc_loss=0.2758, cr_loss=0.4467, over 16944.00 frames. ], tot_loss[loss=0.3691, ctc_loss=0.2842, cr_loss=0.4244, over 3346517.11 frames. ], batch size: 42, lr: 4.04e-02, grad_scale: 64.0 2024-09-22 13:16:03,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=22848.0, ans=0.05 2024-09-22 13:16:07,762 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.375e+02 1.793e+02 2.170e+02 2.654e+02 4.100e+02, threshold=4.339e+02, percent-clipped=2.0 2024-09-22 13:16:52,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=22988.0, ans=0.0 2024-09-22 13:17:15,731 INFO [train.py:1198] (0/4) Epoch 2, batch 1050, loss[loss=0.3492, ctc_loss=0.2652, cr_loss=0.4199, over 16990.00 frames. ], tot_loss[loss=0.3701, ctc_loss=0.2851, cr_loss=0.4249, over 3354023.20 frames. ], batch size: 56, lr: 4.03e-02, grad_scale: 32.0 2024-09-22 13:17:20,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=23081.333333333332, ans=0.005851884057971015 2024-09-22 13:17:32,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=23128.0, ans=0.07 2024-09-22 13:17:58,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-09-22 13:18:12,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=23221.333333333332, ans=0.125 2024-09-22 13:18:38,530 INFO [train.py:1198] (0/4) Epoch 2, batch 1100, loss[loss=0.349, ctc_loss=0.266, cr_loss=0.4153, over 17117.00 frames. 
], tot_loss[loss=0.369, ctc_loss=0.2842, cr_loss=0.4243, over 3358761.24 frames. ], batch size: 40, lr: 4.03e-02, grad_scale: 32.0 2024-09-22 13:18:42,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=23314.666666666668, ans=0.035 2024-09-22 13:18:42,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=23314.666666666668, ans=0.125 2024-09-22 13:18:49,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.32 vs. limit=22.5 2024-09-22 13:18:53,229 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.326e+02 1.707e+02 2.038e+02 2.702e+02 5.370e+02, threshold=4.076e+02, percent-clipped=2.0 2024-09-22 13:19:14,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=23408.0, ans=0.125 2024-09-22 13:19:38,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=23454.666666666668, ans=0.2 2024-09-22 13:19:58,507 INFO [train.py:1198] (0/4) Epoch 2, batch 1150, loss[loss=0.3181, ctc_loss=0.237, cr_loss=0.4051, over 17115.00 frames. ], tot_loss[loss=0.3671, ctc_loss=0.2824, cr_loss=0.4236, over 3364595.84 frames. ], batch size: 40, lr: 4.02e-02, grad_scale: 32.0 2024-09-22 13:19:59,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.40 vs. limit=10.0 2024-09-22 13:20:14,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=23594.666666666668, ans=0.125 2024-09-22 13:20:22,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=23594.666666666668, ans=0.005740289855072463 2024-09-22 13:20:25,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=23594.666666666668, ans=0.125 2024-09-22 13:20:46,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=23688.0, ans=0.125 2024-09-22 13:21:19,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23734.666666666668, ans=0.1 2024-09-22 13:21:19,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2024-09-22 13:21:23,361 INFO [train.py:1198] (0/4) Epoch 2, batch 1200, loss[loss=0.3564, ctc_loss=0.2749, cr_loss=0.4076, over 17191.00 frames. ], tot_loss[loss=0.3668, ctc_loss=0.282, cr_loss=0.4237, over 3363804.47 frames. ], batch size: 55, lr: 4.01e-02, grad_scale: 32.0 2024-09-22 13:21:24,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. 
limit=15.0 2024-09-22 13:21:37,785 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.332e+02 1.890e+02 2.251e+02 2.751e+02 4.727e+02, threshold=4.502e+02, percent-clipped=4.0 2024-09-22 13:22:14,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=23921.333333333332, ans=0.125 2024-09-22 13:22:16,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=23921.333333333332, ans=0.2 2024-09-22 13:22:22,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=23921.333333333332, ans=0.05 2024-09-22 13:22:35,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=23968.0, ans=0.125 2024-09-22 13:22:38,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=23968.0, ans=0.125 2024-09-22 13:22:46,211 INFO [train.py:1198] (0/4) Epoch 2, batch 1250, loss[loss=0.3814, ctc_loss=0.2932, cr_loss=0.4406, over 17003.00 frames. ], tot_loss[loss=0.3665, ctc_loss=0.2818, cr_loss=0.4238, over 3366417.46 frames. ], batch size: 53, lr: 4.01e-02, grad_scale: 32.0 2024-09-22 13:22:48,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=24014.666666666668, ans=0.125 2024-09-22 13:23:14,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=24061.333333333332, ans=0.05 2024-09-22 13:23:26,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=24108.0, ans=0.0 2024-09-22 13:23:27,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=24108.0, ans=0.125 2024-09-22 13:23:45,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=24154.666666666668, ans=0.005618550724637681 2024-09-22 13:24:08,757 INFO [train.py:1198] (0/4) Epoch 2, batch 1300, loss[loss=0.3955, ctc_loss=0.3092, cr_loss=0.4316, over 16807.00 frames. ], tot_loss[loss=0.3669, ctc_loss=0.2821, cr_loss=0.4242, over 3357198.89 frames. 
], batch size: 61, lr: 4.00e-02, grad_scale: 32.0 2024-09-22 13:24:12,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=24248.0, ans=0.0 2024-09-22 13:24:23,423 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.394e+02 1.885e+02 2.250e+02 3.074e+02 5.750e+02, threshold=4.500e+02, percent-clipped=7.0 2024-09-22 13:24:27,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=24294.666666666668, ans=10.0 2024-09-22 13:24:30,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=24294.666666666668, ans=0.0055881159420289855 2024-09-22 13:24:34,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=24294.666666666668, ans=0.125 2024-09-22 13:24:35,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=24294.666666666668, ans=0.0055881159420289855 2024-09-22 13:24:40,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.95 vs. limit=22.5 2024-09-22 13:24:52,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=24341.333333333332, ans=0.125 2024-09-22 13:24:54,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=24341.333333333332, ans=0.125 2024-09-22 13:24:58,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=24388.0, ans=10.0 2024-09-22 13:25:28,593 INFO [train.py:1198] (0/4) Epoch 2, batch 1350, loss[loss=0.4019, ctc_loss=0.3089, cr_loss=0.4652, over 16706.00 frames. ], tot_loss[loss=0.3657, ctc_loss=0.2809, cr_loss=0.424, over 3354190.22 frames. ], batch size: 61, lr: 3.99e-02, grad_scale: 32.0 2024-09-22 13:25:31,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0 2024-09-22 13:26:27,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=24621.333333333332, ans=0.0 2024-09-22 13:26:54,074 INFO [train.py:1198] (0/4) Epoch 2, batch 1400, loss[loss=0.3772, ctc_loss=0.288, cr_loss=0.4463, over 16526.00 frames. ], tot_loss[loss=0.3658, ctc_loss=0.2809, cr_loss=0.4241, over 3349029.50 frames. 
], batch size: 66, lr: 3.99e-02, grad_scale: 32.0 2024-09-22 13:26:59,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=24714.666666666668, ans=0.2 2024-09-22 13:27:00,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=24714.666666666668, ans=0.0 2024-09-22 13:27:11,160 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.488e+02 2.006e+02 2.479e+02 3.166e+02 4.715e+02, threshold=4.958e+02, percent-clipped=3.0 2024-09-22 13:27:18,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=24761.333333333332, ans=0.09899494936611666 2024-09-22 13:27:25,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=22.5 2024-09-22 13:27:40,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24808.0, ans=0.1 2024-09-22 13:28:05,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=24901.333333333332, ans=0.0 2024-09-22 13:28:10,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0 2024-09-22 13:28:19,234 INFO [train.py:1198] (0/4) Epoch 2, batch 1450, loss[loss=0.3994, ctc_loss=0.3072, cr_loss=0.4612, over 17044.00 frames. ], tot_loss[loss=0.3635, ctc_loss=0.2789, cr_loss=0.4228, over 3354055.88 frames. ], batch size: 52, lr: 3.98e-02, grad_scale: 32.0 2024-09-22 13:28:24,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=24948.0, ans=0.0 2024-09-22 13:28:30,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=24948.0, ans=0.125 2024-09-22 13:28:59,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=25041.333333333332, ans=0.07 2024-09-22 13:29:01,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2024-09-22 13:29:22,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.96 vs. limit=10.0 2024-09-22 13:29:31,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=25134.666666666668, ans=0.2 2024-09-22 13:29:36,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.88 vs. limit=15.0 2024-09-22 13:29:38,895 INFO [train.py:1198] (0/4) Epoch 2, batch 1500, loss[loss=0.379, ctc_loss=0.2912, cr_loss=0.4388, over 17297.00 frames. ], tot_loss[loss=0.3647, ctc_loss=0.2799, cr_loss=0.4241, over 3352243.54 frames. 
], batch size: 51, lr: 3.98e-02, grad_scale: 32.0 2024-09-22 13:29:40,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=25181.333333333332, ans=0.0 2024-09-22 13:29:47,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=25181.333333333332, ans=0.025 2024-09-22 13:29:53,536 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.354e+02 1.701e+02 2.138e+02 2.969e+02 5.127e+02, threshold=4.276e+02, percent-clipped=1.0 2024-09-22 13:29:53,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=25228.0, ans=0.0 2024-09-22 13:30:14,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=25274.666666666668, ans=0.0 2024-09-22 13:30:31,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=25321.333333333332, ans=0.0 2024-09-22 13:30:32,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=25321.333333333332, ans=0.125 2024-09-22 13:30:41,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=25368.0, ans=0.025 2024-09-22 13:31:00,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.00 vs. limit=10.0 2024-09-22 13:31:02,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=25414.666666666668, ans=10.0 2024-09-22 13:31:03,515 INFO [train.py:1198] (0/4) Epoch 2, batch 1550, loss[loss=0.4092, ctc_loss=0.314, cr_loss=0.4761, over 17216.00 frames. ], tot_loss[loss=0.3644, ctc_loss=0.2795, cr_loss=0.4248, over 3357092.69 frames. ], batch size: 55, lr: 3.97e-02, grad_scale: 32.0 2024-09-22 13:31:03,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=25414.666666666668, ans=0.2 2024-09-22 13:31:10,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=25414.666666666668, ans=0.125 2024-09-22 13:31:15,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25414.666666666668, ans=0.1 2024-09-22 13:31:17,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.58 vs. limit=10.0 2024-09-22 13:32:06,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2024-09-22 13:32:16,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=25601.333333333332, ans=0.0 2024-09-22 13:32:25,137 INFO [train.py:1198] (0/4) Epoch 2, batch 1600, loss[loss=0.3134, ctc_loss=0.2364, cr_loss=0.3847, over 17292.00 frames. ], tot_loss[loss=0.3667, ctc_loss=0.2812, cr_loss=0.4274, over 3360497.09 frames. 
], batch size: 42, lr: 3.96e-02, grad_scale: 32.0 2024-09-22 13:32:39,455 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.451e+02 1.757e+02 1.985e+02 2.306e+02 3.282e+02, threshold=3.970e+02, percent-clipped=0.0 2024-09-22 13:32:41,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=25694.666666666668, ans=0.125 2024-09-22 13:32:42,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=25694.666666666668, ans=0.05 2024-09-22 13:32:55,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=25741.333333333332, ans=0.125 2024-09-22 13:33:47,218 INFO [train.py:1198] (0/4) Epoch 2, batch 1650, loss[loss=0.4145, ctc_loss=0.3158, cr_loss=0.4939, over 17056.00 frames. ], tot_loss[loss=0.3663, ctc_loss=0.281, cr_loss=0.4266, over 3350017.52 frames. ], batch size: 52, lr: 3.96e-02, grad_scale: 32.0 2024-09-22 13:34:13,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=25928.0, ans=0.125 2024-09-22 13:34:18,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=25974.666666666668, ans=0.125 2024-09-22 13:34:42,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=26021.333333333332, ans=0.05 2024-09-22 13:34:44,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=26021.333333333332, ans=0.125 2024-09-22 13:35:01,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=26068.0, ans=0.125 2024-09-22 13:35:07,609 INFO [train.py:1198] (0/4) Epoch 2, batch 1700, loss[loss=0.4955, ctc_loss=0.407, cr_loss=0.4425, over 11263.00 frames. ], tot_loss[loss=0.366, ctc_loss=0.2808, cr_loss=0.426, over 3354317.31 frames. ], batch size: 123, lr: 3.95e-02, grad_scale: 32.0 2024-09-22 13:35:08,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=26114.666666666668, ans=0.125 2024-09-22 13:35:14,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=26114.666666666668, ans=0.025 2024-09-22 13:35:22,186 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.426e+02 1.761e+02 2.144e+02 2.600e+02 4.141e+02, threshold=4.288e+02, percent-clipped=2.0 2024-09-22 13:35:51,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=26208.0, ans=0.125 2024-09-22 13:36:33,105 INFO [train.py:1198] (0/4) Epoch 2, batch 1750, loss[loss=0.3175, ctc_loss=0.2427, cr_loss=0.3741, over 17078.00 frames. ], tot_loss[loss=0.365, ctc_loss=0.2796, cr_loss=0.4265, over 3363357.23 frames. 
], batch size: 39, lr: 3.94e-02, grad_scale: 32.0 2024-09-22 13:37:14,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=26441.333333333332, ans=0.125 2024-09-22 13:37:15,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=26441.333333333332, ans=0.125 2024-09-22 13:37:25,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2024-09-22 13:37:57,681 INFO [train.py:1198] (0/4) Epoch 2, batch 1800, loss[loss=0.3366, ctc_loss=0.2579, cr_loss=0.3936, over 17026.00 frames. ], tot_loss[loss=0.3644, ctc_loss=0.2792, cr_loss=0.426, over 3355257.09 frames. ], batch size: 52, lr: 3.94e-02, grad_scale: 32.0 2024-09-22 13:38:11,941 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.435e+02 1.770e+02 2.078e+02 2.651e+02 4.856e+02, threshold=4.156e+02, percent-clipped=2.0 2024-09-22 13:38:21,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=26628.0, ans=0.2 2024-09-22 13:38:31,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=26674.666666666668, ans=0.2 2024-09-22 13:38:34,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=26674.666666666668, ans=0.125 2024-09-22 13:38:50,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=26721.333333333332, ans=0.1 2024-09-22 13:39:16,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=26814.666666666668, ans=0.125 2024-09-22 13:39:17,325 INFO [train.py:1198] (0/4) Epoch 2, batch 1850, loss[loss=0.3807, ctc_loss=0.293, cr_loss=0.4385, over 17019.00 frames. ], tot_loss[loss=0.3645, ctc_loss=0.2794, cr_loss=0.4257, over 3342039.56 frames. 
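
The Whitening entries compare a per-module metric against a limit (20.99 vs. 22.5 above); the metric measures how far the module's feature covariance is from white. One measure consistent with this behaviour, assumed here rather than taken from scaling.py, is E[lambda^2] / E[lambda]^2 over the covariance eigenvalues: exactly 1.0 for perfectly white features, and larger as the spectrum spreads:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); returns E[lambda^2] / E[lambda]^2
        # over the eigenvalues lambda of the feature covariance.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()   # trace(cov) / d
        mean_eig_sq = (cov * cov).sum() / d     # trace(cov @ cov) / d
        return mean_eig_sq / (mean_eig ** 2 + 1e-20)

    x = torch.randn(4000, 384)                  # near-white features
    print(whitening_metric(x))                  # close to 1, plus sampling noise
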
], batch size: 56, lr: 3.93e-02, grad_scale: 32.0 2024-09-22 13:39:24,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=26814.666666666668, ans=0.2 2024-09-22 13:39:27,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=26814.666666666668, ans=0.025 2024-09-22 13:39:33,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=26861.333333333332, ans=0.125 2024-09-22 13:39:35,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=26861.333333333332, ans=0.125 2024-09-22 13:39:35,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=26861.333333333332, ans=0.125 2024-09-22 13:39:46,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=26861.333333333332, ans=0.5 2024-09-22 13:40:00,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=26908.0, ans=0.0 2024-09-22 13:40:07,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=26954.666666666668, ans=0.125 2024-09-22 13:40:11,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=26954.666666666668, ans=0.125 2024-09-22 13:40:31,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=27001.333333333332, ans=0.04949747468305833 2024-09-22 13:40:32,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5 2024-09-22 13:40:39,969 INFO [train.py:1198] (0/4) Epoch 2, batch 1900, loss[loss=0.366, ctc_loss=0.2814, cr_loss=0.4229, over 17235.00 frames. ], tot_loss[loss=0.3628, ctc_loss=0.2779, cr_loss=0.4244, over 3350699.61 frames. ], batch size: 50, lr: 3.92e-02, grad_scale: 32.0 2024-09-22 13:40:53,063 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:40:57,185 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.334e+02 1.864e+02 2.204e+02 2.855e+02 4.990e+02, threshold=4.407e+02, percent-clipped=2.0 2024-09-22 13:41:07,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=27094.666666666668, ans=0.07 2024-09-22 13:41:11,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.41 vs. limit=15.0 2024-09-22 13:41:19,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.65 vs. 
limit=15.0 2024-09-22 13:41:25,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=27141.333333333332, ans=0.125 2024-09-22 13:41:33,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=27188.0, ans=0.125 2024-09-22 13:41:51,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2024-09-22 13:42:05,867 INFO [train.py:1198] (0/4) Epoch 2, batch 1950, loss[loss=0.3403, ctc_loss=0.2529, cr_loss=0.4369, over 17107.00 frames. ], tot_loss[loss=0.3634, ctc_loss=0.2784, cr_loss=0.4249, over 3356949.45 frames. ], batch size: 43, lr: 3.92e-02, grad_scale: 32.0 2024-09-22 13:42:14,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=27281.333333333332, ans=0.125 2024-09-22 13:42:45,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=27374.666666666668, ans=0.125 2024-09-22 13:42:50,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=27374.666666666668, ans=0.004918550724637681 2024-09-22 13:42:57,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=27421.333333333332, ans=0.2 2024-09-22 13:42:59,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=27421.333333333332, ans=0.0 2024-09-22 13:42:59,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=27421.333333333332, ans=0.025 2024-09-22 13:43:19,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.15 vs. limit=10.0 2024-09-22 13:43:27,963 INFO [train.py:1198] (0/4) Epoch 2, batch 2000, loss[loss=0.388, ctc_loss=0.2952, cr_loss=0.4643, over 17093.00 frames. ], tot_loss[loss=0.3627, ctc_loss=0.2776, cr_loss=0.4255, over 3359760.06 frames. ], batch size: 49, lr: 3.91e-02, grad_scale: 32.0 2024-09-22 13:43:33,876 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2024-09-22 13:43:42,255 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.401e+02 1.881e+02 2.197e+02 2.842e+02 5.136e+02, threshold=4.393e+02, percent-clipped=2.0 2024-09-22 13:44:25,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=27654.666666666668, ans=0.0 2024-09-22 13:44:30,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27701.333333333332, ans=0.1 2024-09-22 13:44:34,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.68 vs. limit=6.0 2024-09-22 13:44:47,946 INFO [train.py:1198] (0/4) Epoch 2, batch 2050, loss[loss=0.3154, ctc_loss=0.24, cr_loss=0.3771, over 17173.00 frames. ], tot_loss[loss=0.3625, ctc_loss=0.2774, cr_loss=0.4256, over 3355289.36 frames. 
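
Across this log the loss fields satisfy loss = ctc_loss + 0.2 * cr_loss (for the tot_loss just above, 0.2774 + 0.2 * 0.4256 = 0.3625), i.e. the objective is CTC plus a consistency-regularization (CR) term with weight 0.2. Below is one way to write such an objective, with the CR term as a symmetric KL between the frame-level CTC posteriors of two differently augmented views of each utterance; an illustrative reading, not the exact training code:

    import torch.nn.functional as F

    def cr_ctc_loss(log_probs_a, log_probs_b, targets,
                    input_lengths, target_lengths, cr_loss_scale=0.2):
        # log_probs_*: (T, N, V) log-softmax outputs for two augmented views.
        ctc = 0.5 * (
            F.ctc_loss(log_probs_a, targets, input_lengths, target_lengths,
                       blank=0, reduction="mean")
            + F.ctc_loss(log_probs_b, targets, input_lengths, target_lengths,
                         blank=0, reduction="mean"))
        # Symmetric KL between the two posterior sequences (the CR term).
        cr = 0.5 * (
            F.kl_div(log_probs_a, log_probs_b, log_target=True,
                     reduction="batchmean")
            + F.kl_div(log_probs_b, log_probs_a, log_target=True,
                       reduction="batchmean"))
        # Returns the three quantities logged as loss, ctc_loss, cr_loss.
        return ctc + cr_loss_scale * cr, ctc, cr
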
], batch size: 41, lr: 3.91e-02, grad_scale: 32.0 2024-09-22 13:45:02,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=27794.666666666668, ans=0.125 2024-09-22 13:45:48,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=27888.0, ans=0.07 2024-09-22 13:45:53,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.45 vs. limit=22.5 2024-09-22 13:46:05,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=27934.666666666668, ans=0.125 2024-09-22 13:46:13,304 INFO [train.py:1198] (0/4) Epoch 2, batch 2100, loss[loss=0.3365, ctc_loss=0.2504, cr_loss=0.4306, over 17292.00 frames. ], tot_loss[loss=0.3622, ctc_loss=0.277, cr_loss=0.4256, over 3363606.68 frames. ], batch size: 46, lr: 3.90e-02, grad_scale: 32.0 2024-09-22 13:46:27,944 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.428e+02 1.746e+02 2.180e+02 2.623e+02 4.533e+02, threshold=4.360e+02, percent-clipped=2.0 2024-09-22 13:47:20,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=28168.0, ans=0.0 2024-09-22 13:47:36,048 INFO [train.py:1198] (0/4) Epoch 2, batch 2150, loss[loss=0.3455, ctc_loss=0.2622, cr_loss=0.4165, over 16990.00 frames. ], tot_loss[loss=0.3614, ctc_loss=0.2763, cr_loss=0.4254, over 3367126.83 frames. ], batch size: 56, lr: 3.89e-02, grad_scale: 32.0 2024-09-22 13:47:36,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=28214.666666666668, ans=0.125 2024-09-22 13:47:39,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=28214.666666666668, ans=0.0 2024-09-22 13:47:49,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=28214.666666666668, ans=0.125 2024-09-22 13:47:49,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=28214.666666666668, ans=0.0 2024-09-22 13:47:51,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.70 vs. limit=15.0 2024-09-22 13:48:15,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=28308.0, ans=0.0 2024-09-22 13:48:23,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=28308.0, ans=0.0 2024-09-22 13:48:23,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=28308.0, ans=0.0047156521739130436 2024-09-22 13:48:38,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=28354.666666666668, ans=0.125 2024-09-22 13:48:57,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=28448.0, ans=0.125 2024-09-22 13:48:58,602 INFO [train.py:1198] (0/4) Epoch 2, batch 2200, loss[loss=0.3788, ctc_loss=0.2905, cr_loss=0.4414, over 17009.00 frames. 
], tot_loss[loss=0.359, ctc_loss=0.2743, cr_loss=0.4235, over 3378197.03 frames. ], batch size: 56, lr: 3.89e-02, grad_scale: 32.0 2024-09-22 13:49:01,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=28448.0, ans=0.1 2024-09-22 13:49:03,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=28448.0, ans=0.125 2024-09-22 13:49:12,810 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.723e+02 2.085e+02 2.498e+02 4.255e+02, threshold=4.169e+02, percent-clipped=0.0 2024-09-22 13:49:13,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=28494.666666666668, ans=0.0 2024-09-22 13:49:19,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=28494.666666666668, ans=0.125 2024-09-22 13:49:21,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=28494.666666666668, ans=0.125 2024-09-22 13:49:33,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=28541.333333333332, ans=0.025 2024-09-22 13:50:07,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=28634.666666666668, ans=0.125 2024-09-22 13:50:18,354 INFO [train.py:1198] (0/4) Epoch 2, batch 2250, loss[loss=0.3984, ctc_loss=0.3039, cr_loss=0.4722, over 17013.00 frames. ], tot_loss[loss=0.36, ctc_loss=0.2753, cr_loss=0.4233, over 3358729.45 frames. ], batch size: 56, lr: 3.88e-02, grad_scale: 32.0 2024-09-22 13:50:51,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-09-22 13:51:03,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=28774.666666666668, ans=0.0 2024-09-22 13:51:15,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.47 vs. limit=10.0 2024-09-22 13:51:19,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.52 vs. limit=12.0 2024-09-22 13:51:43,313 INFO [train.py:1198] (0/4) Epoch 2, batch 2300, loss[loss=0.3066, ctc_loss=0.2311, cr_loss=0.3778, over 17208.00 frames. ], tot_loss[loss=0.3587, ctc_loss=0.2742, cr_loss=0.4222, over 3361596.48 frames. ], batch size: 41, lr: 3.87e-02, grad_scale: 32.0 2024-09-22 13:52:00,470 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.412e+02 1.812e+02 2.268e+02 2.899e+02 4.767e+02, threshold=4.537e+02, percent-clipped=4.0 2024-09-22 13:52:07,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=28961.333333333332, ans=0.2 2024-09-22 13:52:22,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. 
limit=15.0 2024-09-22 13:52:23,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=29008.0, ans=0.125 2024-09-22 13:52:47,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=29054.666666666668, ans=0.004553333333333333 2024-09-22 13:52:50,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=29101.333333333332, ans=0.004543188405797102 2024-09-22 13:53:05,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=29101.333333333332, ans=0.125 2024-09-22 13:53:08,676 INFO [train.py:1198] (0/4) Epoch 2, batch 2350, loss[loss=0.3981, ctc_loss=0.3104, cr_loss=0.4387, over 17139.00 frames. ], tot_loss[loss=0.358, ctc_loss=0.2735, cr_loss=0.4226, over 3367315.37 frames. ], batch size: 48, lr: 3.87e-02, grad_scale: 32.0 2024-09-22 13:53:25,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=29194.666666666668, ans=0.2 2024-09-22 13:53:41,025 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:53:44,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2024-09-22 13:54:02,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=29288.0, ans=0.004502608695652174 2024-09-22 13:54:04,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=29288.0, ans=0.125 2024-09-22 13:54:27,695 INFO [train.py:1198] (0/4) Epoch 2, batch 2400, loss[loss=0.387, ctc_loss=0.2941, cr_loss=0.4642, over 16889.00 frames. ], tot_loss[loss=0.3597, ctc_loss=0.2749, cr_loss=0.4244, over 3354241.18 frames. ], batch size: 58, lr: 3.86e-02, grad_scale: 32.0 2024-09-22 13:54:41,835 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.409e+02 1.714e+02 1.995e+02 2.694e+02 4.976e+02, threshold=3.990e+02, percent-clipped=1.0 2024-09-22 13:55:15,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=29521.333333333332, ans=0.2 2024-09-22 13:55:31,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2024-09-22 13:55:35,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=29568.0, ans=0.95 2024-09-22 13:55:42,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=29568.0, ans=0.125 2024-09-22 13:55:52,214 INFO [train.py:1198] (0/4) Epoch 2, batch 2450, loss[loss=0.3722, ctc_loss=0.2823, cr_loss=0.4493, over 17195.00 frames. ], tot_loss[loss=0.3604, ctc_loss=0.2753, cr_loss=0.4252, over 3350707.77 frames. 
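
Each batch line carries loss[...] for the current batch and tot_loss[...] as a running aggregate "over N frames"; the fractional frame totals (3350707.77 above) suggest a frame-weighted aggregate with per-batch exponential decay rather than a plain sum. A sketch under that assumption; the 0.999 decay constant is made up for illustration:

    class RunningLoss:
        """Frame-weighted running average with per-batch exponential decay."""

        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames  # the tot_loss that gets logged
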
], batch size: 47, lr: 3.86e-02, grad_scale: 32.0 2024-09-22 13:56:02,258 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 13:56:10,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=29661.333333333332, ans=0.0 2024-09-22 13:56:10,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0 2024-09-22 13:56:26,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=29708.0, ans=0.95 2024-09-22 13:56:27,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29708.0, ans=0.1 2024-09-22 13:57:10,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=29801.333333333332, ans=0.125 2024-09-22 13:57:11,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=29801.333333333332, ans=0.125 2024-09-22 13:57:14,736 INFO [train.py:1198] (0/4) Epoch 2, batch 2500, loss[loss=0.3475, ctc_loss=0.266, cr_loss=0.4074, over 17236.00 frames. ], tot_loss[loss=0.3601, ctc_loss=0.2751, cr_loss=0.4249, over 3353900.00 frames. ], batch size: 42, lr: 3.85e-02, grad_scale: 32.0 2024-09-22 13:57:24,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=29848.0, ans=0.125 2024-09-22 13:57:29,031 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 1.845e+02 2.309e+02 3.069e+02 4.385e+02, threshold=4.618e+02, percent-clipped=5.0 2024-09-22 13:57:42,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=29894.666666666668, ans=0.2 2024-09-22 13:57:51,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=29941.333333333332, ans=0.0 2024-09-22 13:58:16,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=29988.0, ans=0.125 2024-09-22 13:58:36,498 INFO [train.py:1198] (0/4) Epoch 2, batch 2550, loss[loss=0.3748, ctc_loss=0.2859, cr_loss=0.4448, over 17024.00 frames. ], tot_loss[loss=0.3583, ctc_loss=0.2735, cr_loss=0.4239, over 3365021.74 frames. ], batch size: 52, lr: 3.84e-02, grad_scale: 32.0 2024-09-22 13:59:07,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=30174.666666666668, ans=0.07 2024-09-22 13:59:15,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=30174.666666666668, ans=0.035 2024-09-22 13:59:16,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=30174.666666666668, ans=0.125 2024-09-22 13:59:25,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. 
limit=15.0 2024-09-22 13:59:34,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=30221.333333333332, ans=0.09899494936611666 2024-09-22 13:59:40,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2024-09-22 13:59:52,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2024-09-22 13:59:56,516 INFO [train.py:1198] (0/4) Epoch 2, batch 2600, loss[loss=0.3743, ctc_loss=0.2892, cr_loss=0.4258, over 17194.00 frames. ], tot_loss[loss=0.3607, ctc_loss=0.2756, cr_loss=0.4254, over 3354924.30 frames. ], batch size: 55, lr: 3.84e-02, grad_scale: 32.0 2024-09-22 14:00:11,074 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.285e+02 1.903e+02 2.207e+02 2.848e+02 4.508e+02, threshold=4.414e+02, percent-clipped=0.0 2024-09-22 14:00:34,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=30408.0, ans=0.1 2024-09-22 14:00:50,919 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2024-09-22 14:01:21,838 INFO [train.py:1198] (0/4) Epoch 2, batch 2650, loss[loss=0.3832, ctc_loss=0.2946, cr_loss=0.4434, over 17211.00 frames. ], tot_loss[loss=0.3611, ctc_loss=0.2759, cr_loss=0.4255, over 3350151.10 frames. ], batch size: 55, lr: 3.83e-02, grad_scale: 32.0 2024-09-22 14:01:31,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=30548.0, ans=0.125 2024-09-22 14:01:33,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=30548.0, ans=0.04949747468305833 2024-09-22 14:01:34,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=30548.0, ans=0.07 2024-09-22 14:01:51,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30594.666666666668, ans=0.1 2024-09-22 14:01:55,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=15.0 2024-09-22 14:02:06,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=30641.333333333332, ans=0.0 2024-09-22 14:02:15,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=30688.0, ans=0.025 2024-09-22 14:02:18,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=30688.0, ans=0.125 2024-09-22 14:02:35,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2024-09-22 14:02:46,376 INFO [train.py:1198] (0/4) Epoch 2, batch 2700, loss[loss=0.3174, ctc_loss=0.2395, cr_loss=0.3898, over 17138.00 frames. ], tot_loss[loss=0.36, ctc_loss=0.275, cr_loss=0.4249, over 3353220.30 frames. 
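
The grad_scale: 32.0 field in every batch line is the loss scale of float16 mixed-precision training: the loss is multiplied by it before backward so half-precision gradients do not underflow, and it adapts when overflows are detected. A sketch of the standard PyTorch pattern; model, optimizer, train_loader and compute_loss are placeholders, not names from this codebase:

    import torch

    def train_epoch_amp(model, optimizer, train_loader, compute_loss):
        # compute_loss(model, batch) -> scalar loss; all names are placeholders.
        scaler = torch.cuda.amp.GradScaler()   # maintains the logged grad_scale
        for batch in train_loader:
            optimizer.zero_grad()
            with torch.cuda.amp.autocast(dtype=torch.float16):
                loss = compute_loss(model, batch)
            scaler.scale(loss).backward()      # backward on the scaled loss
            scaler.step(optimizer)             # unscales; skips step on inf/nan
            scaler.update()                    # adapts the scale over time
            print("grad_scale:", scaler.get_scale())  # e.g. 32.0, as logged
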
], batch size: 40, lr: 3.82e-02, grad_scale: 32.0 2024-09-22 14:03:00,820 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.428e+02 1.814e+02 2.097e+02 2.446e+02 4.164e+02, threshold=4.194e+02, percent-clipped=0.0 2024-09-22 14:03:07,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=30828.0, ans=0.125 2024-09-22 14:03:10,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30828.0, ans=0.1 2024-09-22 14:03:13,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=30828.0, ans=0.004167826086956521 2024-09-22 14:03:14,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.82 vs. limit=22.5 2024-09-22 14:03:26,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=30874.666666666668, ans=0.015 2024-09-22 14:03:28,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30874.666666666668, ans=0.125 2024-09-22 14:03:31,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.07 vs. limit=22.5 2024-09-22 14:03:39,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=30921.333333333332, ans=0.125 2024-09-22 14:03:52,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30968.0, ans=0.1 2024-09-22 14:03:52,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=22.5 2024-09-22 14:04:03,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=30968.0, ans=0.125 2024-09-22 14:04:05,951 INFO [train.py:1198] (0/4) Epoch 2, batch 2750, loss[loss=0.3829, ctc_loss=0.2945, cr_loss=0.4417, over 17021.00 frames. ], tot_loss[loss=0.3585, ctc_loss=0.2736, cr_loss=0.4246, over 3355900.28 frames. ], batch size: 44, lr: 3.82e-02, grad_scale: 32.0 2024-09-22 14:04:09,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=31014.666666666668, ans=0.125 2024-09-22 14:04:34,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=31061.333333333332, ans=0.125 2024-09-22 14:04:46,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=31108.0, ans=0.125 2024-09-22 14:04:53,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. 
limit=12.0 2024-09-22 14:05:02,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=31154.666666666668, ans=0.0 2024-09-22 14:05:04,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=31154.666666666668, ans=0.125 2024-09-22 14:05:08,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=31201.333333333332, ans=0.125 2024-09-22 14:05:12,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=31201.333333333332, ans=0.0 2024-09-22 14:05:13,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=31201.333333333332, ans=0.125 2024-09-22 14:05:31,360 INFO [train.py:1198] (0/4) Epoch 2, batch 2800, loss[loss=0.293, ctc_loss=0.2183, cr_loss=0.3734, over 17135.00 frames. ], tot_loss[loss=0.3563, ctc_loss=0.2716, cr_loss=0.4232, over 3363724.32 frames. ], batch size: 40, lr: 3.81e-02, grad_scale: 32.0 2024-09-22 14:05:34,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=31248.0, ans=0.004076521739130434 2024-09-22 14:05:39,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=31248.0, ans=0.2 2024-09-22 14:05:45,623 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.464e+02 1.818e+02 2.147e+02 2.657e+02 4.230e+02, threshold=4.294e+02, percent-clipped=1.0 2024-09-22 14:05:58,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31294.666666666668, ans=0.0 2024-09-22 14:06:25,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=31388.0, ans=0.05 2024-09-22 14:06:27,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-09-22 14:06:35,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=31434.666666666668, ans=0.125 2024-09-22 14:06:53,504 INFO [train.py:1198] (0/4) Epoch 2, batch 2850, loss[loss=0.4342, ctc_loss=0.3378, cr_loss=0.4821, over 14844.00 frames. ], tot_loss[loss=0.3572, ctc_loss=0.2725, cr_loss=0.4236, over 3362559.53 frames. ], batch size: 89, lr: 3.80e-02, grad_scale: 32.0 2024-09-22 14:08:15,442 INFO [train.py:1198] (0/4) Epoch 2, batch 2900, loss[loss=0.3136, ctc_loss=0.2335, cr_loss=0.4008, over 17069.00 frames. ], tot_loss[loss=0.3558, ctc_loss=0.2713, cr_loss=0.4226, over 3360191.36 frames. ], batch size: 43, lr: 3.80e-02, grad_scale: 32.0 2024-09-22 14:08:21,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=31714.666666666668, ans=0.0039750724637681156 2024-09-22 14:08:29,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.42 vs. 
limit=22.5 2024-09-22 14:08:29,968 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 1.781e+02 2.101e+02 2.670e+02 4.501e+02, threshold=4.202e+02, percent-clipped=1.0 2024-09-22 14:08:32,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2024-09-22 14:08:35,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. limit=6.0 2024-09-22 14:08:47,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=31808.0, ans=0.0 2024-09-22 14:09:10,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=15.0 2024-09-22 14:09:18,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=31901.333333333332, ans=0.125 2024-09-22 14:09:18,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=31901.333333333332, ans=6.0 2024-09-22 14:09:35,328 INFO [train.py:1198] (0/4) Epoch 2, batch 2950, loss[loss=0.2815, ctc_loss=0.207, cr_loss=0.3726, over 17018.00 frames. ], tot_loss[loss=0.3576, ctc_loss=0.2728, cr_loss=0.4239, over 3351093.39 frames. ], batch size: 39, lr: 3.79e-02, grad_scale: 32.0 2024-09-22 14:10:02,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=31994.666666666668, ans=0.0039142028985507255 2024-09-22 14:10:59,599 INFO [train.py:1198] (0/4) Epoch 2, batch 3000, loss[loss=0.4157, ctc_loss=0.3188, cr_loss=0.4849, over 14745.00 frames. ], tot_loss[loss=0.3565, ctc_loss=0.2718, cr_loss=0.4237, over 3348605.67 frames. ], batch size: 88, lr: 3.79e-02, grad_scale: 32.0 2024-09-22 14:10:59,600 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 14:11:10,199 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0489, 4.7236, 4.5471, 4.6538], device='cuda:0') 2024-09-22 14:11:15,222 INFO [train.py:1230] (0/4) Epoch 2, validation: loss=0.0967, ctc_loss=0.0967, cr_loss=8.169e-15, over 944034.00 frames. 2024-09-22 14:11:15,223 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 14:11:20,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=32181.333333333332, ans=0.125 2024-09-22 14:11:23,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=32181.333333333332, ans=0.125 2024-09-22 14:11:29,192 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.469e+02 1.814e+02 2.374e+02 2.965e+02 6.190e+02, threshold=4.748e+02, percent-clipped=4.0 2024-09-22 14:11:42,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.53 vs. 
limit=22.5 2024-09-22 14:11:51,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=32274.666666666668, ans=0.125 2024-09-22 14:12:06,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=32321.333333333332, ans=0.05 2024-09-22 14:12:13,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=32321.333333333332, ans=0.125 2024-09-22 14:12:16,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=32321.333333333332, ans=0.125 2024-09-22 14:12:21,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32368.0, ans=0.1 2024-09-22 14:12:23,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 2024-09-22 14:12:35,205 INFO [train.py:1198] (0/4) Epoch 2, batch 3050, loss[loss=0.3943, ctc_loss=0.3073, cr_loss=0.4351, over 16783.00 frames. ], tot_loss[loss=0.3549, ctc_loss=0.2703, cr_loss=0.4227, over 3356537.00 frames. ], batch size: 61, lr: 3.78e-02, grad_scale: 32.0 2024-09-22 14:12:44,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=32414.666666666668, ans=0.125 2024-09-22 14:13:05,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=32508.0, ans=0.125 2024-09-22 14:13:06,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=32508.0, ans=0.2 2024-09-22 14:13:15,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=32508.0, ans=0.0 2024-09-22 14:13:23,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=32554.666666666668, ans=0.0037924637681159418 2024-09-22 14:13:23,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=32554.666666666668, ans=0.2 2024-09-22 14:13:23,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0 2024-09-22 14:13:52,647 INFO [train.py:1198] (0/4) Epoch 2, batch 3100, loss[loss=0.3596, ctc_loss=0.277, cr_loss=0.4132, over 17226.00 frames. ], tot_loss[loss=0.3561, ctc_loss=0.2713, cr_loss=0.4238, over 3343780.00 frames. 
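
The "Computing validation loss" entries above run a full dev-set pass between training batches; note the validation cr_loss is numerically zero (8.169e-15), consistent with a consistency term that vanishes when the two views of each utterance are identical (no masking at eval time). A sketch of the usual frame-weighted validation pass; compute_loss is a placeholder returning (loss, num_frames):

    import torch

    def validate(model, dev_loader, compute_loss):
        model.eval()
        loss_sum, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                loss, n = compute_loss(model, batch)
                loss_sum += loss.item() * n
                frames += n
        model.train()
        return loss_sum / frames  # e.g. loss=0.0967 over 944034.00 frames above
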
], batch size: 47, lr: 3.77e-02, grad_scale: 32.0 2024-09-22 14:14:10,889 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.354e+02 1.798e+02 2.252e+02 2.859e+02 4.646e+02, threshold=4.504e+02, percent-clipped=0.0 2024-09-22 14:14:14,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=32694.666666666668, ans=0.0 2024-09-22 14:14:48,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=32788.0, ans=0.02 2024-09-22 14:14:57,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=32834.666666666664, ans=0.125 2024-09-22 14:14:57,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=32834.666666666664, ans=0.04949747468305833 2024-09-22 14:15:09,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=32834.666666666664, ans=0.125 2024-09-22 14:15:12,720 INFO [train.py:1198] (0/4) Epoch 2, batch 3150, loss[loss=0.3541, ctc_loss=0.273, cr_loss=0.4052, over 17227.00 frames. ], tot_loss[loss=0.3554, ctc_loss=0.2706, cr_loss=0.4241, over 3349067.73 frames. ], batch size: 47, lr: 3.77e-02, grad_scale: 32.0 2024-09-22 14:15:20,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=32881.333333333336, ans=0.125 2024-09-22 14:15:23,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=32881.333333333336, ans=0.025 2024-09-22 14:15:43,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-22 14:16:16,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2024-09-22 14:16:30,721 INFO [train.py:1198] (0/4) Epoch 2, batch 3200, loss[loss=0.3112, ctc_loss=0.2339, cr_loss=0.3865, over 17274.00 frames. ], tot_loss[loss=0.3547, ctc_loss=0.27, cr_loss=0.4233, over 3350445.50 frames. ], batch size: 42, lr: 3.76e-02, grad_scale: 32.0 2024-09-22 14:16:35,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=33114.666666666664, ans=0.125 2024-09-22 14:16:39,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.59 vs. limit=5.0 2024-09-22 14:16:46,457 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.436e+02 1.721e+02 1.992e+02 2.305e+02 3.966e+02, threshold=3.983e+02, percent-clipped=0.0 2024-09-22 14:16:55,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=33161.333333333336, ans=0.003660579710144928 2024-09-22 14:17:30,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.88 vs. 
limit=22.5 2024-09-22 14:17:31,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=33301.333333333336, ans=0.0 2024-09-22 14:17:48,523 INFO [train.py:1198] (0/4) Epoch 2, batch 3250, loss[loss=0.3291, ctc_loss=0.2418, cr_loss=0.4361, over 16944.00 frames. ], tot_loss[loss=0.3559, ctc_loss=0.2711, cr_loss=0.4242, over 3345192.02 frames. ], batch size: 42, lr: 3.75e-02, grad_scale: 32.0 2024-09-22 14:18:49,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=33534.666666666664, ans=0.0 2024-09-22 14:19:06,344 INFO [train.py:1198] (0/4) Epoch 2, batch 3300, loss[loss=0.3497, ctc_loss=0.2655, cr_loss=0.421, over 17160.00 frames. ], tot_loss[loss=0.3556, ctc_loss=0.2708, cr_loss=0.4237, over 3354073.87 frames. ], batch size: 48, lr: 3.75e-02, grad_scale: 32.0 2024-09-22 14:19:22,039 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.353e+02 1.852e+02 2.253e+02 3.068e+02 5.078e+02, threshold=4.507e+02, percent-clipped=5.0 2024-09-22 14:19:22,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=33628.0, ans=0.125 2024-09-22 14:19:25,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33628.0, ans=0.1 2024-09-22 14:19:42,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=15.0 2024-09-22 14:19:57,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=33721.333333333336, ans=0.003538840579710144 2024-09-22 14:20:28,447 INFO [train.py:1198] (0/4) Epoch 2, batch 3350, loss[loss=0.353, ctc_loss=0.2695, cr_loss=0.4171, over 16991.00 frames. ], tot_loss[loss=0.3535, ctc_loss=0.269, cr_loss=0.4228, over 3357462.86 frames. ], batch size: 53, lr: 3.74e-02, grad_scale: 32.0 2024-09-22 14:20:28,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=33814.666666666664, ans=0.125 2024-09-22 14:20:29,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0 2024-09-22 14:20:47,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=33861.333333333336, ans=0.0 2024-09-22 14:20:49,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=22.5 2024-09-22 14:20:55,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.53 vs. limit=22.5 2024-09-22 14:21:24,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33954.666666666664, ans=0.1 2024-09-22 14:21:26,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=33954.666666666664, ans=0.0 2024-09-22 14:21:32,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.21 vs. 
limit=15.0 2024-09-22 14:21:35,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=34001.333333333336, ans=0.125 2024-09-22 14:21:45,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=34048.0, ans=0.125 2024-09-22 14:21:46,547 INFO [train.py:1198] (0/4) Epoch 2, batch 3400, loss[loss=0.302, ctc_loss=0.2321, cr_loss=0.3491, over 17180.00 frames. ], tot_loss[loss=0.3552, ctc_loss=0.2704, cr_loss=0.4237, over 3354612.50 frames. ], batch size: 41, lr: 3.74e-02, grad_scale: 32.0 2024-09-22 14:22:02,234 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.402e+02 1.761e+02 2.090e+02 2.503e+02 3.941e+02, threshold=4.179e+02, percent-clipped=0.0 2024-09-22 14:22:07,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=34094.666666666664, ans=0.2 2024-09-22 14:22:41,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=34188.0, ans=0.125 2024-09-22 14:22:49,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-09-22 14:22:54,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=34234.666666666664, ans=0.5 2024-09-22 14:22:58,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.76 vs. limit=15.0 2024-09-22 14:22:59,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=34234.666666666664, ans=0.003427246376811595 2024-09-22 14:23:07,089 INFO [train.py:1198] (0/4) Epoch 2, batch 3450, loss[loss=0.2884, ctc_loss=0.2163, cr_loss=0.3606, over 17018.00 frames. ], tot_loss[loss=0.3537, ctc_loss=0.2693, cr_loss=0.4223, over 3348855.20 frames. ], batch size: 39, lr: 3.73e-02, grad_scale: 16.0 2024-09-22 14:23:10,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=34281.333333333336, ans=0.0034171014492753614 2024-09-22 14:23:21,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=34328.0, ans=0.125 2024-09-22 14:23:49,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=34374.666666666664, ans=0.125 2024-09-22 14:24:16,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=8.0 2024-09-22 14:24:26,672 INFO [train.py:1198] (0/4) Epoch 2, batch 3500, loss[loss=0.3499, ctc_loss=0.2692, cr_loss=0.4032, over 17115.00 frames. ], tot_loss[loss=0.3523, ctc_loss=0.2681, cr_loss=0.4206, over 3353385.19 frames. 
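
The *_skip_rate values (attention_skip_rate, conv_skip_rate, ff2/ff3_skip_rate, ...) read naturally as per-branch stochastic-depth probabilities: with the scheduled probability the residual branch is dropped for the whole batch, and the schedules anneal these rates toward 0.0 as training progresses. A sketch of that reading, assumed rather than lifted from the model code:

    import torch

    def residual_with_skip(module, x, skip_rate: float, training: bool):
        # With probability skip_rate, drop the branch and pass x through
        # unchanged; otherwise add the branch output as usual.
        if training and torch.rand(()).item() < skip_rate:
            return x
        return x + module(x)
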
], batch size: 49, lr: 3.72e-02, grad_scale: 16.0 2024-09-22 14:24:44,121 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.468e+02 1.790e+02 2.190e+02 2.610e+02 4.602e+02, threshold=4.381e+02, percent-clipped=3.0 2024-09-22 14:24:44,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=34561.333333333336, ans=0.125 2024-09-22 14:24:58,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=34608.0, ans=0.125 2024-09-22 14:25:01,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=34608.0, ans=0.0 2024-09-22 14:25:02,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=34608.0, ans=0.125 2024-09-22 14:25:09,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=34608.0, ans=0.0 2024-09-22 14:25:37,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. limit=10.0 2024-09-22 14:25:40,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=34701.333333333336, ans=0.0033257971014492745 2024-09-22 14:25:42,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=34701.333333333336, ans=0.5 2024-09-22 14:25:44,828 INFO [train.py:1198] (0/4) Epoch 2, batch 3550, loss[loss=0.3644, ctc_loss=0.2804, cr_loss=0.42, over 16489.00 frames. ], tot_loss[loss=0.3516, ctc_loss=0.2675, cr_loss=0.4207, over 3357397.60 frames. ], batch size: 66, lr: 3.72e-02, grad_scale: 16.0 2024-09-22 14:26:06,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=34794.666666666664, ans=0.025 2024-09-22 14:26:12,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=12.0 2024-09-22 14:26:17,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=34841.333333333336, ans=0.125 2024-09-22 14:26:26,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=34841.333333333336, ans=0.1 2024-09-22 14:26:30,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=34888.0, ans=0.5 2024-09-22 14:26:41,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=34888.0, ans=0.2 2024-09-22 14:26:44,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=34888.0, ans=0.2 2024-09-22 14:27:02,440 INFO [train.py:1198] (0/4) Epoch 2, batch 3600, loss[loss=0.3464, ctc_loss=0.2652, cr_loss=0.406, over 17359.00 frames. ], tot_loss[loss=0.3501, ctc_loss=0.266, cr_loss=0.4204, over 3365286.70 frames. 
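
The lr field decays smoothly with the batch index (3.98e-02 at the top of this stretch, 3.72e-02 here) and steps down again when epoch 3 starts (3.49e-02 below), matching a schedule that multiplies a batch-dependent and an epoch-dependent factor. A sketch of an Eden-style rule with that shape; the constants lr_batches and lr_epochs are illustrative, not read from this run:

    def eden_lr(base_lr: float, batch: int, epoch: int,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor
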
], batch size: 48, lr: 3.71e-02, grad_scale: 32.0 2024-09-22 14:27:19,633 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.393e+02 1.714e+02 2.123e+02 2.581e+02 4.355e+02, threshold=4.245e+02, percent-clipped=0.0 2024-09-22 14:27:27,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35028.0, ans=0.1 2024-09-22 14:27:32,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=35074.666666666664, ans=0.1 2024-09-22 14:27:44,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=35074.666666666664, ans=0.125 2024-09-22 14:28:06,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=35168.0, ans=0.125 2024-09-22 14:28:17,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=15.0 2024-09-22 14:28:20,467 INFO [train.py:1198] (0/4) Epoch 2, batch 3650, loss[loss=0.3142, ctc_loss=0.2401, cr_loss=0.3708, over 16929.00 frames. ], tot_loss[loss=0.3517, ctc_loss=0.2674, cr_loss=0.4215, over 3358698.55 frames. ], batch size: 42, lr: 3.70e-02, grad_scale: 32.0 2024-09-22 14:28:55,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=35308.0, ans=0.0 2024-09-22 14:29:10,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=12.0 2024-09-22 14:29:27,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=35401.333333333336, ans=0.2 2024-09-22 14:29:30,855 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0 2024-09-22 14:29:33,633 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:29:34,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=35401.333333333336, ans=0.0 2024-09-22 14:29:36,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=35401.333333333336, ans=0.0 2024-09-22 14:29:40,842 INFO [train.py:1198] (0/4) Epoch 2, batch 3700, loss[loss=0.341, ctc_loss=0.2615, cr_loss=0.3973, over 17254.00 frames. ], tot_loss[loss=0.3537, ctc_loss=0.269, cr_loss=0.4234, over 3351055.31 frames. ], batch size: 44, lr: 3.70e-02, grad_scale: 16.0 2024-09-22 14:29:59,940 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.354e+02 1.763e+02 2.146e+02 2.787e+02 4.998e+02, threshold=4.291e+02, percent-clipped=2.0 2024-09-22 14:30:50,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=35634.666666666664, ans=0.5 2024-09-22 14:30:59,291 INFO [train.py:1198] (0/4) Epoch 2, batch 3750, loss[loss=0.3584, ctc_loss=0.2708, cr_loss=0.4377, over 17147.00 frames. ], tot_loss[loss=0.3543, ctc_loss=0.2696, cr_loss=0.4236, over 3339118.13 frames. 
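
The bypass.scale_min and bypass_mid.scale_min entries (ans=0.2 above) suggest each block mixes its input and output through a learned per-channel scale whose lower bound is itself scheduled, so a block is never entirely bypassed once scale_min is positive. A sketch of that interpretation; the 0.5 initial scale is an assumption:

    import torch
    import torch.nn as nn

    class Bypass(nn.Module):
        """Learned per-channel interpolation between a block's input and output."""

        def __init__(self, num_channels: int, scale_min: float = 0.2):
            super().__init__()
            self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
            self.scale_min = scale_min           # scheduled in the real model

        def forward(self, x_in: torch.Tensor, x_out: torch.Tensor) -> torch.Tensor:
            s = self.scale.clamp(min=self.scale_min, max=1.0)
            return x_in + s * (x_out - x_in)     # s=0 -> pure bypass, s=1 -> x_out
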
], batch size: 45, lr: 3.69e-02, grad_scale: 16.0 2024-09-22 14:31:21,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=35728.0, ans=0.0 2024-09-22 14:31:24,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=12.0 2024-09-22 14:31:24,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2024-09-22 14:31:50,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=35821.333333333336, ans=0.2 2024-09-22 14:32:01,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=35868.0, ans=0.125 2024-09-22 14:32:15,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=35868.0, ans=0.125 2024-09-22 14:32:18,708 INFO [train.py:1198] (0/4) Epoch 2, batch 3800, loss[loss=0.3301, ctc_loss=0.2504, cr_loss=0.3983, over 17310.00 frames. ], tot_loss[loss=0.3554, ctc_loss=0.2707, cr_loss=0.4235, over 3327921.26 frames. ], batch size: 51, lr: 3.69e-02, grad_scale: 16.0 2024-09-22 14:32:31,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=35914.666666666664, ans=0.05 2024-09-22 14:32:35,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35961.333333333336, ans=0.1 2024-09-22 14:32:37,260 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.475e+02 1.782e+02 2.185e+02 2.503e+02 5.708e+02, threshold=4.370e+02, percent-clipped=5.0 2024-09-22 14:32:37,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=35961.333333333336, ans=0.0030518840579710146 2024-09-22 14:32:41,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.99 vs. limit=15.0 2024-09-22 14:32:55,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=36008.0, ans=0.125 2024-09-22 14:33:09,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=36054.666666666664, ans=0.125 2024-09-22 14:33:11,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=36054.666666666664, ans=0.125 2024-09-22 14:33:23,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36101.333333333336, ans=0.1 2024-09-22 14:33:29,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=36101.333333333336, ans=0.125 2024-09-22 14:33:35,545 INFO [train.py:1198] (0/4) Epoch 2, batch 3850, loss[loss=0.3558, ctc_loss=0.2674, cr_loss=0.4417, over 16958.00 frames. ], tot_loss[loss=0.3597, ctc_loss=0.2746, cr_loss=0.4256, over 3288249.12 frames. 
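
The balancer fields carry per-channel activation constraints: min_positive/max_positive (0.05 and 0.95 in this log) bound the fraction of positive activations, min_abs/max_abs bound the mean absolute value, and prob (0.125) reads as the probability that enforcement kicks in on a given batch. A sketch of the statistics being constrained; the enforcement-by-extra-gradient part is omitted, and none of this is copied from scaling.py:

    import torch

    def balancer_stats(x: torch.Tensor):
        # x: (num_frames, num_channels); returns the per-channel statistics
        # that the balancer keeps inside [min_positive, max_positive] and
        # [min_abs, max_abs] respectively.
        pos_frac = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return pos_frac, mean_abs

    x = torch.randn(1000, 256)
    pos_frac, mean_abs = balancer_stats(x)
    print(pos_frac.min().item(), pos_frac.max().item())  # ~0.5 for centred data
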
], batch size: 42, lr: 3.68e-02, grad_scale: 16.0 2024-09-22 14:34:12,207 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:34:31,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=36288.0, ans=0.125 2024-09-22 14:34:45,229 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-2.pt 2024-09-22 14:35:36,877 INFO [train.py:1198] (0/4) Epoch 3, batch 0, loss[loss=0.3347, ctc_loss=0.2571, cr_loss=0.3879, over 17109.00 frames. ], tot_loss[loss=0.3347, ctc_loss=0.2571, cr_loss=0.3879, over 17109.00 frames. ], batch size: 40, lr: 3.49e-02, grad_scale: 32.0 2024-09-22 14:35:36,878 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 14:35:52,233 INFO [train.py:1230] (0/4) Epoch 3, validation: loss=0.1002, ctc_loss=0.1002, cr_loss=7.948e-15, over 944034.00 frames. 2024-09-22 14:35:52,234 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 14:36:11,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5 2024-09-22 14:36:15,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=36409.333333333336, ans=0.125 2024-09-22 14:36:20,867 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.466e+02 1.866e+02 2.182e+02 2.689e+02 4.735e+02, threshold=4.364e+02, percent-clipped=1.0 2024-09-22 14:36:23,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-09-22 14:36:26,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=36456.0, ans=0.125 2024-09-22 14:36:36,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.01 vs. limit=22.5 2024-09-22 14:37:15,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=36549.333333333336, ans=0.125 2024-09-22 14:37:17,967 INFO [train.py:1198] (0/4) Epoch 3, batch 50, loss[loss=0.3232, ctc_loss=0.2459, cr_loss=0.3861, over 16395.00 frames. ], tot_loss[loss=0.3454, ctc_loss=0.2622, cr_loss=0.4157, over 765203.28 frames. ], batch size: 36, lr: 3.49e-02, grad_scale: 32.0 2024-09-22 14:37:22,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2024-09-22 14:37:31,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=36596.0, ans=0.5 2024-09-22 14:37:32,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=36596.0, ans=0.2 2024-09-22 14:37:36,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.10 vs. limit=22.5 2024-09-22 14:37:41,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.94 vs. limit=22.5
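The WARNING [optim.py:487] entries in this log report grad-norm quartiles together with a clipping threshold, and throughout the run the threshold equals Clipping_scale times the reported median (for example, 2.0 x 2.182e+02 = 4.364e+02 in the warning just above). The sketch below is a minimal, hypothetical reconstruction of that bookkeeping, not icefall's actual optim.py: the class name GradNormClipper, the window size, and the cumulative percent-clipped counter are all illustrative assumptions.

```python
import torch


class GradNormClipper:
    """Minimal sketch (not icefall's actual optim.py): keep a window of
    recent global gradient norms, report their quartiles, and clip to
    clipping_scale * median, mirroring the WARNING lines in this log."""

    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window   # hypothetical history length
        self.norms = []        # recent global gradient norms
        self.num_clipped = 0   # cumulative here; the real log may reset
        self.num_steps = 0     # per logging interval

    def step(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.cat([p.grad.detach().flatten() for p in params]).norm().item()
        self.norms = (self.norms + [norm])[-self.window:]
        self.num_steps += 1

        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}"
              + f", percent-clipped={100.0 * self.num_clipped / self.num_steps:.1f}")
        return norm
```

Basing the threshold on a running window rather than a fixed constant lets it adapt as training stabilizes, which would explain the slowly drifting quartiles and thresholds across the warnings in this section.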
2024-09-22 14:38:01,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=36689.333333333336, ans=0.125 2024-09-22 14:38:29,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=36782.666666666664, ans=0.5 2024-09-22 14:38:41,849 INFO [train.py:1198] (0/4) Epoch 3, batch 100, loss[loss=0.3718, ctc_loss=0.2735, cr_loss=0.4911, over 16996.00 frames. ], tot_loss[loss=0.3491, ctc_loss=0.2652, cr_loss=0.4193, over 1323348.64 frames. ], batch size: 56, lr: 3.48e-02, grad_scale: 32.0 2024-09-22 14:39:07,121 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.389e+02 1.767e+02 2.113e+02 2.664e+02 5.595e+02, threshold=4.227e+02, percent-clipped=3.0 2024-09-22 14:39:16,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2024-09-22 14:39:50,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=37016.0, ans=0.125 2024-09-22 14:39:58,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.12 vs. limit=15.0 2024-09-22 14:40:01,181 INFO [train.py:1198] (0/4) Epoch 3, batch 150, loss[loss=0.3486, ctc_loss=0.2618, cr_loss=0.4338, over 16861.00 frames. ], tot_loss[loss=0.3484, ctc_loss=0.2641, cr_loss=0.4215, over 1778688.89 frames. ], batch size: 58, lr: 3.47e-02, grad_scale: 32.0 2024-09-22 14:40:09,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37062.666666666664, ans=0.1 2024-09-22 14:40:14,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=37062.666666666664, ans=0.125 2024-09-22 14:40:24,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.91 vs. limit=22.5 2024-09-22 14:40:50,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=37202.666666666664, ans=0.0 2024-09-22 14:41:15,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=37249.333333333336, ans=0.2 2024-09-22 14:41:17,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=37249.333333333336, ans=0.2 2024-09-22 14:41:23,509 INFO [train.py:1198] (0/4) Epoch 3, batch 200, loss[loss=0.3767, ctc_loss=0.2854, cr_loss=0.4565, over 17238.00 frames. ], tot_loss[loss=0.3474, ctc_loss=0.2628, cr_loss=0.4231, over 2140654.68 frames.
], batch size: 55, lr: 3.47e-02, grad_scale: 32.0 2024-09-22 14:41:27,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=37296.0, ans=0.05 2024-09-22 14:41:34,740 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-8000.pt 2024-09-22 14:41:41,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=37342.666666666664, ans=0.07 2024-09-22 14:41:49,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=37342.666666666664, ans=0.025 2024-09-22 14:41:51,078 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.278e+02 1.679e+02 1.968e+02 2.454e+02 4.058e+02, threshold=3.935e+02, percent-clipped=0.0 2024-09-22 14:42:16,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=37436.0, ans=0.125 2024-09-22 14:42:27,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=37436.0, ans=0.0 2024-09-22 14:42:50,684 INFO [train.py:1198] (0/4) Epoch 3, batch 250, loss[loss=0.3564, ctc_loss=0.2731, cr_loss=0.4164, over 15954.00 frames. ], tot_loss[loss=0.3442, ctc_loss=0.2602, cr_loss=0.42, over 2415477.65 frames. ], batch size: 74, lr: 3.46e-02, grad_scale: 32.0 2024-09-22 14:42:51,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=37529.333333333336, ans=0.002711014492753623 2024-09-22 14:42:52,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=37529.333333333336, ans=0.125 2024-09-22 14:42:54,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=37529.333333333336, ans=0.0 2024-09-22 14:43:13,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=37576.0, ans=0.07 2024-09-22 14:43:36,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=37622.666666666664, ans=0.125 2024-09-22 14:43:36,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37622.666666666664, ans=0.125 2024-09-22 14:43:36,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2024-09-22 14:43:52,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2024-09-22 14:43:53,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=37669.333333333336, ans=0.125 2024-09-22 14:44:09,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-09-22 14:44:12,444 INFO [train.py:1198] (0/4) Epoch 3, batch 300, loss[loss=0.3252, ctc_loss=0.2446, cr_loss=0.4027, over 17073.00 frames. 
], tot_loss[loss=0.3436, ctc_loss=0.2597, cr_loss=0.4197, over 2631748.44 frames. ], batch size: 46, lr: 3.46e-02, grad_scale: 32.0 2024-09-22 14:44:22,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=37762.666666666664, ans=0.125 2024-09-22 14:44:23,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=37762.666666666664, ans=0.2 2024-09-22 14:44:28,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=37809.333333333336, ans=0.125 2024-09-22 14:44:37,528 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.369e+02 1.687e+02 1.987e+02 2.495e+02 5.356e+02, threshold=3.975e+02, percent-clipped=4.0 2024-09-22 14:44:39,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.38 vs. limit=10.0 2024-09-22 14:45:00,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-09-22 14:45:08,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=37902.666666666664, ans=0.2 2024-09-22 14:45:12,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=37902.666666666664, ans=0.125 2024-09-22 14:45:31,398 INFO [train.py:1198] (0/4) Epoch 3, batch 350, loss[loss=0.3411, ctc_loss=0.2596, cr_loss=0.4072, over 17297.00 frames. ], tot_loss[loss=0.3429, ctc_loss=0.2591, cr_loss=0.4189, over 2800346.55 frames. ], batch size: 46, lr: 3.45e-02, grad_scale: 32.0 2024-09-22 14:45:50,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38042.666666666664, ans=0.1 2024-09-22 14:45:55,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=38042.666666666664, ans=0.002599420289855072 2024-09-22 14:46:32,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=12.0 2024-09-22 14:46:46,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=38182.666666666664, ans=0.2 2024-09-22 14:46:57,024 INFO [train.py:1198] (0/4) Epoch 3, batch 400, loss[loss=0.3466, ctc_loss=0.2488, cr_loss=0.4887, over 17295.00 frames. ], tot_loss[loss=0.342, ctc_loss=0.2583, cr_loss=0.4184, over 2931856.15 frames. 
], batch size: 46, lr: 3.45e-02, grad_scale: 32.0 2024-09-22 14:47:13,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=38276.0, ans=0.002548695652173913 2024-09-22 14:47:24,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38276.0, ans=0.1 2024-09-22 14:47:25,984 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 1.690e+02 1.971e+02 2.820e+02 5.296e+02, threshold=3.942e+02, percent-clipped=6.0 2024-09-22 14:47:51,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=38369.333333333336, ans=0.125 2024-09-22 14:47:55,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.18 vs. limit=22.5 2024-09-22 14:48:19,966 INFO [train.py:1198] (0/4) Epoch 3, batch 450, loss[loss=0.3382, ctc_loss=0.2531, cr_loss=0.4258, over 17294.00 frames. ], tot_loss[loss=0.3423, ctc_loss=0.2586, cr_loss=0.4186, over 3020810.45 frames. ], batch size: 51, lr: 3.44e-02, grad_scale: 32.0 2024-09-22 14:48:20,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=38462.666666666664, ans=0.002508115942028986 2024-09-22 14:48:20,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=12.0 2024-09-22 14:48:22,010 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:48:42,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-22 14:49:11,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=38602.666666666664, ans=0.2 2024-09-22 14:49:15,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=38602.666666666664, ans=0.125 2024-09-22 14:49:16,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=38602.666666666664, ans=0.125 2024-09-22 14:49:34,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=38649.333333333336, ans=0.0 2024-09-22 14:49:39,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2024-09-22 14:49:41,803 INFO [train.py:1198] (0/4) Epoch 3, batch 500, loss[loss=0.3468, ctc_loss=0.2615, cr_loss=0.4263, over 17322.00 frames. ], tot_loss[loss=0.341, ctc_loss=0.2575, cr_loss=0.4173, over 3095611.88 frames. 
], batch size: 46, lr: 3.43e-02, grad_scale: 32.0 2024-09-22 14:50:04,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=38742.666666666664, ans=0.125 2024-09-22 14:50:07,129 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.349e+02 1.795e+02 2.310e+02 2.743e+02 4.759e+02, threshold=4.620e+02, percent-clipped=4.0 2024-09-22 14:50:16,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=38789.333333333336, ans=0.0024371014492753614 2024-09-22 14:50:26,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=12.0 2024-09-22 14:50:52,203 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 14:51:01,267 INFO [train.py:1198] (0/4) Epoch 3, batch 550, loss[loss=0.319, ctc_loss=0.237, cr_loss=0.4099, over 17151.00 frames. ], tot_loss[loss=0.342, ctc_loss=0.2582, cr_loss=0.4188, over 3158375.25 frames. ], batch size: 48, lr: 3.43e-02, grad_scale: 32.0 2024-09-22 14:51:29,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=38976.0, ans=0.1 2024-09-22 14:51:31,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=38976.0, ans=0.125 2024-09-22 14:51:32,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=38976.0, ans=0.07 2024-09-22 14:51:36,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=39022.666666666664, ans=0.125 2024-09-22 14:51:44,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=39022.666666666664, ans=0.125 2024-09-22 14:52:05,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=39069.333333333336, ans=0.1 2024-09-22 14:52:09,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-09-22 14:52:20,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=39116.0, ans=0.125 2024-09-22 14:52:25,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39116.0, ans=0.1 2024-09-22 14:52:28,851 INFO [train.py:1198] (0/4) Epoch 3, batch 600, loss[loss=0.3139, ctc_loss=0.2371, cr_loss=0.384, over 17162.00 frames. ], tot_loss[loss=0.3423, ctc_loss=0.2586, cr_loss=0.4186, over 3190149.38 frames. 
], batch size: 45, lr: 3.42e-02, grad_scale: 32.0 2024-09-22 14:52:46,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=39209.333333333336, ans=0.125 2024-09-22 14:52:48,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39209.333333333336, ans=0.1 2024-09-22 14:52:54,491 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.307e+02 1.738e+02 2.025e+02 2.578e+02 4.577e+02, threshold=4.049e+02, percent-clipped=0.0 2024-09-22 14:52:58,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-09-22 14:53:15,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=39302.666666666664, ans=0.2 2024-09-22 14:53:30,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=39302.666666666664, ans=0.0023255072463768123 2024-09-22 14:53:30,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=39302.666666666664, ans=0.2 2024-09-22 14:53:37,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=39349.333333333336, ans=0.0 2024-09-22 14:53:50,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=39396.0, ans=0.125 2024-09-22 14:53:51,256 INFO [train.py:1198] (0/4) Epoch 3, batch 650, loss[loss=0.3265, ctc_loss=0.2413, cr_loss=0.4261, over 16948.00 frames. ], tot_loss[loss=0.3417, ctc_loss=0.258, cr_loss=0.4186, over 3230858.64 frames. ], batch size: 42, lr: 3.42e-02, grad_scale: 32.0 2024-09-22 14:53:52,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0 2024-09-22 14:54:11,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. limit=10.0 2024-09-22 14:54:13,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=39442.666666666664, ans=0.0022950724637681164 2024-09-22 14:54:18,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=39442.666666666664, ans=0.0 2024-09-22 14:54:53,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=39582.666666666664, ans=0.09899494936611666 2024-09-22 14:55:10,517 INFO [train.py:1198] (0/4) Epoch 3, batch 700, loss[loss=0.4508, ctc_loss=0.3589, cr_loss=0.4597, over 11871.00 frames. ], tot_loss[loss=0.3419, ctc_loss=0.2582, cr_loss=0.4184, over 3252891.57 frames. ], batch size: 123, lr: 3.41e-02, grad_scale: 32.0 2024-09-22 14:55:16,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. 
limit=6.0 2024-09-22 14:55:35,725 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.431e+02 1.707e+02 1.931e+02 2.238e+02 4.906e+02, threshold=3.863e+02, percent-clipped=1.0 2024-09-22 14:55:37,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=39676.0, ans=0.0 2024-09-22 14:56:32,766 INFO [train.py:1198] (0/4) Epoch 3, batch 750, loss[loss=0.3651, ctc_loss=0.275, cr_loss=0.4506, over 17011.00 frames. ], tot_loss[loss=0.3416, ctc_loss=0.2577, cr_loss=0.4192, over 3285820.04 frames. ], batch size: 51, lr: 3.41e-02, grad_scale: 32.0 2024-09-22 14:57:33,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=40002.666666666664, ans=0.125 2024-09-22 14:57:57,150 INFO [train.py:1198] (0/4) Epoch 3, batch 800, loss[loss=0.3438, ctc_loss=0.2566, cr_loss=0.4361, over 16984.00 frames. ], tot_loss[loss=0.3407, ctc_loss=0.2569, cr_loss=0.4191, over 3309184.26 frames. ], batch size: 53, lr: 3.40e-02, grad_scale: 32.0 2024-09-22 14:58:03,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=40096.0, ans=0.125 2024-09-22 14:58:25,182 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.263e+02 1.611e+02 1.885e+02 2.376e+02 4.057e+02, threshold=3.771e+02, percent-clipped=1.0 2024-09-22 14:59:18,684 INFO [train.py:1198] (0/4) Epoch 3, batch 850, loss[loss=0.3085, ctc_loss=0.2281, cr_loss=0.4021, over 17278.00 frames. ], tot_loss[loss=0.3417, ctc_loss=0.2578, cr_loss=0.4194, over 3314662.49 frames. ], batch size: 42, lr: 3.39e-02, grad_scale: 32.0 2024-09-22 14:59:37,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=40376.0, ans=0.125 2024-09-22 15:00:13,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=40469.333333333336, ans=0.0 2024-09-22 15:00:19,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=40469.333333333336, ans=0.2 2024-09-22 15:00:38,410 INFO [train.py:1198] (0/4) Epoch 3, batch 900, loss[loss=0.3878, ctc_loss=0.2941, cr_loss=0.4684, over 16979.00 frames. ], tot_loss[loss=0.342, ctc_loss=0.258, cr_loss=0.4199, over 3321299.86 frames. ], batch size: 56, lr: 3.39e-02, grad_scale: 32.0 2024-09-22 15:00:53,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.64 vs. limit=15.0 2024-09-22 15:00:59,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0
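The INFO [scaling.py:214] entries above record hyperparameters (dropout probabilities, skip rates, scale_min values) whose logged ans changes as a function of batch_count. A plausible minimal sketch follows, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; this is an illustration of the logging pattern, not the actual icefall ScheduledFloat class, and the breakpoint values are invented rather than taken from this run.

```python
class ScheduledFloat:
    """Sketch of a batch-count-scheduled hyperparameter, assuming (as the
    `batch_count=..., ans=...` pairs suggest) piecewise-linear
    interpolation between breakpoints; not the actual icefall class."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.points = sorted(points)

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)


# Invented breakpoints: a dropout annealed from 0.3 to 0.1 over the first
# 20k batches would be flat at 0.1 by now, consistent with the
# out_proj.dropout_p entries nearby that log ans=0.1.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert abs(dropout_p(41122.666666666664) - 0.1) < 1e-9
```

Under this reading, the many ans=0.0 and ans=0.125 lines in this section are simply schedules that have already reached their final flat segment.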
2024-09-22 15:01:03,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=40609.333333333336, ans=0.0 2024-09-22 15:01:06,321 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.646e+02 1.943e+02 2.321e+02 3.880e+02, threshold=3.887e+02, percent-clipped=1.0 2024-09-22 15:01:11,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=40656.0, ans=0.125 2024-09-22 15:02:03,457 INFO [train.py:1198] (0/4) Epoch 3, batch 950, loss[loss=0.3117, ctc_loss=0.2368, cr_loss=0.3742, over 17079.00 frames. ], tot_loss[loss=0.3417, ctc_loss=0.2578, cr_loss=0.4195, over 3328127.09 frames. ], batch size: 43, lr: 3.38e-02, grad_scale: 32.0 2024-09-22 15:02:27,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=40842.666666666664, ans=0.0 2024-09-22 15:02:42,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2024-09-22 15:03:08,963 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 15:03:14,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2024-09-22 15:03:28,700 INFO [train.py:1198] (0/4) Epoch 3, batch 1000, loss[loss=0.3353, ctc_loss=0.2572, cr_loss=0.3906, over 17020.00 frames. ], tot_loss[loss=0.3425, ctc_loss=0.2584, cr_loss=0.4202, over 3339820.56 frames. ], batch size: 51, lr: 3.38e-02, grad_scale: 32.0 2024-09-22 15:03:33,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=41029.333333333336, ans=0.0 2024-09-22 15:03:54,235 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.429e+02 1.784e+02 2.139e+02 2.624e+02 4.654e+02, threshold=4.278e+02, percent-clipped=1.0 2024-09-22 15:04:13,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=41122.666666666664, ans=0.1 2024-09-22 15:04:31,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=41216.0, ans=0.2 2024-09-22 15:04:48,224 INFO [train.py:1198] (0/4) Epoch 3, batch 1050, loss[loss=0.3455, ctc_loss=0.2579, cr_loss=0.4379, over 17061.00 frames. ], tot_loss[loss=0.3422, ctc_loss=0.2582, cr_loss=0.4203, over 3344243.90 frames. ], batch size: 52, lr: 3.37e-02, grad_scale: 32.0 2024-09-22 15:05:07,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41309.333333333336, ans=0.1 2024-09-22 15:06:10,769 INFO [train.py:1198] (0/4) Epoch 3, batch 1100, loss[loss=0.3109, ctc_loss=0.2274, cr_loss=0.4174, over 17082.00 frames. ], tot_loss[loss=0.3421, ctc_loss=0.2581, cr_loss=0.42, over 3352864.83 frames.
], batch size: 46, lr: 3.37e-02, grad_scale: 32.0 2024-09-22 15:06:11,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=41496.0, ans=0.125 2024-09-22 15:06:14,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=41496.0, ans=0.125 2024-09-22 15:06:28,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=41542.666666666664, ans=0.2 2024-09-22 15:06:35,970 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.301e+02 1.612e+02 1.917e+02 2.437e+02 4.278e+02, threshold=3.834e+02, percent-clipped=2.0 2024-09-22 15:06:42,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41542.666666666664, ans=0.1 2024-09-22 15:06:42,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=41542.666666666664, ans=0.125 2024-09-22 15:07:35,513 INFO [train.py:1198] (0/4) Epoch 3, batch 1150, loss[loss=0.2692, ctc_loss=0.2001, cr_loss=0.3456, over 17264.00 frames. ], tot_loss[loss=0.3408, ctc_loss=0.2569, cr_loss=0.4194, over 3362590.91 frames. ], batch size: 42, lr: 3.36e-02, grad_scale: 32.0 2024-09-22 15:07:35,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=41729.333333333336, ans=0.125 2024-09-22 15:07:37,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=41729.333333333336, ans=0.125 2024-09-22 15:07:52,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=41776.0, ans=0.2 2024-09-22 15:07:55,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=41776.0, ans=0.2 2024-09-22 15:08:12,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=41822.666666666664, ans=0.125 2024-09-22 15:08:14,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.20 vs. limit=10.0 2024-09-22 15:08:47,644 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 15:08:53,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=41916.0, ans=0.0017573913043478252 2024-09-22 15:08:57,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=41962.666666666664, ans=0.09899494936611666 2024-09-22 15:08:58,268 INFO [train.py:1198] (0/4) Epoch 3, batch 1200, loss[loss=0.2754, ctc_loss=0.2006, cr_loss=0.3744, over 17102.00 frames. ], tot_loss[loss=0.3396, ctc_loss=0.2558, cr_loss=0.4191, over 3365710.01 frames. 
], batch size: 40, lr: 3.36e-02, grad_scale: 32.0 2024-09-22 15:09:23,540 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.305e+02 1.644e+02 1.881e+02 2.304e+02 4.141e+02, threshold=3.762e+02, percent-clipped=3.0 2024-09-22 15:09:31,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=42056.0, ans=0.025 2024-09-22 15:10:17,334 INFO [train.py:1198] (0/4) Epoch 3, batch 1250, loss[loss=0.3386, ctc_loss=0.2557, cr_loss=0.4144, over 17290.00 frames. ], tot_loss[loss=0.3401, ctc_loss=0.2563, cr_loss=0.4189, over 3360086.82 frames. ], batch size: 46, lr: 3.35e-02, grad_scale: 32.0 2024-09-22 15:10:33,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=42242.666666666664, ans=0.2 2024-09-22 15:10:35,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-22 15:11:00,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-09-22 15:11:41,930 INFO [train.py:1198] (0/4) Epoch 3, batch 1300, loss[loss=0.3483, ctc_loss=0.2643, cr_loss=0.4199, over 17007.00 frames. ], tot_loss[loss=0.3388, ctc_loss=0.2552, cr_loss=0.4182, over 3361626.19 frames. ], batch size: 52, lr: 3.34e-02, grad_scale: 32.0 2024-09-22 15:11:42,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=42429.333333333336, ans=0.2 2024-09-22 15:11:42,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=42429.333333333336, ans=0.125 2024-09-22 15:11:56,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2024-09-22 15:12:01,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=42476.0, ans=0.0 2024-09-22 15:12:09,822 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.394e+02 1.777e+02 1.981e+02 2.466e+02 4.544e+02, threshold=3.962e+02, percent-clipped=3.0 2024-09-22 15:12:11,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=42476.0, ans=0.125 2024-09-22 15:12:16,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=42522.666666666664, ans=0.125 2024-09-22 15:13:02,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2024-09-22 15:13:06,080 INFO [train.py:1198] (0/4) Epoch 3, batch 1350, loss[loss=0.3536, ctc_loss=0.2655, cr_loss=0.4407, over 17347.00 frames. ], tot_loss[loss=0.3405, ctc_loss=0.2564, cr_loss=0.4204, over 3367102.94 frames. 
], batch size: 48, lr: 3.34e-02, grad_scale: 32.0 2024-09-22 15:13:22,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=42709.333333333336, ans=0.125 2024-09-22 15:13:59,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=42802.666666666664, ans=0.125 2024-09-22 15:14:02,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=42802.666666666664, ans=0.0 2024-09-22 15:14:03,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=42802.666666666664, ans=0.0 2024-09-22 15:14:25,893 INFO [train.py:1198] (0/4) Epoch 3, batch 1400, loss[loss=0.3276, ctc_loss=0.2469, cr_loss=0.4035, over 17238.00 frames. ], tot_loss[loss=0.3402, ctc_loss=0.2563, cr_loss=0.4198, over 3368843.71 frames. ], batch size: 50, lr: 3.33e-02, grad_scale: 32.0 2024-09-22 15:14:45,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=42942.666666666664, ans=0.125 2024-09-22 15:14:51,784 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.368e+02 1.674e+02 1.940e+02 2.218e+02 3.917e+02, threshold=3.881e+02, percent-clipped=0.0 2024-09-22 15:15:03,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=42989.333333333336, ans=0.0 2024-09-22 15:15:26,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=43036.0, ans=0.0015139130434782613 2024-09-22 15:15:26,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=43036.0, ans=0.1 2024-09-22 15:15:47,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=43129.333333333336, ans=0.2 2024-09-22 15:15:49,162 INFO [train.py:1198] (0/4) Epoch 3, batch 1450, loss[loss=0.3591, ctc_loss=0.2677, cr_loss=0.4569, over 17023.00 frames. ], tot_loss[loss=0.3396, ctc_loss=0.2557, cr_loss=0.4195, over 3373387.17 frames. ], batch size: 53, lr: 3.33e-02, grad_scale: 32.0 2024-09-22 15:16:26,308 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 15:16:49,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=43269.333333333336, ans=0.05 2024-09-22 15:17:13,619 INFO [train.py:1198] (0/4) Epoch 3, batch 1500, loss[loss=0.3592, ctc_loss=0.275, cr_loss=0.4209, over 16560.00 frames. ], tot_loss[loss=0.3398, ctc_loss=0.2558, cr_loss=0.4201, over 3370076.27 frames. 
], batch size: 66, lr: 3.32e-02, grad_scale: 32.0 2024-09-22 15:17:35,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=43409.333333333336, ans=0.125 2024-09-22 15:17:38,773 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.398e+02 1.638e+02 2.002e+02 2.354e+02 3.823e+02, threshold=4.005e+02, percent-clipped=0.0 2024-09-22 15:17:50,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=43456.0, ans=0.125 2024-09-22 15:17:57,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=43456.0, ans=0.0014226086956521736 2024-09-22 15:17:59,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=43502.666666666664, ans=10.0 2024-09-22 15:18:35,372 INFO [train.py:1198] (0/4) Epoch 3, batch 1550, loss[loss=0.33, ctc_loss=0.2492, cr_loss=0.4039, over 17098.00 frames. ], tot_loss[loss=0.3417, ctc_loss=0.2576, cr_loss=0.4206, over 3351799.01 frames. ], batch size: 43, lr: 3.32e-02, grad_scale: 32.0 2024-09-22 15:18:40,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=43596.0, ans=0.125 2024-09-22 15:18:47,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=43596.0, ans=0.125 2024-09-22 15:18:48,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=22.5 2024-09-22 15:18:53,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=43642.666666666664, ans=0.1 2024-09-22 15:19:34,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=43736.0, ans=0.0 2024-09-22 15:19:42,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43782.666666666664, ans=0.1 2024-09-22 15:19:47,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=43782.666666666664, ans=0.125 2024-09-22 15:19:55,281 INFO [train.py:1198] (0/4) Epoch 3, batch 1600, loss[loss=0.3604, ctc_loss=0.2719, cr_loss=0.4428, over 17037.00 frames. ], tot_loss[loss=0.3399, ctc_loss=0.2561, cr_loss=0.4191, over 3349474.04 frames. ], batch size: 56, lr: 3.31e-02, grad_scale: 32.0 2024-09-22 15:20:13,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.22 vs. limit=22.5
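The INFO [scaling.py:1024] Whitening entries, like the metric=21.22 vs. limit=22.5 line just above, compare a per-module whitening metric against a limit. One plausible way to obtain such a metric, sketched here under assumptions (this is not necessarily the exact formula in icefall's scaling.py): the ratio of the mean squared eigenvalue of the feature covariance to its squared mean eigenvalue, which is 1.0 for perfectly white features and grows as a few directions dominate.

```python
import torch


def whitening_metric(x, num_groups=1):
    """Hypothetical sketch of the `Whitening: ... metric=M vs. limit=L`
    measurement: per channel group, take the covariance of the centered
    features and compare the mean squared eigenvalue with the squared
    mean eigenvalue (>= 1 by Cauchy-Schwarz, with equality iff the
    covariance is isotropic). Not necessarily icefall's exact formula."""
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    worst = 0.0
    for g in range(num_groups):
        xg = x[:, g, :]
        cov = xg.t() @ xg / num_frames          # (C/G, C/G) covariance
        eigs = torch.linalg.eigvalsh(cov)       # symmetric, so eigvalsh
        worst = max(worst, ((eigs ** 2).mean() / eigs.mean() ** 2).item())
    return worst


# White (isotropic) inputs stay near 1.0, far below limits like 22.5:
x = torch.randn(1000, 256)
assert whitening_metric(x) < 2.0
```

A metric of this kind would explain why the log entries fire mostly when metric approaches or exceeds limit: they flag modules whose activations have collapsed toward a few dominant directions.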
2024-09-22 15:20:16,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=43876.0, ans=0.2 2024-09-22 15:20:20,845 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.341e+02 1.634e+02 1.960e+02 2.382e+02 4.201e+02, threshold=3.920e+02, percent-clipped=2.0 2024-09-22 15:20:22,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=43876.0, ans=0.125 2024-09-22 15:20:37,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=43922.666666666664, ans=0.125 2024-09-22 15:20:45,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=43969.333333333336, ans=0.125 2024-09-22 15:21:12,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=44016.0, ans=0.125 2024-09-22 15:21:17,087 INFO [train.py:1198] (0/4) Epoch 3, batch 1650, loss[loss=0.2974, ctc_loss=0.2222, cr_loss=0.376, over 17245.00 frames. ], tot_loss[loss=0.3387, ctc_loss=0.2551, cr_loss=0.4184, over 3353864.01 frames. ], batch size: 44, lr: 3.31e-02, grad_scale: 32.0 2024-09-22 15:21:21,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=44062.666666666664, ans=0.125 2024-09-22 15:21:24,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0 2024-09-22 15:21:30,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=44062.666666666664, ans=0.0 2024-09-22 15:21:46,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=44109.333333333336, ans=0.2 2024-09-22 15:21:54,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=44156.0, ans=0.125 2024-09-22 15:22:11,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44202.666666666664, ans=0.1 2024-09-22 15:22:24,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=44249.333333333336, ans=0.2 2024-09-22 15:22:37,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=44249.333333333336, ans=0.035 2024-09-22 15:22:41,999 INFO [train.py:1198] (0/4) Epoch 3, batch 1700, loss[loss=0.2787, ctc_loss=0.2078, cr_loss=0.3545, over 17177.00 frames. ], tot_loss[loss=0.3375, ctc_loss=0.2541, cr_loss=0.417, over 3360519.32 frames.
], batch size: 45, lr: 3.30e-02, grad_scale: 32.0 2024-09-22 15:23:09,725 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.265e+02 1.715e+02 2.359e+02 2.941e+02 4.631e+02, threshold=4.717e+02, percent-clipped=4.0 2024-09-22 15:23:35,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=44436.0, ans=0.2 2024-09-22 15:23:54,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=44482.666666666664, ans=0.125 2024-09-22 15:23:54,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=44482.666666666664, ans=0.0 2024-09-22 15:24:03,724 INFO [train.py:1198] (0/4) Epoch 3, batch 1750, loss[loss=0.373, ctc_loss=0.2818, cr_loss=0.4556, over 17217.00 frames. ], tot_loss[loss=0.3364, ctc_loss=0.2532, cr_loss=0.4163, over 3362501.11 frames. ], batch size: 55, lr: 3.30e-02, grad_scale: 32.0 2024-09-22 15:24:14,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=15.0 2024-09-22 15:24:19,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=44576.0, ans=0.125 2024-09-22 15:24:40,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44622.666666666664, ans=0.1 2024-09-22 15:25:04,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44669.333333333336, ans=0.1 2024-09-22 15:25:09,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44716.0, ans=0.1 2024-09-22 15:25:25,621 INFO [train.py:1198] (0/4) Epoch 3, batch 1800, loss[loss=0.3545, ctc_loss=0.2714, cr_loss=0.4158, over 16653.00 frames. ], tot_loss[loss=0.3365, ctc_loss=0.2529, cr_loss=0.4178, over 3368449.78 frames. ], batch size: 66, lr: 3.29e-02, grad_scale: 64.0 2024-09-22 15:25:47,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=12.0 2024-09-22 15:25:53,003 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.338e+02 1.793e+02 2.251e+02 2.697e+02 4.483e+02, threshold=4.502e+02, percent-clipped=0.0 2024-09-22 15:26:06,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2024-09-22 15:26:30,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=44949.333333333336, ans=0.125 2024-09-22 15:26:39,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=44949.333333333336, ans=0.125 2024-09-22 15:26:45,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.51 vs. limit=15.0 2024-09-22 15:26:47,446 INFO [train.py:1198] (0/4) Epoch 3, batch 1850, loss[loss=0.4383, ctc_loss=0.3464, cr_loss=0.4598, over 12027.00 frames. ], tot_loss[loss=0.3373, ctc_loss=0.2536, cr_loss=0.4189, over 3367408.28 frames. 
], batch size: 123, lr: 3.29e-02, grad_scale: 32.0 2024-09-22 15:27:09,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-09-22 15:27:33,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=45089.333333333336, ans=0.0 2024-09-22 15:27:36,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=45136.0, ans=0.125 2024-09-22 15:28:12,190 INFO [train.py:1198] (0/4) Epoch 3, batch 1900, loss[loss=0.2969, ctc_loss=0.2239, cr_loss=0.3651, over 17169.00 frames. ], tot_loss[loss=0.3369, ctc_loss=0.2532, cr_loss=0.4185, over 3359795.39 frames. ], batch size: 41, lr: 3.28e-02, grad_scale: 32.0 2024-09-22 15:28:23,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=45229.333333333336, ans=0.001037101449275362 2024-09-22 15:28:38,955 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 1.740e+02 2.164e+02 2.812e+02 4.193e+02, threshold=4.328e+02, percent-clipped=0.0 2024-09-22 15:28:50,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=45322.666666666664, ans=0.0 2024-09-22 15:29:32,030 INFO [train.py:1198] (0/4) Epoch 3, batch 1950, loss[loss=0.2569, ctc_loss=0.1883, cr_loss=0.3434, over 16386.00 frames. ], tot_loss[loss=0.3358, ctc_loss=0.2522, cr_loss=0.4179, over 3349811.49 frames. ], batch size: 36, lr: 3.27e-02, grad_scale: 32.0 2024-09-22 15:29:42,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-09-22 15:29:54,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=45509.333333333336, ans=0.125 2024-09-22 15:30:02,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45556.0, ans=0.1 2024-09-22 15:30:07,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=45556.0, ans=0.0 2024-09-22 15:30:13,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=45556.0, ans=0.0 2024-09-22 15:30:45,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2024-09-22 15:30:50,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2024-09-22 15:30:54,282 INFO [train.py:1198] (0/4) Epoch 3, batch 2000, loss[loss=0.349, ctc_loss=0.262, cr_loss=0.4353, over 17117.00 frames. ], tot_loss[loss=0.3353, ctc_loss=0.2517, cr_loss=0.4178, over 3350528.68 frames. 
], batch size: 49, lr: 3.27e-02, grad_scale: 32.0 2024-09-22 15:31:05,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45696.0, ans=0.1 2024-09-22 15:31:23,957 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.430e+02 1.768e+02 1.993e+02 2.472e+02 5.161e+02, threshold=3.986e+02, percent-clipped=2.0 2024-09-22 15:32:13,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45882.666666666664, ans=0.1 2024-09-22 15:32:14,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=45882.666666666664, ans=0.125 2024-09-22 15:32:19,442 INFO [train.py:1198] (0/4) Epoch 3, batch 2050, loss[loss=0.3044, ctc_loss=0.2269, cr_loss=0.3871, over 17206.00 frames. ], tot_loss[loss=0.3377, ctc_loss=0.2538, cr_loss=0.4195, over 3339941.40 frames. ], batch size: 41, lr: 3.26e-02, grad_scale: 32.0 2024-09-22 15:32:45,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2024-09-22 15:33:09,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=46069.333333333336, ans=0.125 2024-09-22 15:33:33,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=46116.0, ans=0.125 2024-09-22 15:33:40,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=46162.666666666664, ans=0.0 2024-09-22 15:33:41,382 INFO [train.py:1198] (0/4) Epoch 3, batch 2100, loss[loss=0.3463, ctc_loss=0.26, cr_loss=0.4313, over 17014.00 frames. ], tot_loss[loss=0.3398, ctc_loss=0.2556, cr_loss=0.4207, over 3338065.93 frames. ], batch size: 53, lr: 3.26e-02, grad_scale: 32.0 2024-09-22 15:34:04,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=46209.333333333336, ans=0.125 2024-09-22 15:34:08,497 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.358e+02 1.827e+02 2.117e+02 2.620e+02 4.403e+02, threshold=4.235e+02, percent-clipped=1.0 2024-09-22 15:34:28,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5 2024-09-22 15:34:51,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=46349.333333333336, ans=0.0 2024-09-22 15:34:54,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=46349.333333333336, ans=0.125 2024-09-22 15:35:01,055 INFO [train.py:1198] (0/4) Epoch 3, batch 2150, loss[loss=0.2998, ctc_loss=0.2208, cr_loss=0.3948, over 16978.00 frames. ], tot_loss[loss=0.3395, ctc_loss=0.2554, cr_loss=0.4204, over 3335630.77 frames. 
], batch size: 42, lr: 3.25e-02, grad_scale: 32.0 2024-09-22 15:35:16,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=46396.0, ans=0.025 2024-09-22 15:35:16,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46396.0, ans=0.1 2024-09-22 15:35:56,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=46536.0, ans=0.0007530434782608703 2024-09-22 15:36:05,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=46582.666666666664, ans=0.125 2024-09-22 15:36:17,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46582.666666666664, ans=0.1 2024-09-22 15:36:25,582 INFO [train.py:1198] (0/4) Epoch 3, batch 2200, loss[loss=0.3731, ctc_loss=0.2877, cr_loss=0.427, over 17342.00 frames. ], tot_loss[loss=0.3397, ctc_loss=0.2556, cr_loss=0.4207, over 3341211.91 frames. ], batch size: 48, lr: 3.25e-02, grad_scale: 32.0 2024-09-22 15:36:55,288 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.307e+02 1.636e+02 2.012e+02 2.486e+02 4.697e+02, threshold=4.025e+02, percent-clipped=4.0 2024-09-22 15:37:30,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=46816.0, ans=0.025 2024-09-22 15:37:43,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=15.0 2024-09-22 15:37:46,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=46816.0, ans=0.125 2024-09-22 15:37:50,690 INFO [train.py:1198] (0/4) Epoch 3, batch 2250, loss[loss=0.3775, ctc_loss=0.2862, cr_loss=0.4568, over 16538.00 frames. ], tot_loss[loss=0.3389, ctc_loss=0.255, cr_loss=0.4194, over 3340909.26 frames. ], batch size: 66, lr: 3.24e-02, grad_scale: 32.0 2024-09-22 15:37:56,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-22 15:37:59,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2024-09-22 15:38:20,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2024-09-22 15:38:43,660 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 15:38:53,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=47049.333333333336, ans=0.125 2024-09-22 15:38:54,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=47049.333333333336, ans=0.05 2024-09-22 15:39:07,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=47049.333333333336, ans=0.0 2024-09-22 15:39:10,367 INFO [train.py:1198] (0/4) Epoch 3, batch 2300, loss[loss=0.335, ctc_loss=0.2514, cr_loss=0.418, over 17096.00 frames. 
], tot_loss[loss=0.3368, ctc_loss=0.2532, cr_loss=0.4181, over 3341453.71 frames. ], batch size: 43, lr: 3.24e-02, grad_scale: 32.0 2024-09-22 15:39:20,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47096.0, ans=0.1 2024-09-22 15:39:24,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=47142.666666666664, ans=0.025 2024-09-22 15:39:37,771 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.373e+02 1.712e+02 2.236e+02 2.750e+02 4.925e+02, threshold=4.473e+02, percent-clipped=7.0 2024-09-22 15:39:57,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=47236.0, ans=0.0006008695652173907 2024-09-22 15:39:58,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.64 vs. limit=10.0 2024-09-22 15:40:28,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=47282.666666666664, ans=0.1 2024-09-22 15:40:30,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=47282.666666666664, ans=0.025 2024-09-22 15:40:33,162 INFO [train.py:1198] (0/4) Epoch 3, batch 2350, loss[loss=0.3018, ctc_loss=0.2262, cr_loss=0.3778, over 17227.00 frames. ], tot_loss[loss=0.3385, ctc_loss=0.2547, cr_loss=0.419, over 3321862.10 frames. ], batch size: 50, lr: 3.23e-02, grad_scale: 32.0 2024-09-22 15:40:41,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47329.333333333336, ans=0.1 2024-09-22 15:41:05,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=47422.666666666664, ans=0.05 2024-09-22 15:41:24,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=47469.333333333336, ans=0.035 2024-09-22 15:41:24,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5 2024-09-22 15:41:58,590 INFO [train.py:1198] (0/4) Epoch 3, batch 2400, loss[loss=0.3184, ctc_loss=0.2363, cr_loss=0.4107, over 17005.00 frames. ], tot_loss[loss=0.3372, ctc_loss=0.2535, cr_loss=0.4186, over 3328736.87 frames. ], batch size: 53, lr: 3.23e-02, grad_scale: 32.0 2024-09-22 15:42:18,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.19 vs. 
limit=10.0 2024-09-22 15:42:25,879 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.420e+02 1.787e+02 2.027e+02 2.376e+02 4.296e+02, threshold=4.054e+02, percent-clipped=0.0 2024-09-22 15:42:36,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=47656.0, ans=0.04949747468305833 2024-09-22 15:42:38,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=47656.0, ans=0.0 2024-09-22 15:42:46,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=47656.0, ans=0.0 2024-09-22 15:43:01,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=47702.666666666664, ans=0.0004994202898550727 2024-09-22 15:43:06,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2024-09-22 15:43:09,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=47749.333333333336, ans=0.1 2024-09-22 15:43:17,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2024-09-22 15:43:21,635 INFO [train.py:1198] (0/4) Epoch 3, batch 2450, loss[loss=0.301, ctc_loss=0.2234, cr_loss=0.3882, over 17181.00 frames. ], tot_loss[loss=0.3345, ctc_loss=0.251, cr_loss=0.4176, over 3345249.29 frames. ], batch size: 45, lr: 3.22e-02, grad_scale: 32.0 2024-09-22 15:43:42,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=47842.666666666664, ans=0.125 2024-09-22 15:43:51,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47889.333333333336, ans=0.1 2024-09-22 15:43:58,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47889.333333333336, ans=0.1 2024-09-22 15:44:06,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=47889.333333333336, ans=0.125 2024-09-22 15:44:16,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2024-09-22 15:44:19,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=47936.0, ans=0.125 2024-09-22 15:44:41,205 INFO [train.py:1198] (0/4) Epoch 3, batch 2500, loss[loss=0.3935, ctc_loss=0.3017, cr_loss=0.4591, over 16989.00 frames. ], tot_loss[loss=0.3342, ctc_loss=0.2505, cr_loss=0.4181, over 3351585.59 frames. ], batch size: 53, lr: 3.22e-02, grad_scale: 32.0 2024-09-22 15:45:08,320 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.319e+02 1.703e+02 1.872e+02 2.246e+02 3.567e+02, threshold=3.744e+02, percent-clipped=0.0 2024-09-22 15:45:12,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. 
limit=15.0 2024-09-22 15:45:33,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=48169.333333333336, ans=0.1 2024-09-22 15:45:36,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=48169.333333333336, ans=0.5 2024-09-22 15:45:53,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=48216.0, ans=0.2 2024-09-22 15:45:59,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=48216.0, ans=0.025 2024-09-22 15:46:00,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=48216.0, ans=0.125 2024-09-22 15:46:01,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2024-09-22 15:46:03,939 INFO [train.py:1198] (0/4) Epoch 3, batch 2550, loss[loss=0.3246, ctc_loss=0.2424, cr_loss=0.4111, over 17294.00 frames. ], tot_loss[loss=0.3347, ctc_loss=0.2509, cr_loss=0.4187, over 3353156.90 frames. ], batch size: 49, lr: 3.21e-02, grad_scale: 32.0 2024-09-22 15:46:14,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=48262.666666666664, ans=0.2 2024-09-22 15:46:44,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=48356.0, ans=0.0 2024-09-22 15:46:52,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=48356.0, ans=0.125 2024-09-22 15:47:02,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.63 vs. limit=22.5 2024-09-22 15:47:03,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=48402.666666666664, ans=0.025 2024-09-22 15:47:05,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=48402.666666666664, ans=0.125 2024-09-22 15:47:14,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-09-22 15:47:31,518 INFO [train.py:1198] (0/4) Epoch 3, batch 2600, loss[loss=0.3307, ctc_loss=0.2453, cr_loss=0.4272, over 17210.00 frames. ], tot_loss[loss=0.3358, ctc_loss=0.252, cr_loss=0.4191, over 3345433.78 frames. ], batch size: 47, lr: 3.21e-02, grad_scale: 32.0 2024-09-22 15:47:48,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2024-09-22 15:47:58,749 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.408e+02 1.709e+02 1.993e+02 2.394e+02 4.094e+02, threshold=3.987e+02, percent-clipped=1.0 2024-09-22 15:48:18,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=48636.0, ans=0.0 2024-09-22 15:48:21,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. 
limit=6.0 2024-09-22 15:48:32,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=48636.0, ans=0.125 2024-09-22 15:48:48,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=48682.666666666664, ans=0.125 2024-09-22 15:48:51,832 INFO [train.py:1198] (0/4) Epoch 3, batch 2650, loss[loss=0.3468, ctc_loss=0.2616, cr_loss=0.4263, over 16992.00 frames. ], tot_loss[loss=0.3355, ctc_loss=0.2517, cr_loss=0.4191, over 3347517.44 frames. ], batch size: 56, lr: 3.20e-02, grad_scale: 32.0 2024-09-22 15:49:29,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=48822.666666666664, ans=0.0 2024-09-22 15:49:58,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48916.0, ans=0.1 2024-09-22 15:50:01,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=48916.0, ans=0.2 2024-09-22 15:50:03,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=48916.0, ans=0.2 2024-09-22 15:50:06,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=48916.0, ans=0.2 2024-09-22 15:50:14,730 INFO [train.py:1198] (0/4) Epoch 3, batch 2700, loss[loss=0.2759, ctc_loss=0.2052, cr_loss=0.3534, over 17134.00 frames. ], tot_loss[loss=0.3351, ctc_loss=0.2514, cr_loss=0.4187, over 3344668.47 frames. ], batch size: 40, lr: 3.20e-02, grad_scale: 32.0 2024-09-22 15:50:33,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49009.333333333336, ans=0.1 2024-09-22 15:50:41,576 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.331e+02 1.782e+02 2.096e+02 2.443e+02 4.661e+02, threshold=4.192e+02, percent-clipped=1.0 2024-09-22 15:50:43,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=49009.333333333336, ans=0.00021536231884057913 2024-09-22 15:51:07,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=49102.666666666664, ans=0.125 2024-09-22 15:51:39,723 INFO [train.py:1198] (0/4) Epoch 3, batch 2750, loss[loss=0.3851, ctc_loss=0.2866, cr_loss=0.4923, over 17339.00 frames. ], tot_loss[loss=0.3363, ctc_loss=0.2523, cr_loss=0.4198, over 3344028.95 frames. ], batch size: 48, lr: 3.19e-02, grad_scale: 32.0 2024-09-22 15:51:41,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49196.0, ans=0.1 2024-09-22 15:51:54,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=49242.666666666664, ans=0.125 2024-09-22 15:52:07,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=49242.666666666664, ans=0.125 2024-09-22 15:52:19,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.70 vs. 
limit=15.0 2024-09-22 15:52:32,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=49336.0, ans=0.07 2024-09-22 15:52:35,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49336.0, ans=0.1 2024-09-22 15:52:41,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=49336.0, ans=0.0 2024-09-22 15:52:45,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=49382.666666666664, ans=0.0 2024-09-22 15:53:01,838 INFO [train.py:1198] (0/4) Epoch 3, batch 2800, loss[loss=0.3845, ctc_loss=0.2951, cr_loss=0.4471, over 17001.00 frames. ], tot_loss[loss=0.3358, ctc_loss=0.2518, cr_loss=0.4197, over 3356319.75 frames. ], batch size: 56, lr: 3.19e-02, grad_scale: 32.0 2024-09-22 15:53:29,476 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.403e+02 1.759e+02 2.001e+02 2.340e+02 4.757e+02, threshold=4.003e+02, percent-clipped=1.0 2024-09-22 15:53:32,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=49522.666666666664, ans=0.0 2024-09-22 15:53:37,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=49522.666666666664, ans=0.125 2024-09-22 15:53:48,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=49569.333333333336, ans=0.125 2024-09-22 15:53:52,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=49569.333333333336, ans=0.2 2024-09-22 15:53:56,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=49569.333333333336, ans=9.362318840579545e-05 2024-09-22 15:54:22,391 INFO [train.py:1198] (0/4) Epoch 3, batch 2850, loss[loss=0.3011, ctc_loss=0.2209, cr_loss=0.4011, over 17006.00 frames. ], tot_loss[loss=0.3356, ctc_loss=0.2518, cr_loss=0.4193, over 3336855.71 frames. ], batch size: 44, lr: 3.18e-02, grad_scale: 32.0 2024-09-22 15:54:35,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2024-09-22 15:54:38,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=49709.333333333336, ans=0.125 2024-09-22 15:55:07,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=49756.0, ans=0.0 2024-09-22 15:55:08,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=49756.0, ans=0.125 2024-09-22 15:55:15,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=49802.666666666664, ans=4.289855072463905e-05 2024-09-22 15:55:45,109 INFO [train.py:1198] (0/4) Epoch 3, batch 2900, loss[loss=0.3513, ctc_loss=0.2622, cr_loss=0.4457, over 17144.00 frames. ], tot_loss[loss=0.3354, ctc_loss=0.2517, cr_loss=0.4186, over 3342450.09 frames. 
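In each of the Clipping_scale warnings, the logged threshold is (to rounding) twice the logged median gradient norm, e.g. 2.0 * 2.001e+02 ≈ 4.003e+02 in the warning just above, so the clipping threshold evidently tracks clipping_scale times a running median of recent gradient norms, with percent-clipped reporting how often recent batches exceeded it. A rough sketch of that bookkeeping, using a hypothetical MedianGradClipper rather than the actual optim.py logic:

    from collections import deque
    import torch

    class MedianGradClipper:
        # Clip gradients to clipping_scale * median of recently seen norms,
        # keeping the statistics that the WARNING lines report.
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.scale = clipping_scale
            self.norms = deque(maxlen=history)
            self.num_clipped = 0

        def clip_(self, params) -> float:
            params = list(params)
            # max_norm=inf measures the total norm without rescaling anything.
            norm = torch.nn.utils.clip_grad_norm_(params, float("inf")).item()
            self.norms.append(norm)
            # Five numbers, as in the log: min, q1, median, q3, max.
            quartiles = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])).tolist()
            threshold = self.scale * quartiles[2]  # scale * median
            if norm > threshold:
                self.num_clipped += 1
                torch.nn.utils.clip_grad_norm_(params, threshold)
            return norm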
], batch size: 48, lr: 3.18e-02, grad_scale: 32.0 2024-09-22 15:55:45,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=49896.0, ans=2.260869565217337e-05 2024-09-22 15:55:46,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=49896.0, ans=0.125 2024-09-22 15:55:51,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=49896.0, ans=0.125 2024-09-22 15:56:14,387 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.376e+02 1.745e+02 2.067e+02 2.607e+02 4.355e+02, threshold=4.133e+02, percent-clipped=1.0 2024-09-22 15:56:42,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=50036.0, ans=0.0 2024-09-22 15:56:53,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=50082.666666666664, ans=0.0 2024-09-22 15:57:00,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50082.666666666664, ans=0.1 2024-09-22 15:57:09,737 INFO [train.py:1198] (0/4) Epoch 3, batch 2950, loss[loss=0.4015, ctc_loss=0.3191, cr_loss=0.412, over 11586.00 frames. ], tot_loss[loss=0.3366, ctc_loss=0.2527, cr_loss=0.4196, over 3339171.80 frames. ], batch size: 123, lr: 3.17e-02, grad_scale: 32.0 2024-09-22 15:57:19,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=50129.333333333336, ans=10.0 2024-09-22 15:57:48,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2024-09-22 15:58:14,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=50316.0, ans=0.125 2024-09-22 15:58:15,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=50316.0, ans=0.0 2024-09-22 15:58:31,267 INFO [train.py:1198] (0/4) Epoch 3, batch 3000, loss[loss=0.2727, ctc_loss=0.1951, cr_loss=0.388, over 17272.00 frames. ], tot_loss[loss=0.3351, ctc_loss=0.2513, cr_loss=0.4193, over 3345477.26 frames. ], batch size: 42, lr: 3.17e-02, grad_scale: 32.0 2024-09-22 15:58:31,267 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 15:58:46,491 INFO [train.py:1230] (0/4) Epoch 3, validation: loss=0.08436, ctc_loss=0.08436, cr_loss=7.957e-15, over 944034.00 frames. 2024-09-22 15:58:46,492 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 15:59:12,854 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.406e+02 1.636e+02 1.924e+02 2.171e+02 3.615e+02, threshold=3.848e+02, percent-clipped=0.0 2024-09-22 15:59:15,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-09-22 16:00:04,593 INFO [train.py:1198] (0/4) Epoch 3, batch 3050, loss[loss=0.3517, ctc_loss=0.2644, cr_loss=0.4362, over 17296.00 frames. ], tot_loss[loss=0.3353, ctc_loss=0.2512, cr_loss=0.4206, over 3352031.12 frames. 
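Two details in the validation entry above are worth decoding. First, cr_loss=7.957e-15 is numerically zero: the consistency-regularization term compares the model's outputs on two differently time-masked copies of each utterance, and with augmentation disabled for validation the two copies coincide, leaving only floating-point noise, so validation quality is carried entirely by ctc_loss. Second, the "Maximum memory allocated" figures presumably come from the CUDA caching allocator's peak counter; a query of that kind looks like:

    import torch

    # Peak bytes held in tensors on this GPU since startup (or since the
    # last reset_peak_memory_stats call), reported in MB as in the log.
    peak_mb = torch.cuda.max_memory_allocated(device=0) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")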
], batch size: 46, lr: 3.16e-02, grad_scale: 32.0 2024-09-22 16:00:18,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=50642.666666666664, ans=0.125 2024-09-22 16:00:56,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=50736.0, ans=0.04949747468305833 2024-09-22 16:00:58,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5 2024-09-22 16:01:07,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=50782.666666666664, ans=0.0 2024-09-22 16:01:08,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=50782.666666666664, ans=0.125 2024-09-22 16:01:16,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=50782.666666666664, ans=0.0 2024-09-22 16:01:21,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2024-09-22 16:01:22,490 INFO [train.py:1198] (0/4) Epoch 3, batch 3100, loss[loss=0.2984, ctc_loss=0.2204, cr_loss=0.3903, over 17064.00 frames. ], tot_loss[loss=0.3338, ctc_loss=0.25, cr_loss=0.4192, over 3348363.11 frames. ], batch size: 43, lr: 3.16e-02, grad_scale: 32.0 2024-09-22 16:01:24,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=50829.333333333336, ans=0.125 2024-09-22 16:01:27,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=50829.333333333336, ans=0.05 2024-09-22 16:01:33,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50829.333333333336, ans=0.1 2024-09-22 16:01:39,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=50876.0, ans=0.0 2024-09-22 16:01:49,066 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.630e+02 1.881e+02 2.229e+02 3.253e+02, threshold=3.762e+02, percent-clipped=0.0 2024-09-22 16:02:10,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=50969.333333333336, ans=0.125 2024-09-22 16:02:28,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=51016.0, ans=0.5 2024-09-22 16:02:38,747 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:02:41,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=51062.666666666664, ans=10.0 2024-09-22 16:02:43,157 INFO [train.py:1198] (0/4) Epoch 3, batch 3150, loss[loss=0.3189, ctc_loss=0.2389, cr_loss=0.3999, over 17363.00 frames. ], tot_loss[loss=0.3335, ctc_loss=0.2496, cr_loss=0.4192, over 3352097.47 frames. 
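Most of the volume in this log is ScheduledFloat dumps: each named hyperparameter (skip rates, dropout probabilities, balancer probs) is a value scheduled on batch_count, and successive dumps show them decaying on schedule, e.g. ff2_skip_rate shrinking from about 7.5e-4 near batch 46500 to exactly 0.0 by batch 52300. A piecewise-linear schedule in that spirit (the breakpoints below are invented for illustration; scaling.py's actual schedules differ):

    class PiecewiseLinear:
        # Value interpolated linearly between (batch_count, value)
        # breakpoints and clamped to the end values outside them.
        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    # Hypothetical schedule decaying to zero, like the skip rates above.
    ff2_skip_rate = PiecewiseLinear((0.0, 0.1), (20000.0, 0.01), (52000.0, 0.0))
    print(ff2_skip_rate(46536.0))  # small and still shrinking
    print(ff2_skip_rate(60000.0))  # clamped at 0.0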
], batch size: 48, lr: 3.15e-02, grad_scale: 32.0 2024-09-22 16:02:49,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=51062.666666666664, ans=0.125 2024-09-22 16:02:56,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.08 vs. limit=22.5 2024-09-22 16:02:59,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=51109.333333333336, ans=0.09899494936611666 2024-09-22 16:03:06,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=51109.333333333336, ans=0.0 2024-09-22 16:03:09,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=51109.333333333336, ans=0.025 2024-09-22 16:03:19,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=51156.0, ans=0.0 2024-09-22 16:03:36,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=51202.666666666664, ans=0.2 2024-09-22 16:03:44,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=51249.333333333336, ans=0.125 2024-09-22 16:03:54,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.91 vs. limit=22.5 2024-09-22 16:03:59,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=51249.333333333336, ans=0.0 2024-09-22 16:04:03,497 INFO [train.py:1198] (0/4) Epoch 3, batch 3200, loss[loss=0.318, ctc_loss=0.2319, cr_loss=0.4307, over 17173.00 frames. ], tot_loss[loss=0.3319, ctc_loss=0.2483, cr_loss=0.4179, over 3358923.05 frames. ], batch size: 45, lr: 3.15e-02, grad_scale: 32.0 2024-09-22 16:04:24,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=51342.666666666664, ans=0.0 2024-09-22 16:04:27,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51342.666666666664, ans=0.1 2024-09-22 16:04:30,052 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.332e+02 1.638e+02 1.865e+02 2.186e+02 5.181e+02, threshold=3.729e+02, percent-clipped=1.0 2024-09-22 16:04:36,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=51389.333333333336, ans=0.035 2024-09-22 16:04:38,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=22.5 2024-09-22 16:04:52,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=51436.0, ans=0.125 2024-09-22 16:04:54,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.70 vs. 
limit=10.0 2024-09-22 16:04:58,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=51436.0, ans=0.0 2024-09-22 16:04:58,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=51436.0, ans=0.125 2024-09-22 16:05:01,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=51436.0, ans=0.125 2024-09-22 16:05:05,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=51482.666666666664, ans=0.2 2024-09-22 16:05:23,509 INFO [train.py:1198] (0/4) Epoch 3, batch 3250, loss[loss=0.3484, ctc_loss=0.2558, cr_loss=0.4631, over 17274.00 frames. ], tot_loss[loss=0.3326, ctc_loss=0.2491, cr_loss=0.4179, over 3340710.61 frames. ], batch size: 55, lr: 3.14e-02, grad_scale: 32.0 2024-09-22 16:05:25,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=51529.333333333336, ans=0.2 2024-09-22 16:05:31,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=51529.333333333336, ans=0.025 2024-09-22 16:05:36,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=51529.333333333336, ans=0.125 2024-09-22 16:06:43,311 INFO [train.py:1198] (0/4) Epoch 3, batch 3300, loss[loss=0.3112, ctc_loss=0.2285, cr_loss=0.4134, over 16738.00 frames. ], tot_loss[loss=0.3324, ctc_loss=0.2489, cr_loss=0.4177, over 3350868.08 frames. ], batch size: 37, lr: 3.14e-02, grad_scale: 32.0 2024-09-22 16:06:48,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=51762.666666666664, ans=0.0 2024-09-22 16:06:51,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=51762.666666666664, ans=0.2 2024-09-22 16:07:00,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=51809.333333333336, ans=0.0 2024-09-22 16:07:08,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51809.333333333336, ans=0.1 2024-09-22 16:07:09,976 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.328e+02 1.647e+02 1.953e+02 2.348e+02 4.155e+02, threshold=3.905e+02, percent-clipped=1.0 2024-09-22 16:07:16,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=51856.0, ans=0.125 2024-09-22 16:07:28,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=51902.666666666664, ans=0.125 2024-09-22 16:08:01,373 INFO [train.py:1198] (0/4) Epoch 3, batch 3350, loss[loss=0.2992, ctc_loss=0.2215, cr_loss=0.3881, over 17258.00 frames. ], tot_loss[loss=0.3317, ctc_loss=0.2483, cr_loss=0.4169, over 3357148.63 frames. 
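The Whitening lines track, for each instrumented submodule, a whiteness metric of its activations against a limit (metric=22.91 vs. limit=22.5 above); a metric exceeding its limit is what triggers the corrective penalty that pushes the activations' covariance back toward a multiple of the identity. One standard way to score whiteness, and plausibly what these numbers measure, is the mean squared covariance entry normalized by the squared mean diagonal, which equals 1 exactly when each group's covariance is isotropic; a sketch under that assumption:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns a score >= 1 that equals 1
        # iff each group's covariance is a multiple of the identity.
        frames, channels = x.shape
        d = channels // num_groups
        xg = x.reshape(frames, num_groups, d).transpose(0, 1)  # (groups, frames, d)
        cov = xg.transpose(1, 2) @ xg / frames                 # (groups, d, d)
        mean_diag = cov.diagonal(dim1=-2, dim2=-1).mean()
        return d * (cov ** 2).mean() / (mean_diag ** 2 + 1e-20)

    torch.manual_seed(0)
    print(whitening_metric(torch.randn(10000, 256)))      # close to 1.0: white
    x = torch.randn(10000, 1) * torch.randn(256)          # rank-1 activations
    print(whitening_metric(x))                            # far above 1.0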
], batch size: 44, lr: 3.13e-02, grad_scale: 32.0 2024-09-22 16:08:07,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51996.0, ans=0.1 2024-09-22 16:08:18,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=52042.666666666664, ans=0.125 2024-09-22 16:08:24,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52042.666666666664, ans=0.1 2024-09-22 16:08:35,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=52089.333333333336, ans=0.0 2024-09-22 16:08:56,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.06 vs. limit=22.5 2024-09-22 16:09:10,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=52182.666666666664, ans=0.125 2024-09-22 16:09:11,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52182.666666666664, ans=0.1 2024-09-22 16:09:18,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2024-09-22 16:09:19,278 INFO [train.py:1198] (0/4) Epoch 3, batch 3400, loss[loss=0.3876, ctc_loss=0.2984, cr_loss=0.446, over 15157.00 frames. ], tot_loss[loss=0.3315, ctc_loss=0.248, cr_loss=0.4175, over 3354792.39 frames. ], batch size: 90, lr: 3.13e-02, grad_scale: 32.0 2024-09-22 16:09:24,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=52229.333333333336, ans=0.125 2024-09-22 16:09:41,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=52276.0, ans=0.0 2024-09-22 16:09:43,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52276.0, ans=0.1 2024-09-22 16:09:45,958 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.323e+02 1.677e+02 1.939e+02 2.464e+02 4.534e+02, threshold=3.878e+02, percent-clipped=3.0 2024-09-22 16:10:25,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52416.0, ans=0.1 2024-09-22 16:10:37,705 INFO [train.py:1198] (0/4) Epoch 3, batch 3450, loss[loss=0.3598, ctc_loss=0.2679, cr_loss=0.4597, over 17133.00 frames. ], tot_loss[loss=0.3314, ctc_loss=0.2478, cr_loss=0.4182, over 3355278.81 frames. 
], batch size: 48, lr: 3.12e-02, grad_scale: 32.0 2024-09-22 16:10:38,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52462.666666666664, ans=0.1 2024-09-22 16:10:56,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=52509.333333333336, ans=0.2 2024-09-22 16:11:04,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=52509.333333333336, ans=0.125 2024-09-22 16:11:10,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52556.0, ans=0.1 2024-09-22 16:11:26,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=52602.666666666664, ans=0.2 2024-09-22 16:11:31,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=52602.666666666664, ans=0.0 2024-09-22 16:11:57,413 INFO [train.py:1198] (0/4) Epoch 3, batch 3500, loss[loss=0.3228, ctc_loss=0.246, cr_loss=0.3839, over 17193.00 frames. ], tot_loss[loss=0.3323, ctc_loss=0.2485, cr_loss=0.4188, over 3349451.20 frames. ], batch size: 55, lr: 3.12e-02, grad_scale: 32.0 2024-09-22 16:11:59,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52696.0, ans=0.1 2024-09-22 16:12:24,122 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.446e+02 1.808e+02 2.110e+02 2.675e+02 4.151e+02, threshold=4.220e+02, percent-clipped=2.0 2024-09-22 16:12:57,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=52836.0, ans=0.0 2024-09-22 16:13:15,558 INFO [train.py:1198] (0/4) Epoch 3, batch 3550, loss[loss=0.375, ctc_loss=0.2792, cr_loss=0.4794, over 17219.00 frames. ], tot_loss[loss=0.3321, ctc_loss=0.2484, cr_loss=0.4187, over 3350156.80 frames. ], batch size: 47, lr: 3.11e-02, grad_scale: 32.0 2024-09-22 16:13:34,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=52976.0, ans=0.125 2024-09-22 16:14:20,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=53116.0, ans=0.125 2024-09-22 16:14:20,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=53116.0, ans=0.025 2024-09-22 16:14:35,694 INFO [train.py:1198] (0/4) Epoch 3, batch 3600, loss[loss=0.4001, ctc_loss=0.3098, cr_loss=0.4515, over 14959.00 frames. ], tot_loss[loss=0.3333, ctc_loss=0.2495, cr_loss=0.419, over 3340095.14 frames. ], batch size: 89, lr: 3.11e-02, grad_scale: 32.0 2024-09-22 16:14:51,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.50 vs. 
limit=15.0 2024-09-22 16:14:58,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=53209.333333333336, ans=0.2 2024-09-22 16:15:04,305 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.276e+02 1.804e+02 2.185e+02 2.634e+02 3.942e+02, threshold=4.371e+02, percent-clipped=0.0 2024-09-22 16:15:05,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=12.0 2024-09-22 16:15:39,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=53349.333333333336, ans=0.2 2024-09-22 16:15:40,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=53349.333333333336, ans=0.025 2024-09-22 16:15:46,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=53349.333333333336, ans=0.0 2024-09-22 16:15:57,760 INFO [train.py:1198] (0/4) Epoch 3, batch 3650, loss[loss=0.2913, ctc_loss=0.2182, cr_loss=0.3651, over 17124.00 frames. ], tot_loss[loss=0.3327, ctc_loss=0.249, cr_loss=0.4185, over 3344911.70 frames. ], batch size: 40, lr: 3.10e-02, grad_scale: 32.0 2024-09-22 16:16:20,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5 2024-09-22 16:17:12,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-09-22 16:17:16,109 INFO [train.py:1198] (0/4) Epoch 3, batch 3700, loss[loss=0.3697, ctc_loss=0.2883, cr_loss=0.4071, over 15200.00 frames. ], tot_loss[loss=0.3321, ctc_loss=0.2484, cr_loss=0.4188, over 3355041.61 frames. ], batch size: 89, lr: 3.10e-02, grad_scale: 32.0 2024-09-22 16:17:34,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=53676.0, ans=0.1 2024-09-22 16:17:42,218 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.264e+02 1.642e+02 1.878e+02 2.249e+02 4.018e+02, threshold=3.757e+02, percent-clipped=0.0 2024-09-22 16:17:55,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=53722.666666666664, ans=0.0 2024-09-22 16:17:59,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=53722.666666666664, ans=0.125 2024-09-22 16:18:34,048 INFO [train.py:1198] (0/4) Epoch 3, batch 3750, loss[loss=0.3836, ctc_loss=0.2882, cr_loss=0.4768, over 15970.00 frames. ], tot_loss[loss=0.3331, ctc_loss=0.2491, cr_loss=0.4197, over 3349130.69 frames. 
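The grad_scale field in the loss lines is the dynamic loss-scaling factor used with fp16 autocast training: the loss is multiplied by it before backward, steps whose gradients overflow are skipped and the scale halved, and after a long enough run of clean steps it is doubled again, which is why it holds at 32.0 through these entries and reaches 64.0 a few entries below. The standard PyTorch idiom (loader, model, optimizer and criterion assumed defined elsewhere; not necessarily the scaler train.py uses):

    import torch

    scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16

    for batch in loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch))
        scaler.scale(loss).backward()  # backward through the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # halves on overflow, grows otherwise
        print(scaler.get_scale())      # the grad_scale value seen in the log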
], batch size: 74, lr: 3.10e-02, grad_scale: 32.0 2024-09-22 16:18:45,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=53862.666666666664, ans=0.125 2024-09-22 16:19:02,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=53909.333333333336, ans=0.125 2024-09-22 16:19:43,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=54049.333333333336, ans=0.125 2024-09-22 16:19:50,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=54096.0, ans=0.2 2024-09-22 16:19:51,720 INFO [train.py:1198] (0/4) Epoch 3, batch 3800, loss[loss=0.4039, ctc_loss=0.3197, cr_loss=0.421, over 11696.00 frames. ], tot_loss[loss=0.3357, ctc_loss=0.2517, cr_loss=0.4202, over 3320234.65 frames. ], batch size: 123, lr: 3.09e-02, grad_scale: 32.0 2024-09-22 16:20:17,985 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.396e+02 1.642e+02 1.883e+02 2.367e+02 4.025e+02, threshold=3.766e+02, percent-clipped=5.0 2024-09-22 16:20:19,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-22 16:20:30,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=54189.333333333336, ans=0.125 2024-09-22 16:20:37,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=54236.0, ans=0.2 2024-09-22 16:20:41,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=54236.0, ans=0.125 2024-09-22 16:20:52,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-09-22 16:21:10,250 INFO [train.py:1198] (0/4) Epoch 3, batch 3850, loss[loss=0.3351, ctc_loss=0.2458, cr_loss=0.4464, over 16893.00 frames. ], tot_loss[loss=0.3342, ctc_loss=0.2506, cr_loss=0.4183, over 3310598.45 frames. ], batch size: 58, lr: 3.09e-02, grad_scale: 64.0 2024-09-22 16:21:26,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2024-09-22 16:21:47,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=54422.666666666664, ans=0.0 2024-09-22 16:21:54,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.56 vs. 
limit=15.0 2024-09-22 16:21:59,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=54469.333333333336, ans=0.025 2024-09-22 16:22:01,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=54469.333333333336, ans=0.0 2024-09-22 16:22:08,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=54469.333333333336, ans=0.125 2024-09-22 16:22:20,278 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-3.pt 2024-09-22 16:23:12,168 INFO [train.py:1198] (0/4) Epoch 4, batch 0, loss[loss=0.3629, ctc_loss=0.2724, cr_loss=0.4522, over 17223.00 frames. ], tot_loss[loss=0.3629, ctc_loss=0.2724, cr_loss=0.4522, over 17223.00 frames. ], batch size: 55, lr: 2.88e-02, grad_scale: 32.0 2024-09-22 16:23:12,169 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 16:23:27,759 INFO [train.py:1230] (0/4) Epoch 4, validation: loss=0.08466, ctc_loss=0.08466, cr_loss=9.003e-15, over 944034.00 frames. 2024-09-22 16:23:27,760 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 16:24:06,069 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.466e+02 1.944e+02 2.324e+02 2.751e+02 6.786e+02, threshold=4.649e+02, percent-clipped=3.0 2024-09-22 16:24:07,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=54637.333333333336, ans=0.125 2024-09-22 16:24:18,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=54684.0, ans=0.125 2024-09-22 16:24:20,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=54684.0, ans=0.0 2024-09-22 16:24:28,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54684.0, ans=0.1 2024-09-22 16:24:35,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-22 16:24:50,276 INFO [train.py:1198] (0/4) Epoch 4, batch 50, loss[loss=0.3456, ctc_loss=0.2561, cr_loss=0.4478, over 17207.00 frames. ], tot_loss[loss=0.3286, ctc_loss=0.2451, cr_loss=0.4176, over 756611.61 frames. ], batch size: 55, lr: 2.88e-02, grad_scale: 32.0 2024-09-22 16:24:53,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=54777.333333333336, ans=0.2 2024-09-22 16:25:15,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=54824.0, ans=0.125 2024-09-22 16:25:30,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. 
limit=15.0 2024-09-22 16:25:33,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=54870.666666666664, ans=0.125 2024-09-22 16:25:50,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=54917.333333333336, ans=0.2 2024-09-22 16:25:57,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0 2024-09-22 16:26:01,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=54964.0, ans=0.125 2024-09-22 16:26:06,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=54964.0, ans=0.0 2024-09-22 16:26:12,372 INFO [train.py:1198] (0/4) Epoch 4, batch 100, loss[loss=0.316, ctc_loss=0.2332, cr_loss=0.4142, over 17358.00 frames. ], tot_loss[loss=0.3264, ctc_loss=0.2432, cr_loss=0.4163, over 1333632.55 frames. ], batch size: 48, lr: 2.87e-02, grad_scale: 32.0 2024-09-22 16:26:12,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=55010.666666666664, ans=0.0 2024-09-22 16:26:23,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=55010.666666666664, ans=0.125 2024-09-22 16:26:44,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=55057.333333333336, ans=0.125 2024-09-22 16:26:50,367 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.359e+02 1.670e+02 1.866e+02 2.190e+02 3.249e+02, threshold=3.731e+02, percent-clipped=0.0 2024-09-22 16:27:25,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=55197.333333333336, ans=0.0 2024-09-22 16:27:29,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=55197.333333333336, ans=0.5 2024-09-22 16:27:35,225 INFO [train.py:1198] (0/4) Epoch 4, batch 150, loss[loss=0.4274, ctc_loss=0.3334, cr_loss=0.4703, over 15139.00 frames. ], tot_loss[loss=0.3289, ctc_loss=0.2449, cr_loss=0.4195, over 1778247.38 frames. 
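Checkpoints are being written at two cadences visible here: an end-of-epoch file (epoch-3.pt above, just before epoch 4 begins) and periodic batch-count files (checkpoint-12000.pt a little further down). A minimal sketch of saving enough state to resume mid-run; the field names are illustrative rather than icefall's actual checkpoint.py schema:

    import torch

    def save_checkpoint(path, model, optimizer, scheduler, scaler,
                        epoch, batch_idx_train):
        # Persist optimizer/scheduler/scaler state alongside the weights so
        # that training can resume exactly where it stopped.
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "grad_scaler": scaler.state_dict(),
                "epoch": epoch,
                "batch_idx_train": batch_idx_train,
            },
            path,
        )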
], batch size: 89, lr: 2.87e-02, grad_scale: 32.0 2024-09-22 16:27:48,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=55244.0, ans=0.125 2024-09-22 16:27:52,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=55290.666666666664, ans=0.09899494936611666 2024-09-22 16:28:05,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=55290.666666666664, ans=0.125 2024-09-22 16:28:12,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=55337.333333333336, ans=0.125 2024-09-22 16:28:20,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=55337.333333333336, ans=0.02 2024-09-22 16:28:32,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=55384.0, ans=0.125 2024-09-22 16:28:34,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=55384.0, ans=0.05 2024-09-22 16:28:58,936 INFO [train.py:1198] (0/4) Epoch 4, batch 200, loss[loss=0.3417, ctc_loss=0.2566, cr_loss=0.4256, over 17181.00 frames. ], tot_loss[loss=0.3258, ctc_loss=0.2425, cr_loss=0.4162, over 2131774.39 frames. ], batch size: 45, lr: 2.86e-02, grad_scale: 32.0 2024-09-22 16:29:02,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=55477.333333333336, ans=0.025 2024-09-22 16:29:33,618 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.559e+02 1.682e+02 1.957e+02 2.989e+02, threshold=3.363e+02, percent-clipped=0.0 2024-09-22 16:29:51,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=55617.333333333336, ans=10.0 2024-09-22 16:30:17,537 INFO [train.py:1198] (0/4) Epoch 4, batch 250, loss[loss=0.267, ctc_loss=0.1968, cr_loss=0.3513, over 16953.00 frames. ], tot_loss[loss=0.3261, ctc_loss=0.2427, cr_loss=0.4173, over 2407081.47 frames. ], batch size: 42, lr: 2.86e-02, grad_scale: 32.0 2024-09-22 16:30:17,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=55710.666666666664, ans=0.035 2024-09-22 16:30:32,908 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2024-09-22 16:30:51,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=55804.0, ans=0.125 2024-09-22 16:31:09,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2024-09-22 16:31:10,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=55850.666666666664, ans=0.0 2024-09-22 16:31:16,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.08 vs. 
limit=22.5 2024-09-22 16:31:25,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55897.333333333336, ans=0.1 2024-09-22 16:31:26,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=55897.333333333336, ans=0.125 2024-09-22 16:31:38,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=55897.333333333336, ans=0.5 2024-09-22 16:31:43,125 INFO [train.py:1198] (0/4) Epoch 4, batch 300, loss[loss=0.3162, ctc_loss=0.2314, cr_loss=0.4239, over 17091.00 frames. ], tot_loss[loss=0.3266, ctc_loss=0.2431, cr_loss=0.4173, over 2600969.25 frames. ], batch size: 49, lr: 2.86e-02, grad_scale: 32.0 2024-09-22 16:32:00,584 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-12000.pt 2024-09-22 16:32:20,153 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.659e+02 1.864e+02 2.226e+02 3.223e+02, threshold=3.728e+02, percent-clipped=0.0 2024-09-22 16:32:22,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=56037.333333333336, ans=0.125 2024-09-22 16:32:26,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=15.0 2024-09-22 16:33:07,320 INFO [train.py:1198] (0/4) Epoch 4, batch 350, loss[loss=0.3368, ctc_loss=0.2523, cr_loss=0.4223, over 17046.00 frames. ], tot_loss[loss=0.3248, ctc_loss=0.2415, cr_loss=0.4164, over 2774134.37 frames. ], batch size: 52, lr: 2.85e-02, grad_scale: 32.0 2024-09-22 16:33:15,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=56177.333333333336, ans=0.0 2024-09-22 16:33:31,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. limit=15.0 2024-09-22 16:34:12,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2024-09-22 16:34:14,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=56364.0, ans=0.1 2024-09-22 16:34:29,677 INFO [train.py:1198] (0/4) Epoch 4, batch 400, loss[loss=0.3436, ctc_loss=0.2561, cr_loss=0.4378, over 16907.00 frames. ], tot_loss[loss=0.326, ctc_loss=0.2426, cr_loss=0.4167, over 2886640.70 frames. ], batch size: 58, lr: 2.85e-02, grad_scale: 32.0 2024-09-22 16:34:30,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0 2024-09-22 16:34:33,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. 
limit=15.0 2024-09-22 16:34:39,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=56410.666666666664, ans=0.2 2024-09-22 16:34:41,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=56410.666666666664, ans=0.09899494936611666 2024-09-22 16:34:50,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=56457.333333333336, ans=0.125 2024-09-22 16:35:04,962 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 1.655e+02 1.851e+02 2.343e+02 4.879e+02, threshold=3.703e+02, percent-clipped=2.0 2024-09-22 16:35:21,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=56550.666666666664, ans=0.125 2024-09-22 16:35:52,503 INFO [train.py:1198] (0/4) Epoch 4, batch 450, loss[loss=0.337, ctc_loss=0.2445, cr_loss=0.4626, over 17216.00 frames. ], tot_loss[loss=0.3274, ctc_loss=0.2438, cr_loss=0.4177, over 2991313.49 frames. ], batch size: 50, lr: 2.84e-02, grad_scale: 32.0 2024-09-22 16:35:59,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2024-09-22 16:36:27,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=56737.333333333336, ans=0.025 2024-09-22 16:36:31,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-09-22 16:36:32,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=56737.333333333336, ans=0.125 2024-09-22 16:36:39,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-22 16:36:41,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=56784.0, ans=0.125 2024-09-22 16:37:00,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=56830.666666666664, ans=0.0 2024-09-22 16:37:14,899 INFO [train.py:1198] (0/4) Epoch 4, batch 500, loss[loss=0.2954, ctc_loss=0.2213, cr_loss=0.3704, over 17308.00 frames. ], tot_loss[loss=0.327, ctc_loss=0.2434, cr_loss=0.4179, over 3079011.67 frames. ], batch size: 51, lr: 2.84e-02, grad_scale: 32.0 2024-09-22 16:37:29,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=56924.0, ans=0.0 2024-09-22 16:37:35,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.77 vs. 
limit=15.0 2024-09-22 16:37:53,151 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.629e+02 1.949e+02 2.165e+02 3.477e+02, threshold=3.897e+02, percent-clipped=0.0 2024-09-22 16:37:55,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=56970.666666666664, ans=0.0 2024-09-22 16:38:01,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=56970.666666666664, ans=0.125 2024-09-22 16:38:18,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=57017.333333333336, ans=0.0 2024-09-22 16:38:20,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-09-22 16:38:39,922 INFO [train.py:1198] (0/4) Epoch 4, batch 550, loss[loss=0.3265, ctc_loss=0.2436, cr_loss=0.4147, over 17238.00 frames. ], tot_loss[loss=0.3243, ctc_loss=0.2412, cr_loss=0.4154, over 3132734.60 frames. ], batch size: 50, lr: 2.83e-02, grad_scale: 32.0 2024-09-22 16:38:40,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=57110.666666666664, ans=0.5 2024-09-22 16:38:42,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-09-22 16:38:51,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0 2024-09-22 16:38:57,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=57157.333333333336, ans=0.125 2024-09-22 16:39:13,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=57204.0, ans=0.0 2024-09-22 16:39:26,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2024-09-22 16:39:31,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=57250.666666666664, ans=0.025 2024-09-22 16:39:59,407 INFO [train.py:1198] (0/4) Epoch 4, batch 600, loss[loss=0.3196, ctc_loss=0.2411, cr_loss=0.3922, over 16870.00 frames. ], tot_loss[loss=0.3234, ctc_loss=0.2403, cr_loss=0.4155, over 3194868.93 frames. 
], batch size: 58, lr: 2.83e-02, grad_scale: 32.0 2024-09-22 16:40:23,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=57390.666666666664, ans=0.125 2024-09-22 16:40:23,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=57390.666666666664, ans=0.0 2024-09-22 16:40:34,388 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.589e+02 1.770e+02 2.208e+02 4.389e+02, threshold=3.540e+02, percent-clipped=1.0 2024-09-22 16:40:40,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57437.333333333336, ans=0.1 2024-09-22 16:40:41,101 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:40:41,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=57437.333333333336, ans=0.2 2024-09-22 16:40:48,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=57484.0, ans=0.125 2024-09-22 16:40:51,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=57484.0, ans=0.2 2024-09-22 16:40:53,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=57484.0, ans=0.125 2024-09-22 16:41:13,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=57530.666666666664, ans=0.1 2024-09-22 16:41:24,123 INFO [train.py:1198] (0/4) Epoch 4, batch 650, loss[loss=0.3464, ctc_loss=0.2565, cr_loss=0.4494, over 17354.00 frames. ], tot_loss[loss=0.3222, ctc_loss=0.2393, cr_loss=0.4148, over 3234942.33 frames. ], batch size: 48, lr: 2.83e-02, grad_scale: 32.0 2024-09-22 16:41:26,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=57577.333333333336, ans=0.0 2024-09-22 16:42:25,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=57764.0, ans=0.1 2024-09-22 16:42:32,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=57764.0, ans=0.025 2024-09-22 16:42:45,632 INFO [train.py:1198] (0/4) Epoch 4, batch 700, loss[loss=0.2983, ctc_loss=0.2156, cr_loss=0.4135, over 17318.00 frames. ], tot_loss[loss=0.3225, ctc_loss=0.2395, cr_loss=0.4147, over 3255063.93 frames. ], batch size: 46, lr: 2.82e-02, grad_scale: 32.0 2024-09-22 16:43:10,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=57857.333333333336, ans=0.125 2024-09-22 16:43:23,507 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.245e+02 1.612e+02 1.882e+02 2.294e+02 3.695e+02, threshold=3.764e+02, percent-clipped=3.0 2024-09-22 16:43:35,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=57950.666666666664, ans=0.0 2024-09-22 16:44:08,399 INFO [train.py:1198] (0/4) Epoch 4, batch 750, loss[loss=0.3276, ctc_loss=0.2471, cr_loss=0.4025, over 17127.00 frames. 
], tot_loss[loss=0.3239, ctc_loss=0.2405, cr_loss=0.4172, over 3282649.52 frames. ], batch size: 48, lr: 2.82e-02, grad_scale: 32.0 2024-09-22 16:44:10,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=58044.0, ans=0.1 2024-09-22 16:44:14,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=58044.0, ans=0.0 2024-09-22 16:44:29,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=22.5 2024-09-22 16:44:35,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-22 16:44:41,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=58137.333333333336, ans=0.0 2024-09-22 16:44:41,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=58137.333333333336, ans=0.2 2024-09-22 16:44:43,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=58137.333333333336, ans=0.125 2024-09-22 16:44:44,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=58137.333333333336, ans=0.025 2024-09-22 16:44:54,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=58184.0, ans=0.1 2024-09-22 16:45:27,577 INFO [train.py:1198] (0/4) Epoch 4, batch 800, loss[loss=0.2804, ctc_loss=0.2041, cr_loss=0.3817, over 17248.00 frames. ], tot_loss[loss=0.3222, ctc_loss=0.2391, cr_loss=0.4158, over 3298310.30 frames. ], batch size: 44, lr: 2.81e-02, grad_scale: 32.0 2024-09-22 16:46:07,412 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.352e+02 1.727e+02 1.869e+02 2.216e+02 3.268e+02, threshold=3.738e+02, percent-clipped=0.0 2024-09-22 16:46:15,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=58370.666666666664, ans=0.2 2024-09-22 16:46:51,996 INFO [train.py:1198] (0/4) Epoch 4, batch 850, loss[loss=0.2915, ctc_loss=0.2091, cr_loss=0.4116, over 17012.00 frames. ], tot_loss[loss=0.3215, ctc_loss=0.2384, cr_loss=0.4157, over 3313140.56 frames. ], batch size: 39, lr: 2.81e-02, grad_scale: 32.0 2024-09-22 16:47:04,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2024-09-22 16:47:08,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=58557.333333333336, ans=0.0 2024-09-22 16:47:53,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58650.666666666664, ans=0.1 2024-09-22 16:48:11,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=12.0 2024-09-22 16:48:16,703 INFO [train.py:1198] (0/4) Epoch 4, batch 900, loss[loss=0.2857, ctc_loss=0.2049, cr_loss=0.4041, over 17049.00 frames. ], tot_loss[loss=0.3202, ctc_loss=0.2372, cr_loss=0.4148, over 3333489.53 frames. 
], batch size: 39, lr: 2.81e-02, grad_scale: 32.0 2024-09-22 16:48:31,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=58790.666666666664, ans=0.125 2024-09-22 16:48:40,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=58790.666666666664, ans=15.0 2024-09-22 16:48:52,230 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.310e+02 1.664e+02 1.989e+02 2.596e+02 4.339e+02, threshold=3.979e+02, percent-clipped=2.0 2024-09-22 16:49:00,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=58837.333333333336, ans=0.125 2024-09-22 16:49:06,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=58884.0, ans=0.125 2024-09-22 16:49:18,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=58884.0, ans=0.07 2024-09-22 16:49:24,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=58930.666666666664, ans=0.125 2024-09-22 16:49:32,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=58930.666666666664, ans=0.025 2024-09-22 16:49:36,963 INFO [train.py:1198] (0/4) Epoch 4, batch 950, loss[loss=0.2549, ctc_loss=0.1871, cr_loss=0.3387, over 17244.00 frames. ], tot_loss[loss=0.3208, ctc_loss=0.2379, cr_loss=0.4147, over 3333664.07 frames. ], batch size: 44, lr: 2.80e-02, grad_scale: 32.0 2024-09-22 16:49:51,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=59024.0, ans=0.025 2024-09-22 16:50:21,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=59070.666666666664, ans=0.2 2024-09-22 16:50:29,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=59117.333333333336, ans=0.0 2024-09-22 16:50:29,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=59117.333333333336, ans=0.025 2024-09-22 16:51:01,451 INFO [train.py:1198] (0/4) Epoch 4, batch 1000, loss[loss=0.3242, ctc_loss=0.2404, cr_loss=0.4189, over 17347.00 frames. ], tot_loss[loss=0.3201, ctc_loss=0.2373, cr_loss=0.4141, over 3337568.99 frames. ], batch size: 48, lr: 2.80e-02, grad_scale: 32.0 2024-09-22 16:51:15,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=59257.333333333336, ans=0.0 2024-09-22 16:51:34,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=59304.0, ans=0.125 2024-09-22 16:51:36,100 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.266e+02 1.574e+02 1.735e+02 2.105e+02 3.870e+02, threshold=3.470e+02, percent-clipped=0.0 2024-09-22 16:51:57,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=59350.666666666664, ans=0.125 2024-09-22 16:51:58,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. 
limit=5.0 2024-09-22 16:52:02,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.73 vs. limit=10.0 2024-09-22 16:52:05,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0 2024-09-22 16:52:17,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=59397.333333333336, ans=0.025 2024-09-22 16:52:17,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=59397.333333333336, ans=10.0 2024-09-22 16:52:17,643 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:52:20,427 INFO [train.py:1198] (0/4) Epoch 4, batch 1050, loss[loss=0.2526, ctc_loss=0.1835, cr_loss=0.3453, over 17066.00 frames. ], tot_loss[loss=0.3195, ctc_loss=0.2367, cr_loss=0.4141, over 3345750.97 frames. ], batch size: 39, lr: 2.79e-02, grad_scale: 32.0 2024-09-22 16:52:55,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=59537.333333333336, ans=0.0 2024-09-22 16:53:07,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=59537.333333333336, ans=0.125 2024-09-22 16:53:15,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=59584.0, ans=0.0 2024-09-22 16:53:34,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.16 vs. limit=22.5 2024-09-22 16:53:40,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=59630.666666666664, ans=0.1 2024-09-22 16:53:44,832 INFO [train.py:1198] (0/4) Epoch 4, batch 1100, loss[loss=0.3646, ctc_loss=0.2707, cr_loss=0.4694, over 17009.00 frames. ], tot_loss[loss=0.3207, ctc_loss=0.2376, cr_loss=0.4156, over 3346708.22 frames. 
], batch size: 56, lr: 2.79e-02, grad_scale: 32.0 2024-09-22 16:53:45,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=59677.333333333336, ans=0.0 2024-09-22 16:54:20,014 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.292e+02 1.576e+02 1.850e+02 2.274e+02 3.544e+02, threshold=3.699e+02, percent-clipped=1.0 2024-09-22 16:54:21,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59770.666666666664, ans=0.1 2024-09-22 16:54:26,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=59770.666666666664, ans=0.125 2024-09-22 16:54:52,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=59864.0, ans=0.125 2024-09-22 16:54:52,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=59864.0, ans=0.2 2024-09-22 16:54:53,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=59864.0, ans=0.2 2024-09-22 16:55:04,529 INFO [train.py:1198] (0/4) Epoch 4, batch 1150, loss[loss=0.3125, ctc_loss=0.2296, cr_loss=0.4144, over 17338.00 frames. ], tot_loss[loss=0.3195, ctc_loss=0.2367, cr_loss=0.4141, over 3358734.73 frames. ], batch size: 48, lr: 2.78e-02, grad_scale: 32.0 2024-09-22 16:55:23,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=59957.333333333336, ans=0.125 2024-09-22 16:56:05,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60050.666666666664, ans=0.1 2024-09-22 16:56:10,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=60050.666666666664, ans=0.125 2024-09-22 16:56:28,659 INFO [train.py:1198] (0/4) Epoch 4, batch 1200, loss[loss=0.318, ctc_loss=0.2361, cr_loss=0.4093, over 17151.00 frames. ], tot_loss[loss=0.3192, ctc_loss=0.2364, cr_loss=0.4142, over 3369510.08 frames. ], batch size: 48, lr: 2.78e-02, grad_scale: 32.0 2024-09-22 16:57:03,613 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.315e+02 1.578e+02 1.757e+02 2.030e+02 3.618e+02, threshold=3.514e+02, percent-clipped=0.0 2024-09-22 16:57:22,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0 2024-09-22 16:57:51,021 INFO [train.py:1198] (0/4) Epoch 4, batch 1250, loss[loss=0.3188, ctc_loss=0.2352, cr_loss=0.4182, over 17227.00 frames. ], tot_loss[loss=0.3182, ctc_loss=0.2356, cr_loss=0.4131, over 3376029.04 frames. ], batch size: 50, lr: 2.78e-02, grad_scale: 32.0 2024-09-22 16:57:58,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2024-09-22 16:58:05,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2024-09-22 16:58:12,089 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. 
limit=6.0 2024-09-22 16:58:12,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=60424.0, ans=0.2 2024-09-22 16:58:16,051 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 16:59:12,920 INFO [train.py:1198] (0/4) Epoch 4, batch 1300, loss[loss=0.3055, ctc_loss=0.2231, cr_loss=0.4119, over 17078.00 frames. ], tot_loss[loss=0.3182, ctc_loss=0.2355, cr_loss=0.4131, over 3367173.20 frames. ], batch size: 43, lr: 2.77e-02, grad_scale: 32.0 2024-09-22 16:59:13,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0 2024-09-22 16:59:39,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=60657.333333333336, ans=0.04949747468305833 2024-09-22 16:59:48,140 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.293e+02 1.594e+02 1.780e+02 2.076e+02 4.160e+02, threshold=3.560e+02, percent-clipped=3.0 2024-09-22 17:00:26,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.52 vs. limit=15.0 2024-09-22 17:00:29,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60797.333333333336, ans=0.1 2024-09-22 17:00:32,386 INFO [train.py:1198] (0/4) Epoch 4, batch 1350, loss[loss=0.3496, ctc_loss=0.2595, cr_loss=0.4508, over 16992.00 frames. ], tot_loss[loss=0.3165, ctc_loss=0.2342, cr_loss=0.4117, over 3367792.70 frames. ], batch size: 53, lr: 2.77e-02, grad_scale: 32.0 2024-09-22 17:00:32,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=60844.0, ans=0.125 2024-09-22 17:00:41,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=60844.0, ans=0.125 2024-09-22 17:00:44,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=60844.0, ans=0.0 2024-09-22 17:01:00,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=60890.666666666664, ans=0.1 2024-09-22 17:01:27,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=60984.0, ans=0.125 2024-09-22 17:01:44,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=61030.666666666664, ans=0.125 2024-09-22 17:01:57,166 INFO [train.py:1198] (0/4) Epoch 4, batch 1400, loss[loss=0.3263, ctc_loss=0.2413, cr_loss=0.4254, over 17152.00 frames. ], tot_loss[loss=0.316, ctc_loss=0.2337, cr_loss=0.4111, over 3375206.69 frames. ], batch size: 45, lr: 2.76e-02, grad_scale: 32.0 2024-09-22 17:02:15,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.95 vs. 
limit=10.0 2024-09-22 17:02:19,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=61124.0, ans=0.5 2024-09-22 17:02:34,721 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.284e+02 1.619e+02 1.850e+02 2.266e+02 3.949e+02, threshold=3.701e+02, percent-clipped=2.0 2024-09-22 17:03:20,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=61310.666666666664, ans=0.125 2024-09-22 17:03:22,330 INFO [train.py:1198] (0/4) Epoch 4, batch 1450, loss[loss=0.3023, ctc_loss=0.2212, cr_loss=0.4056, over 17046.00 frames. ], tot_loss[loss=0.3167, ctc_loss=0.2342, cr_loss=0.4127, over 3377455.15 frames. ], batch size: 52, lr: 2.76e-02, grad_scale: 32.0 2024-09-22 17:03:50,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=61357.333333333336, ans=0.2 2024-09-22 17:04:03,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=61404.0, ans=0.0 2024-09-22 17:04:06,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=61404.0, ans=0.125 2024-09-22 17:04:10,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=61450.666666666664, ans=0.125 2024-09-22 17:04:11,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=61450.666666666664, ans=0.0 2024-09-22 17:04:39,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=61497.333333333336, ans=0.1 2024-09-22 17:04:41,898 INFO [train.py:1198] (0/4) Epoch 4, batch 1500, loss[loss=0.3448, ctc_loss=0.2545, cr_loss=0.4517, over 16911.00 frames. ], tot_loss[loss=0.3174, ctc_loss=0.2347, cr_loss=0.4135, over 3376073.62 frames. ], batch size: 58, lr: 2.76e-02, grad_scale: 32.0 2024-09-22 17:04:52,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2024-09-22 17:04:54,960 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:05:17,065 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 1.547e+02 1.744e+02 2.028e+02 3.491e+02, threshold=3.489e+02, percent-clipped=0.0 2024-09-22 17:05:35,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=61684.0, ans=0.125 2024-09-22 17:05:37,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=61684.0, ans=0.0 2024-09-22 17:06:06,038 INFO [train.py:1198] (0/4) Epoch 4, batch 1550, loss[loss=0.3353, ctc_loss=0.2537, cr_loss=0.4081, over 17045.00 frames. ], tot_loss[loss=0.316, ctc_loss=0.2337, cr_loss=0.4117, over 3368151.10 frames. 
], batch size: 52, lr: 2.75e-02, grad_scale: 32.0 2024-09-22 17:06:23,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=61824.0, ans=0.0 2024-09-22 17:06:28,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=61824.0, ans=0.125 2024-09-22 17:07:07,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.58 vs. limit=22.5 2024-09-22 17:07:27,835 INFO [train.py:1198] (0/4) Epoch 4, batch 1600, loss[loss=0.3377, ctc_loss=0.244, cr_loss=0.4688, over 17348.00 frames. ], tot_loss[loss=0.3164, ctc_loss=0.234, cr_loss=0.412, over 3367210.36 frames. ], batch size: 48, lr: 2.75e-02, grad_scale: 32.0 2024-09-22 17:07:28,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=62010.666666666664, ans=0.025 2024-09-22 17:08:05,556 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.293e+02 1.652e+02 1.885e+02 2.249e+02 4.170e+02, threshold=3.770e+02, percent-clipped=2.0 2024-09-22 17:08:18,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=62150.666666666664, ans=0.1 2024-09-22 17:08:26,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=62150.666666666664, ans=0.0 2024-09-22 17:08:47,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=22.5 2024-09-22 17:08:50,127 INFO [train.py:1198] (0/4) Epoch 4, batch 1650, loss[loss=0.3647, ctc_loss=0.2783, cr_loss=0.4319, over 17367.00 frames. ], tot_loss[loss=0.3167, ctc_loss=0.2343, cr_loss=0.412, over 3371600.26 frames. ], batch size: 48, lr: 2.75e-02, grad_scale: 32.0 2024-09-22 17:09:31,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=62337.333333333336, ans=0.125 2024-09-22 17:09:38,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=12.0 2024-09-22 17:09:58,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=62430.666666666664, ans=0.1 2024-09-22 17:10:00,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=62430.666666666664, ans=0.0 2024-09-22 17:10:09,474 INFO [train.py:1198] (0/4) Epoch 4, batch 1700, loss[loss=0.2922, ctc_loss=0.2178, cr_loss=0.372, over 16033.00 frames. ], tot_loss[loss=0.3168, ctc_loss=0.2345, cr_loss=0.4116, over 3374751.34 frames. 
], batch size: 74, lr: 2.74e-02, grad_scale: 32.0 2024-09-22 17:10:33,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=62524.0, ans=0.125 2024-09-22 17:10:49,657 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.211e+02 1.564e+02 1.856e+02 2.208e+02 3.257e+02, threshold=3.711e+02, percent-clipped=0.0 2024-09-22 17:11:04,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=62617.333333333336, ans=0.125 2024-09-22 17:11:16,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=62664.0, ans=0.2 2024-09-22 17:11:20,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=62664.0, ans=0.0 2024-09-22 17:11:34,081 INFO [train.py:1198] (0/4) Epoch 4, batch 1750, loss[loss=0.3154, ctc_loss=0.2307, cr_loss=0.4232, over 17253.00 frames. ], tot_loss[loss=0.3169, ctc_loss=0.2347, cr_loss=0.4108, over 3359686.72 frames. ], batch size: 44, lr: 2.74e-02, grad_scale: 32.0 2024-09-22 17:11:38,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-09-22 17:11:50,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2024-09-22 17:11:55,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=62757.333333333336, ans=0.125 2024-09-22 17:11:58,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=62757.333333333336, ans=0.125 2024-09-22 17:12:07,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=62804.0, ans=0.1 2024-09-22 17:12:26,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=62850.666666666664, ans=0.0 2024-09-22 17:12:43,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=62897.333333333336, ans=0.125 2024-09-22 17:12:49,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=62897.333333333336, ans=0.2 2024-09-22 17:12:57,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=62944.0, ans=0.0 2024-09-22 17:12:58,854 INFO [train.py:1198] (0/4) Epoch 4, batch 1800, loss[loss=0.3123, ctc_loss=0.2388, cr_loss=0.3673, over 16957.00 frames. ], tot_loss[loss=0.3155, ctc_loss=0.2336, cr_loss=0.4095, over 3357778.35 frames. 
], batch size: 42, lr: 2.73e-02, grad_scale: 32.0 2024-09-22 17:13:05,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=62944.0, ans=0.125 2024-09-22 17:13:25,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=62990.666666666664, ans=0.125 2024-09-22 17:13:33,471 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 1.628e+02 1.997e+02 2.584e+02 3.622e+02, threshold=3.995e+02, percent-clipped=0.0 2024-09-22 17:13:33,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=63037.333333333336, ans=0.125 2024-09-22 17:13:35,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=63037.333333333336, ans=0.125 2024-09-22 17:14:12,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0 2024-09-22 17:14:17,813 INFO [train.py:1198] (0/4) Epoch 4, batch 1850, loss[loss=0.3191, ctc_loss=0.2343, cr_loss=0.4239, over 17198.00 frames. ], tot_loss[loss=0.3171, ctc_loss=0.2347, cr_loss=0.4117, over 3348560.58 frames. ], batch size: 47, lr: 2.73e-02, grad_scale: 32.0 2024-09-22 17:14:18,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=63177.333333333336, ans=0.2 2024-09-22 17:15:28,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=63364.0, ans=0.125 2024-09-22 17:15:39,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2024-09-22 17:15:41,412 INFO [train.py:1198] (0/4) Epoch 4, batch 1900, loss[loss=0.3712, ctc_loss=0.279, cr_loss=0.4612, over 17230.00 frames. ], tot_loss[loss=0.3188, ctc_loss=0.2361, cr_loss=0.4137, over 3356714.71 frames. ], batch size: 55, lr: 2.73e-02, grad_scale: 32.0 2024-09-22 17:15:47,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=63410.666666666664, ans=0.125 2024-09-22 17:16:16,407 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.329e+02 1.606e+02 1.832e+02 2.161e+02 3.717e+02, threshold=3.664e+02, percent-clipped=0.0 2024-09-22 17:16:21,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=63504.0, ans=0.025 2024-09-22 17:16:47,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=63597.333333333336, ans=0.125 2024-09-22 17:17:01,344 INFO [train.py:1198] (0/4) Epoch 4, batch 1950, loss[loss=0.2876, ctc_loss=0.2127, cr_loss=0.3747, over 17186.00 frames. ], tot_loss[loss=0.3185, ctc_loss=0.2359, cr_loss=0.4132, over 3355989.48 frames. 
], batch size: 41, lr: 2.72e-02, grad_scale: 32.0 2024-09-22 17:17:04,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=63644.0, ans=0.125 2024-09-22 17:17:26,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=63690.666666666664, ans=0.125 2024-09-22 17:17:30,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=63690.666666666664, ans=10.0 2024-09-22 17:17:30,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=63690.666666666664, ans=0.025 2024-09-22 17:17:35,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2024-09-22 17:17:54,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=63784.0, ans=0.0 2024-09-22 17:18:12,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2024-09-22 17:18:26,132 INFO [train.py:1198] (0/4) Epoch 4, batch 2000, loss[loss=0.3939, ctc_loss=0.3064, cr_loss=0.4376, over 11556.00 frames. ], tot_loss[loss=0.3176, ctc_loss=0.235, cr_loss=0.413, over 3358866.59 frames. ], batch size: 123, lr: 2.72e-02, grad_scale: 64.0 2024-09-22 17:18:41,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=63924.0, ans=0.125 2024-09-22 17:18:44,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=63924.0, ans=0.125 2024-09-22 17:18:46,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2024-09-22 17:18:48,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=63924.0, ans=0.0 2024-09-22 17:19:01,203 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.280e+02 1.569e+02 1.789e+02 2.368e+02 3.802e+02, threshold=3.577e+02, percent-clipped=1.0 2024-09-22 17:19:12,685 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:19:17,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=64017.333333333336, ans=0.0 2024-09-22 17:19:44,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=64110.666666666664, ans=0.0 2024-09-22 17:19:45,618 INFO [train.py:1198] (0/4) Epoch 4, batch 2050, loss[loss=0.3158, ctc_loss=0.2343, cr_loss=0.4075, over 17288.00 frames. ], tot_loss[loss=0.3169, ctc_loss=0.2344, cr_loss=0.4121, over 3359057.78 frames. ], batch size: 49, lr: 2.71e-02, grad_scale: 64.0 2024-09-22 17:19:57,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=12.0 2024-09-22 17:20:13,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. 
limit=6.0 2024-09-22 17:20:36,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=64250.666666666664, ans=0.125 2024-09-22 17:20:37,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=64250.666666666664, ans=0.0 2024-09-22 17:20:39,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=64250.666666666664, ans=0.5 2024-09-22 17:20:45,675 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:20:52,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=64297.333333333336, ans=0.125 2024-09-22 17:20:56,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=64297.333333333336, ans=0.0 2024-09-22 17:21:07,690 INFO [train.py:1198] (0/4) Epoch 4, batch 2100, loss[loss=0.3085, ctc_loss=0.2275, cr_loss=0.4048, over 17137.00 frames. ], tot_loss[loss=0.3158, ctc_loss=0.2335, cr_loss=0.4113, over 3364552.85 frames. ], batch size: 48, lr: 2.71e-02, grad_scale: 32.0 2024-09-22 17:21:11,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=64344.0, ans=0.0 2024-09-22 17:21:25,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=64390.666666666664, ans=0.0 2024-09-22 17:21:41,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=64437.333333333336, ans=0.2 2024-09-22 17:21:44,600 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.356e+02 1.633e+02 1.973e+02 2.304e+02 3.408e+02, threshold=3.946e+02, percent-clipped=0.0 2024-09-22 17:21:59,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.48 vs. limit=15.0 2024-09-22 17:22:00,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=64484.0, ans=0.125 2024-09-22 17:22:03,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=64484.0, ans=0.125 2024-09-22 17:22:26,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=64530.666666666664, ans=0.2 2024-09-22 17:22:33,063 INFO [train.py:1198] (0/4) Epoch 4, batch 2150, loss[loss=0.3313, ctc_loss=0.248, cr_loss=0.4169, over 17137.00 frames. ], tot_loss[loss=0.3175, ctc_loss=0.2347, cr_loss=0.414, over 3359872.88 frames. ], batch size: 48, lr: 2.71e-02, grad_scale: 32.0 2024-09-22 17:23:06,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=64670.666666666664, ans=0.0 2024-09-22 17:23:17,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=64670.666666666664, ans=0.125 2024-09-22 17:23:44,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. 
limit=15.0 2024-09-22 17:23:48,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0 2024-09-22 17:23:51,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=64810.666666666664, ans=0.125 2024-09-22 17:23:52,468 INFO [train.py:1198] (0/4) Epoch 4, batch 2200, loss[loss=0.3699, ctc_loss=0.2797, cr_loss=0.4511, over 17031.00 frames. ], tot_loss[loss=0.3172, ctc_loss=0.2344, cr_loss=0.4144, over 3369137.12 frames. ], batch size: 52, lr: 2.70e-02, grad_scale: 32.0 2024-09-22 17:23:59,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64810.666666666664, ans=0.1 2024-09-22 17:24:29,089 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.320e+02 1.691e+02 2.009e+02 2.410e+02 3.639e+02, threshold=4.017e+02, percent-clipped=0.0 2024-09-22 17:24:31,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=64904.0, ans=0.125 2024-09-22 17:24:57,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=64997.333333333336, ans=0.07 2024-09-22 17:24:59,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=64997.333333333336, ans=0.125 2024-09-22 17:25:11,835 INFO [train.py:1198] (0/4) Epoch 4, batch 2250, loss[loss=0.3328, ctc_loss=0.2488, cr_loss=0.4199, over 17305.00 frames. ], tot_loss[loss=0.3166, ctc_loss=0.2338, cr_loss=0.414, over 3363252.77 frames. ], batch size: 51, lr: 2.70e-02, grad_scale: 32.0 2024-09-22 17:25:25,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=65044.0, ans=0.125 2024-09-22 17:25:48,171 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:25:51,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=65137.333333333336, ans=0.2 2024-09-22 17:26:02,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65184.0, ans=0.1 2024-09-22 17:26:33,875 INFO [train.py:1198] (0/4) Epoch 4, batch 2300, loss[loss=0.3372, ctc_loss=0.2497, cr_loss=0.4374, over 17219.00 frames. ], tot_loss[loss=0.3165, ctc_loss=0.2336, cr_loss=0.4142, over 3366856.17 frames. ], batch size: 50, lr: 2.70e-02, grad_scale: 32.0 2024-09-22 17:26:43,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65277.333333333336, ans=0.1 2024-09-22 17:26:52,229 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:27:02,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2024-09-22 17:27:09,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.43 vs. 
limit=15.0 2024-09-22 17:27:13,235 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.343e+02 1.664e+02 1.971e+02 2.314e+02 3.882e+02, threshold=3.942e+02, percent-clipped=0.0 2024-09-22 17:27:58,575 INFO [train.py:1198] (0/4) Epoch 4, batch 2350, loss[loss=0.3148, ctc_loss=0.2289, cr_loss=0.4292, over 17319.00 frames. ], tot_loss[loss=0.3166, ctc_loss=0.2337, cr_loss=0.4147, over 3369980.64 frames. ], batch size: 51, lr: 2.69e-02, grad_scale: 32.0 2024-09-22 17:28:38,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=65604.0, ans=0.07 2024-09-22 17:28:42,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=65604.0, ans=0.125 2024-09-22 17:29:18,257 INFO [train.py:1198] (0/4) Epoch 4, batch 2400, loss[loss=0.3291, ctc_loss=0.2469, cr_loss=0.4108, over 17090.00 frames. ], tot_loss[loss=0.3181, ctc_loss=0.235, cr_loss=0.4158, over 3364310.24 frames. ], batch size: 43, lr: 2.69e-02, grad_scale: 32.0 2024-09-22 17:29:23,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.46 vs. limit=22.5 2024-09-22 17:29:54,855 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.273e+02 1.593e+02 1.794e+02 2.177e+02 3.793e+02, threshold=3.589e+02, percent-clipped=0.0 2024-09-22 17:30:00,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=65837.33333333333, ans=0.2 2024-09-22 17:30:01,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=65837.33333333333, ans=0.05 2024-09-22 17:30:24,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=65930.66666666667, ans=0.125 2024-09-22 17:30:31,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=65930.66666666667, ans=0.2 2024-09-22 17:30:31,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=65930.66666666667, ans=0.95 2024-09-22 17:30:35,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=65930.66666666667, ans=0.025 2024-09-22 17:30:40,224 INFO [train.py:1198] (0/4) Epoch 4, batch 2450, loss[loss=0.285, ctc_loss=0.2156, cr_loss=0.3472, over 17020.00 frames. ], tot_loss[loss=0.3156, ctc_loss=0.233, cr_loss=0.4134, over 3372983.90 frames. ], batch size: 44, lr: 2.68e-02, grad_scale: 32.0 2024-09-22 17:30:54,859 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:31:44,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=66164.0, ans=0.07 2024-09-22 17:31:45,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=66164.0, ans=0.125 2024-09-22 17:31:57,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=66164.0, ans=0.125 2024-09-22 17:32:02,193 INFO [train.py:1198] (0/4) Epoch 4, batch 2500, loss[loss=0.3098, ctc_loss=0.2294, cr_loss=0.4019, over 17225.00 frames. 
], tot_loss[loss=0.3149, ctc_loss=0.2324, cr_loss=0.4126, over 3377321.53 frames. ], batch size: 47, lr: 2.68e-02, grad_scale: 32.0 2024-09-22 17:32:41,650 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.315e+02 1.715e+02 1.996e+02 2.438e+02 3.886e+02, threshold=3.992e+02, percent-clipped=3.0 2024-09-22 17:32:57,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66350.66666666667, ans=0.1 2024-09-22 17:32:57,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66350.66666666667, ans=0.1 2024-09-22 17:32:59,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=12.0 2024-09-22 17:33:02,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=66350.66666666667, ans=0.125 2024-09-22 17:33:24,681 INFO [train.py:1198] (0/4) Epoch 4, batch 2550, loss[loss=0.2678, ctc_loss=0.1938, cr_loss=0.3704, over 17057.00 frames. ], tot_loss[loss=0.315, ctc_loss=0.2326, cr_loss=0.4122, over 3370950.99 frames. ], batch size: 39, lr: 2.68e-02, grad_scale: 32.0 2024-09-22 17:33:28,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2024-09-22 17:33:49,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2024-09-22 17:34:03,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=66537.33333333333, ans=0.09899494936611666 2024-09-22 17:34:44,461 INFO [train.py:1198] (0/4) Epoch 4, batch 2600, loss[loss=0.3007, ctc_loss=0.2212, cr_loss=0.3973, over 17148.00 frames. ], tot_loss[loss=0.315, ctc_loss=0.2328, cr_loss=0.4108, over 3346310.81 frames. ], batch size: 48, lr: 2.67e-02, grad_scale: 32.0 2024-09-22 17:35:24,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=66770.66666666667, ans=0.125 2024-09-22 17:35:25,831 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.648e+02 1.831e+02 2.212e+02 5.606e+02, threshold=3.662e+02, percent-clipped=1.0 2024-09-22 17:35:48,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=66817.33333333333, ans=0.125 2024-09-22 17:35:48,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=66817.33333333333, ans=0.125 2024-09-22 17:35:59,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=66864.0, ans=0.2 2024-09-22 17:36:05,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=66864.0, ans=0.0 2024-09-22 17:36:08,652 INFO [train.py:1198] (0/4) Epoch 4, batch 2650, loss[loss=0.3047, ctc_loss=0.2237, cr_loss=0.4048, over 17176.00 frames. ], tot_loss[loss=0.3162, ctc_loss=0.234, cr_loss=0.4113, over 3335786.02 frames. 
], batch size: 45, lr: 2.67e-02, grad_scale: 32.0 2024-09-22 17:36:55,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67004.0, ans=0.1 2024-09-22 17:37:07,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=67050.66666666667, ans=0.125 2024-09-22 17:37:09,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=67050.66666666667, ans=0.0 2024-09-22 17:37:15,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=67097.33333333333, ans=0.025 2024-09-22 17:37:32,621 INFO [train.py:1198] (0/4) Epoch 4, batch 2700, loss[loss=0.3506, ctc_loss=0.2626, cr_loss=0.4397, over 17000.00 frames. ], tot_loss[loss=0.3166, ctc_loss=0.2342, cr_loss=0.4119, over 3340823.27 frames. ], batch size: 53, lr: 2.67e-02, grad_scale: 32.0 2024-09-22 17:38:09,013 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.345e+02 1.678e+02 1.945e+02 2.373e+02 3.767e+02, threshold=3.890e+02, percent-clipped=1.0 2024-09-22 17:38:12,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=67237.33333333333, ans=0.95 2024-09-22 17:38:15,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67237.33333333333, ans=0.1 2024-09-22 17:38:31,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=67284.0, ans=0.125 2024-09-22 17:38:52,041 INFO [train.py:1198] (0/4) Epoch 4, batch 2750, loss[loss=0.3106, ctc_loss=0.2275, cr_loss=0.4152, over 17262.00 frames. ], tot_loss[loss=0.3163, ctc_loss=0.2339, cr_loss=0.412, over 3345169.84 frames. ], batch size: 44, lr: 2.66e-02, grad_scale: 32.0 2024-09-22 17:40:02,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2024-09-22 17:40:15,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=67610.66666666667, ans=0.0 2024-09-22 17:40:17,122 INFO [train.py:1198] (0/4) Epoch 4, batch 2800, loss[loss=0.3033, ctc_loss=0.2188, cr_loss=0.4228, over 17300.00 frames. ], tot_loss[loss=0.3145, ctc_loss=0.2324, cr_loss=0.4106, over 3343451.94 frames. ], batch size: 49, lr: 2.66e-02, grad_scale: 32.0 2024-09-22 17:40:21,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=67610.66666666667, ans=0.0 2024-09-22 17:40:53,564 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.604e+02 1.847e+02 2.270e+02 3.677e+02, threshold=3.693e+02, percent-clipped=0.0 2024-09-22 17:41:33,167 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:41:38,545 INFO [train.py:1198] (0/4) Epoch 4, batch 2850, loss[loss=0.3006, ctc_loss=0.2203, cr_loss=0.4017, over 17311.00 frames. ], tot_loss[loss=0.3135, ctc_loss=0.2315, cr_loss=0.4098, over 3341661.71 frames. 
], batch size: 51, lr: 2.65e-02, grad_scale: 32.0 2024-09-22 17:41:41,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=67844.0, ans=0.125 2024-09-22 17:42:20,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2024-09-22 17:42:43,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=68030.66666666667, ans=0.0 2024-09-22 17:42:45,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=68030.66666666667, ans=0.125 2024-09-22 17:42:51,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0 2024-09-22 17:43:00,659 INFO [train.py:1198] (0/4) Epoch 4, batch 2900, loss[loss=0.2866, ctc_loss=0.2077, cr_loss=0.3949, over 16965.00 frames. ], tot_loss[loss=0.3153, ctc_loss=0.2332, cr_loss=0.4108, over 3328998.34 frames. ], batch size: 42, lr: 2.65e-02, grad_scale: 32.0 2024-09-22 17:43:01,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=68077.33333333333, ans=0.025 2024-09-22 17:43:05,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=68077.33333333333, ans=0.125 2024-09-22 17:43:28,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=68124.0, ans=0.2 2024-09-22 17:43:37,713 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.237e+02 1.627e+02 1.915e+02 2.361e+02 4.224e+02, threshold=3.831e+02, percent-clipped=1.0 2024-09-22 17:44:11,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=68264.0, ans=0.0 2024-09-22 17:44:20,378 INFO [train.py:1198] (0/4) Epoch 4, batch 2950, loss[loss=0.2705, ctc_loss=0.198, cr_loss=0.3624, over 16968.00 frames. ], tot_loss[loss=0.3141, ctc_loss=0.2321, cr_loss=0.4096, over 3337990.16 frames. ], batch size: 42, lr: 2.65e-02, grad_scale: 32.0 2024-09-22 17:44:33,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=68310.66666666667, ans=0.125 2024-09-22 17:44:41,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=68357.33333333333, ans=0.0 2024-09-22 17:44:54,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=68357.33333333333, ans=0.0 2024-09-22 17:44:56,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=68404.0, ans=0.125 2024-09-22 17:44:59,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=68404.0, ans=0.125 2024-09-22 17:45:20,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.11 vs. 
limit=15.0 2024-09-22 17:45:32,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=68497.33333333333, ans=0.125 2024-09-22 17:45:44,886 INFO [train.py:1198] (0/4) Epoch 4, batch 3000, loss[loss=0.3222, ctc_loss=0.2331, cr_loss=0.445, over 16041.00 frames. ], tot_loss[loss=0.3139, ctc_loss=0.232, cr_loss=0.4093, over 3331448.89 frames. ], batch size: 74, lr: 2.64e-02, grad_scale: 32.0 2024-09-22 17:45:44,887 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 17:45:56,852 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3948, 5.1970, 4.5829, 5.1280], device='cuda:0') 2024-09-22 17:46:00,414 INFO [train.py:1230] (0/4) Epoch 4, validation: loss=0.07263, ctc_loss=0.07263, cr_loss=7.17e-15, over 944034.00 frames. 2024-09-22 17:46:00,415 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 17:46:36,347 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.618e+02 1.808e+02 2.126e+02 4.273e+02, threshold=3.616e+02, percent-clipped=2.0 2024-09-22 17:47:02,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=68730.66666666667, ans=0.0 2024-09-22 17:47:18,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=15.0 2024-09-22 17:47:19,107 INFO [train.py:1198] (0/4) Epoch 4, batch 3050, loss[loss=0.3362, ctc_loss=0.2451, cr_loss=0.4554, over 17221.00 frames. ], tot_loss[loss=0.3143, ctc_loss=0.2322, cr_loss=0.4105, over 3339260.20 frames. ], batch size: 55, lr: 2.64e-02, grad_scale: 32.0 2024-09-22 17:47:27,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=68777.33333333333, ans=0.125 2024-09-22 17:47:31,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=68777.33333333333, ans=0.07 2024-09-22 17:47:37,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68824.0, ans=0.1 2024-09-22 17:47:45,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=68824.0, ans=0.125 2024-09-22 17:47:46,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0 2024-09-22 17:48:04,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=68870.66666666667, ans=0.025 2024-09-22 17:48:04,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=68870.66666666667, ans=0.0 2024-09-22 17:48:12,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2024-09-22 17:48:38,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68964.0, ans=0.1 2024-09-22 17:48:43,001 INFO [train.py:1198] (0/4) Epoch 4, batch 3100, loss[loss=0.3027, ctc_loss=0.2211, cr_loss=0.4083, over 17006.00 frames. 
], tot_loss[loss=0.314, ctc_loss=0.2318, cr_loss=0.4107, over 3335477.75 frames. ], batch size: 51, lr: 2.64e-02, grad_scale: 32.0 2024-09-22 17:49:02,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=69057.33333333333, ans=0.025 2024-09-22 17:49:05,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=69057.33333333333, ans=0.125 2024-09-22 17:49:13,486 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:49:17,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=69104.0, ans=0.0 2024-09-22 17:49:19,309 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.276e+02 1.562e+02 1.773e+02 2.272e+02 4.016e+02, threshold=3.545e+02, percent-clipped=1.0 2024-09-22 17:49:39,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69150.66666666667, ans=0.125 2024-09-22 17:50:00,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=69244.0, ans=0.025 2024-09-22 17:50:00,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0 2024-09-22 17:50:01,898 INFO [train.py:1198] (0/4) Epoch 4, batch 3150, loss[loss=0.3177, ctc_loss=0.2325, cr_loss=0.4264, over 15899.00 frames. ], tot_loss[loss=0.3127, ctc_loss=0.2307, cr_loss=0.4101, over 3335615.16 frames. ], batch size: 74, lr: 2.63e-02, grad_scale: 32.0 2024-09-22 17:50:02,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=69244.0, ans=0.2 2024-09-22 17:50:08,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-09-22 17:50:10,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0 2024-09-22 17:50:15,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69290.66666666667, ans=0.1 2024-09-22 17:50:31,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=69337.33333333333, ans=0.125 2024-09-22 17:50:35,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2024-09-22 17:50:41,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2024-09-22 17:50:59,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=69384.0, ans=0.125 2024-09-22 17:51:04,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. 
limit=15.0 2024-09-22 17:51:10,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=69430.66666666667, ans=0.0 2024-09-22 17:51:19,754 INFO [train.py:1198] (0/4) Epoch 4, batch 3200, loss[loss=0.269, ctc_loss=0.194, cr_loss=0.3748, over 17048.00 frames. ], tot_loss[loss=0.3125, ctc_loss=0.2305, cr_loss=0.4098, over 3338398.45 frames. ], batch size: 39, lr: 2.63e-02, grad_scale: 32.0 2024-09-22 17:51:20,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=69477.33333333333, ans=0.1 2024-09-22 17:51:48,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=69524.0, ans=0.0 2024-09-22 17:51:55,751 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.313e+02 1.584e+02 1.798e+02 2.186e+02 3.575e+02, threshold=3.596e+02, percent-clipped=1.0 2024-09-22 17:52:13,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=69617.33333333333, ans=0.07 2024-09-22 17:52:22,695 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:52:37,933 INFO [train.py:1198] (0/4) Epoch 4, batch 3250, loss[loss=0.3264, ctc_loss=0.2435, cr_loss=0.4144, over 17271.00 frames. ], tot_loss[loss=0.3129, ctc_loss=0.2308, cr_loss=0.4107, over 3345755.90 frames. ], batch size: 44, lr: 2.63e-02, grad_scale: 32.0 2024-09-22 17:53:11,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.65 vs. limit=22.5 2024-09-22 17:53:25,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=69850.66666666667, ans=0.0 2024-09-22 17:53:25,244 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 17:53:32,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=69850.66666666667, ans=0.1 2024-09-22 17:53:54,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=69944.0, ans=0.125 2024-09-22 17:53:56,010 INFO [train.py:1198] (0/4) Epoch 4, batch 3300, loss[loss=0.2758, ctc_loss=0.1992, cr_loss=0.3829, over 17063.00 frames. ], tot_loss[loss=0.312, ctc_loss=0.23, cr_loss=0.41, over 3350791.12 frames. 
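The optim.py:487 warnings report a five-number summary (min, 25%, median, 75%, max) of recently observed gradient norms, and each logged threshold equals clipping_scale times the median: here 2.0 * 1.798e+02 = 3.596e+02. percent-clipped is the share of recent batches whose norm exceeded that threshold. A hedged sketch of the rule; icefall folds this into ScaledAdam, and the history length below is an assumption:

from collections import deque
import torch

class MedianGradClipper:
    """Clip gradients to clipping_scale * median of recent grad norms.
    Standalone illustration only; not ScaledAdam's internals."""
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def clip_(self, parameters) -> torch.Tensor:
        params = [p for p in parameters if p.grad is not None]
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2)
        self.norms.append(total_norm.item())
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median  # e.g. 2.0 * 1.798e+02
        if total_norm > threshold:
            for p in params:
                p.grad.detach().mul_(threshold / total_norm)
        return total_norm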
], batch size: 39, lr: 2.62e-02, grad_scale: 32.0 2024-09-22 17:53:57,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=69944.0, ans=0.125 2024-09-22 17:54:10,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=69990.66666666667, ans=0.07 2024-09-22 17:54:32,126 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 1.544e+02 1.731e+02 2.121e+02 3.523e+02, threshold=3.462e+02, percent-clipped=0.0 2024-09-22 17:54:54,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=70084.0, ans=0.125 2024-09-22 17:55:14,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=70130.66666666667, ans=0.0 2024-09-22 17:55:19,214 INFO [train.py:1198] (0/4) Epoch 4, batch 3350, loss[loss=0.3035, ctc_loss=0.2215, cr_loss=0.4096, over 17314.00 frames. ], tot_loss[loss=0.3134, ctc_loss=0.2312, cr_loss=0.4112, over 3346709.93 frames. ], batch size: 46, lr: 2.62e-02, grad_scale: 32.0 2024-09-22 17:55:19,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=70177.33333333333, ans=0.125 2024-09-22 17:55:25,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=70177.33333333333, ans=0.125 2024-09-22 17:55:43,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=70224.0, ans=0.125 2024-09-22 17:55:48,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-09-22 17:56:18,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=70317.33333333333, ans=0.1 2024-09-22 17:56:31,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=70364.0, ans=0.125 2024-09-22 17:56:32,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=70364.0, ans=0.0 2024-09-22 17:56:34,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70364.0, ans=0.1 2024-09-22 17:56:36,973 INFO [train.py:1198] (0/4) Epoch 4, batch 3400, loss[loss=0.3598, ctc_loss=0.2703, cr_loss=0.4473, over 17231.00 frames. ], tot_loss[loss=0.3138, ctc_loss=0.2314, cr_loss=0.4119, over 3344756.42 frames. ], batch size: 55, lr: 2.62e-02, grad_scale: 32.0 2024-09-22 17:56:43,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=70410.66666666667, ans=0.0 2024-09-22 17:57:05,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=70457.33333333333, ans=0.125 2024-09-22 17:57:12,457 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.333e+02 1.518e+02 1.665e+02 2.014e+02 3.333e+02, threshold=3.330e+02, percent-clipped=0.0 2024-09-22 17:57:33,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.70 vs. 
limit=15.0 2024-09-22 17:57:51,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=70597.33333333333, ans=0.09899494936611666 2024-09-22 17:57:54,209 INFO [train.py:1198] (0/4) Epoch 4, batch 3450, loss[loss=0.2995, ctc_loss=0.2228, cr_loss=0.3838, over 17128.00 frames. ], tot_loss[loss=0.3135, ctc_loss=0.231, cr_loss=0.4123, over 3353859.41 frames. ], batch size: 40, lr: 2.61e-02, grad_scale: 32.0 2024-09-22 17:58:17,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=70690.66666666667, ans=0.125 2024-09-22 17:58:41,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=70737.33333333333, ans=0.0 2024-09-22 17:58:44,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=70784.0, ans=0.125 2024-09-22 17:58:50,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=70784.0, ans=0.125 2024-09-22 17:58:53,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=70784.0, ans=12.0 2024-09-22 17:59:08,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=70830.66666666667, ans=0.125 2024-09-22 17:59:13,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=70830.66666666667, ans=0.025 2024-09-22 17:59:15,856 INFO [train.py:1198] (0/4) Epoch 4, batch 3500, loss[loss=0.3017, ctc_loss=0.2144, cr_loss=0.4365, over 17160.00 frames. ], tot_loss[loss=0.3138, ctc_loss=0.2311, cr_loss=0.4133, over 3359421.62 frames. ], batch size: 45, lr: 2.61e-02, grad_scale: 32.0 2024-09-22 17:59:27,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-09-22 17:59:34,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=70924.0, ans=0.0 2024-09-22 17:59:53,260 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.280e+02 1.551e+02 1.744e+02 2.057e+02 3.215e+02, threshold=3.488e+02, percent-clipped=0.0 2024-09-22 17:59:58,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=70970.66666666667, ans=0.125 2024-09-22 18:00:06,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71017.33333333333, ans=0.1 2024-09-22 18:00:26,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71064.0, ans=0.1 2024-09-22 18:00:32,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=71110.66666666667, ans=0.2 2024-09-22 18:00:33,940 INFO [train.py:1198] (0/4) Epoch 4, batch 3550, loss[loss=0.2792, ctc_loss=0.2087, cr_loss=0.3524, over 17199.00 frames. ], tot_loss[loss=0.3137, ctc_loss=0.2311, cr_loss=0.4133, over 3364797.28 frames. 
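The scaling.py:214 lines are periodic reads of ScheduledFloat parameters: each named quantity (a dropout p, skip rate, balancer prob, and so on) takes its current value ("ans") from a schedule keyed on batch_count. Piecewise-linear interpolation between breakpoints reproduces that behaviour; the breakpoints in the example are invented for illustration, not taken from the recipe:

def scheduled_float(batch_count: float, schedule) -> float:
    """Piecewise-linear lookup over (batch_count, value) pairs,
    clamped at both ends. Breakpoints below are hypothetical."""
    b0, v0 = schedule[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in schedule[1:]:
        if batch_count <= b1:
            return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
        b0, v0 = b1, v1
    return v0

# A skip rate decaying from 0.5 to 0.0 over the first 4000 batches would
# read 0.0 at the batch counts seen above:
print(scheduled_float(70597.33, [(0.0, 0.5), (4000.0, 0.0)]))  # -> 0.0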
], batch size: 41, lr: 2.61e-02, grad_scale: 16.0 2024-09-22 18:00:40,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=71110.66666666667, ans=0.125 2024-09-22 18:00:45,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=71110.66666666667, ans=0.125 2024-09-22 18:00:45,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.37 vs. limit=10.0 2024-09-22 18:01:37,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=71297.33333333333, ans=0.025 2024-09-22 18:01:51,387 INFO [train.py:1198] (0/4) Epoch 4, batch 3600, loss[loss=0.2808, ctc_loss=0.205, cr_loss=0.3787, over 17248.00 frames. ], tot_loss[loss=0.3115, ctc_loss=0.2291, cr_loss=0.4122, over 3373488.66 frames. ], batch size: 42, lr: 2.60e-02, grad_scale: 32.0 2024-09-22 18:02:02,418 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:02:22,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=71437.33333333333, ans=0.035 2024-09-22 18:02:28,362 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.219e+02 1.639e+02 2.046e+02 2.808e+02 4.325e+02, threshold=4.093e+02, percent-clipped=8.0 2024-09-22 18:02:33,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=71437.33333333333, ans=0.125 2024-09-22 18:02:45,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71484.0, ans=0.1 2024-09-22 18:02:47,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=71484.0, ans=0.05 2024-09-22 18:03:09,093 INFO [train.py:1198] (0/4) Epoch 4, batch 3650, loss[loss=0.3121, ctc_loss=0.2282, cr_loss=0.4193, over 17315.00 frames. ], tot_loss[loss=0.3119, ctc_loss=0.2295, cr_loss=0.4121, over 3369912.23 frames. ], batch size: 51, lr: 2.60e-02, grad_scale: 32.0 2024-09-22 18:03:26,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=71624.0, ans=0.125 2024-09-22 18:03:41,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.23 vs. limit=22.5 2024-09-22 18:03:56,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=71717.33333333333, ans=0.125 2024-09-22 18:04:28,662 INFO [train.py:1198] (0/4) Epoch 4, batch 3700, loss[loss=0.3367, ctc_loss=0.2495, cr_loss=0.436, over 17218.00 frames. ], tot_loss[loss=0.3127, ctc_loss=0.23, cr_loss=0.4134, over 3367996.60 frames. 
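grad_scale in the loss lines is the AMP loss scale (use_fp16=true): a step whose fp16 gradients contain inf/nan is skipped and the scale is halved, and after enough successful steps it grows back, which is the 32.0 -> 16.0 -> 32.0 pattern between batches 3500 and 3600 above. A generic sketch of the loop; model, optimizer and criterion are placeholders, and the growth settings are torch defaults rather than necessarily the recipe's:

import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # manages the dynamic scale logged as grad_scale

def train_step(model, optimizer, criterion, inputs, targets):
    optimizer.zero_grad(set_to_none=True)
    with autocast(dtype=torch.float16):
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped (and scale halved) on inf/nan grads
    scaler.update()          # scale doubles again after enough good steps
    return loss.detach(), scaler.get_scale()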
], batch size: 55, lr: 2.60e-02, grad_scale: 32.0 2024-09-22 18:04:44,240 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:04:55,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=71857.33333333333, ans=0.025 2024-09-22 18:05:00,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=15.0 2024-09-22 18:05:07,669 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.267e+02 1.579e+02 1.797e+02 2.044e+02 5.255e+02, threshold=3.594e+02, percent-clipped=1.0 2024-09-22 18:05:11,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2024-09-22 18:05:27,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-09-22 18:05:48,768 INFO [train.py:1198] (0/4) Epoch 4, batch 3750, loss[loss=0.3111, ctc_loss=0.2264, cr_loss=0.4235, over 16068.00 frames. ], tot_loss[loss=0.3137, ctc_loss=0.231, cr_loss=0.4136, over 3341111.09 frames. ], batch size: 74, lr: 2.59e-02, grad_scale: 32.0 2024-09-22 18:06:51,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=72230.66666666667, ans=0.125 2024-09-22 18:07:06,331 INFO [train.py:1198] (0/4) Epoch 4, batch 3800, loss[loss=0.2877, ctc_loss=0.2098, cr_loss=0.3893, over 16951.00 frames. ], tot_loss[loss=0.3153, ctc_loss=0.2324, cr_loss=0.4143, over 3334932.75 frames. ], batch size: 42, lr: 2.59e-02, grad_scale: 32.0 2024-09-22 18:07:33,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=15.0 2024-09-22 18:07:38,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=72370.66666666667, ans=0.125 2024-09-22 18:07:44,154 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.309e+02 1.589e+02 1.725e+02 2.068e+02 4.482e+02, threshold=3.450e+02, percent-clipped=2.0 2024-09-22 18:07:54,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=72417.33333333333, ans=0.025 2024-09-22 18:07:55,938 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:08:00,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=22.5 2024-09-22 18:08:24,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=72510.66666666667, ans=0.125 2024-09-22 18:08:25,303 INFO [train.py:1198] (0/4) Epoch 4, batch 3850, loss[loss=0.3666, ctc_loss=0.2827, cr_loss=0.4194, over 11982.00 frames. ], tot_loss[loss=0.3161, ctc_loss=0.2334, cr_loss=0.4136, over 3291165.84 frames. 
], batch size: 123, lr: 2.59e-02, grad_scale: 32.0 2024-09-22 18:08:28,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=72510.66666666667, ans=0.125 2024-09-22 18:08:33,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=72510.66666666667, ans=0.1 2024-09-22 18:08:59,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=72604.0, ans=0.125 2024-09-22 18:09:01,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=72604.0, ans=0.125 2024-09-22 18:09:35,883 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-4.pt 2024-09-22 18:10:26,643 INFO [train.py:1198] (0/4) Epoch 5, batch 0, loss[loss=0.3375, ctc_loss=0.2513, cr_loss=0.4307, over 17234.00 frames. ], tot_loss[loss=0.3375, ctc_loss=0.2513, cr_loss=0.4307, over 17234.00 frames. ], batch size: 55, lr: 2.40e-02, grad_scale: 32.0 2024-09-22 18:10:26,644 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 18:10:42,120 INFO [train.py:1230] (0/4) Epoch 5, validation: loss=0.07551, ctc_loss=0.07551, cr_loss=6.915e-15, over 944034.00 frames. 2024-09-22 18:10:42,121 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 18:10:51,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=72725.33333333333, ans=0.125 2024-09-22 18:11:08,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=72772.0, ans=0.0 2024-09-22 18:11:27,092 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.296e+02 1.660e+02 1.848e+02 2.232e+02 4.613e+02, threshold=3.696e+02, percent-clipped=4.0 2024-09-22 18:11:41,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.16 vs. limit=22.5 2024-09-22 18:11:49,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=72912.0, ans=0.125 2024-09-22 18:12:02,273 INFO [train.py:1198] (0/4) Epoch 5, batch 50, loss[loss=0.3028, ctc_loss=0.2218, cr_loss=0.4053, over 17311.00 frames. ], tot_loss[loss=0.3112, ctc_loss=0.2291, cr_loss=0.4109, over 758055.68 frames. ], batch size: 46, lr: 2.40e-02, grad_scale: 32.0 2024-09-22 18:12:31,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=73005.33333333333, ans=0.95 2024-09-22 18:12:34,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=73052.0, ans=0.0 2024-09-22 18:12:59,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=73098.66666666667, ans=0.0 2024-09-22 18:13:02,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.31 vs. 
limit=15.0 2024-09-22 18:13:13,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=73145.33333333333, ans=0.125 2024-09-22 18:13:23,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=73145.33333333333, ans=0.2 2024-09-22 18:13:26,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=73192.0, ans=10.0 2024-09-22 18:13:27,567 INFO [train.py:1198] (0/4) Epoch 5, batch 100, loss[loss=0.3322, ctc_loss=0.2408, cr_loss=0.4572, over 16613.00 frames. ], tot_loss[loss=0.3076, ctc_loss=0.2259, cr_loss=0.4089, over 1332221.65 frames. ], batch size: 66, lr: 2.40e-02, grad_scale: 32.0 2024-09-22 18:13:42,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=73238.66666666667, ans=0.125 2024-09-22 18:13:48,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=73238.66666666667, ans=0.0 2024-09-22 18:13:49,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=73238.66666666667, ans=0.2 2024-09-22 18:14:12,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.270e+02 1.507e+02 1.769e+02 2.148e+02 4.396e+02, threshold=3.538e+02, percent-clipped=1.0 2024-09-22 18:14:15,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=73332.0, ans=0.0 2024-09-22 18:14:20,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=73332.0, ans=0.0 2024-09-22 18:14:46,990 INFO [train.py:1198] (0/4) Epoch 5, batch 150, loss[loss=0.2666, ctc_loss=0.19, cr_loss=0.3829, over 17029.00 frames. ], tot_loss[loss=0.3081, ctc_loss=0.226, cr_loss=0.4108, over 1789380.14 frames. ], batch size: 39, lr: 2.40e-02, grad_scale: 32.0 2024-09-22 18:14:48,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=73425.33333333333, ans=0.125 2024-09-22 18:14:50,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-09-22 18:14:53,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=73425.33333333333, ans=0.2 2024-09-22 18:14:56,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=73425.33333333333, ans=0.125 2024-09-22 18:15:29,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2024-09-22 18:15:31,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73518.66666666667, ans=0.1 2024-09-22 18:16:12,904 INFO [train.py:1198] (0/4) Epoch 5, batch 200, loss[loss=0.2804, ctc_loss=0.2004, cr_loss=0.3999, over 17264.00 frames. ], tot_loss[loss=0.3065, ctc_loss=0.2247, cr_loss=0.409, over 2142995.55 frames. 
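The step change in lr at the epoch boundary (2.59e-02 at the end of epoch 4, 2.40e-02 at Epoch 5, batch 0) is what icefall's Eden schedule produces from the configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5: separate batch- and epoch-dependent decay factors multiply the base rate, so crossing an epoch boundary shrinks the epoch factor while the batch factor moves smoothly. A sketch of the commonly stated Eden formula; treat the exact exponents as an assumption rather than a quotation of optim.py:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# At a fixed batch count, stepping epoch 4 -> 5 scales lr by ~0.93,
# close to the observed 2.59e-02 -> 2.40e-02 drop:
print(eden_lr(0.045, 16000, 5) / eden_lr(0.045, 16000, 4))  # ~0.93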
], batch size: 44, lr: 2.39e-02, grad_scale: 32.0 2024-09-22 18:16:21,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=73658.66666666667, ans=0.125 2024-09-22 18:16:55,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2024-09-22 18:16:57,602 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.221e+02 1.454e+02 1.719e+02 2.223e+02 3.373e+02, threshold=3.438e+02, percent-clipped=0.0 2024-09-22 18:17:10,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73798.66666666667, ans=0.1 2024-09-22 18:17:32,286 INFO [train.py:1198] (0/4) Epoch 5, batch 250, loss[loss=0.3174, ctc_loss=0.2332, cr_loss=0.421, over 17254.00 frames. ], tot_loss[loss=0.3066, ctc_loss=0.2249, cr_loss=0.4087, over 2411502.42 frames. ], batch size: 44, lr: 2.39e-02, grad_scale: 32.0 2024-09-22 18:17:47,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=73938.66666666667, ans=0.025 2024-09-22 18:18:52,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=74078.66666666667, ans=0.125 2024-09-22 18:18:57,279 INFO [train.py:1198] (0/4) Epoch 5, batch 300, loss[loss=0.329, ctc_loss=0.249, cr_loss=0.4, over 15067.00 frames. ], tot_loss[loss=0.3066, ctc_loss=0.2248, cr_loss=0.4087, over 2616233.12 frames. ], batch size: 89, lr: 2.39e-02, grad_scale: 32.0 2024-09-22 18:19:07,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=74125.33333333333, ans=0.125 2024-09-22 18:19:19,849 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:19:41,768 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.262e+02 1.584e+02 1.929e+02 2.265e+02 4.720e+02, threshold=3.859e+02, percent-clipped=2.0 2024-09-22 18:20:19,149 INFO [train.py:1198] (0/4) Epoch 5, batch 350, loss[loss=0.3198, ctc_loss=0.2422, cr_loss=0.3878, over 17346.00 frames. ], tot_loss[loss=0.3064, ctc_loss=0.2246, cr_loss=0.4087, over 2782090.22 frames. ], batch size: 48, lr: 2.38e-02, grad_scale: 32.0 2024-09-22 18:20:22,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=74358.66666666667, ans=0.1 2024-09-22 18:20:52,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=74452.0, ans=0.125 2024-09-22 18:21:00,605 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:21:41,487 INFO [train.py:1198] (0/4) Epoch 5, batch 400, loss[loss=0.3445, ctc_loss=0.2545, cr_loss=0.4503, over 16956.00 frames. ], tot_loss[loss=0.3066, ctc_loss=0.2247, cr_loss=0.4095, over 2910778.52 frames. 
], batch size: 58, lr: 2.38e-02, grad_scale: 32.0 2024-09-22 18:21:52,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=74592.0, ans=0.125 2024-09-22 18:22:05,382 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-16000.pt 2024-09-22 18:22:14,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=74685.33333333333, ans=0.125 2024-09-22 18:22:28,037 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.516e+02 1.701e+02 1.953e+02 3.306e+02, threshold=3.402e+02, percent-clipped=0.0 2024-09-22 18:22:30,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=15.0 2024-09-22 18:22:33,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=74732.0, ans=0.0 2024-09-22 18:22:37,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=74732.0, ans=0.04949747468305833 2024-09-22 18:22:50,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=74778.66666666667, ans=10.0 2024-09-22 18:23:05,460 INFO [train.py:1198] (0/4) Epoch 5, batch 450, loss[loss=0.2934, ctc_loss=0.2134, cr_loss=0.4, over 17108.00 frames. ], tot_loss[loss=0.3054, ctc_loss=0.2237, cr_loss=0.4082, over 3011979.41 frames. ], batch size: 49, lr: 2.38e-02, grad_scale: 32.0 2024-09-22 18:23:10,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74825.33333333333, ans=0.1 2024-09-22 18:23:21,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=74825.33333333333, ans=0.0 2024-09-22 18:23:24,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=74872.0, ans=0.025 2024-09-22 18:23:32,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=74872.0, ans=0.0 2024-09-22 18:23:33,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=74872.0, ans=10.0 2024-09-22 18:23:52,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=74918.66666666667, ans=0.0 2024-09-22 18:24:02,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=74965.33333333333, ans=0.0 2024-09-22 18:24:03,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74965.33333333333, ans=0.1 2024-09-22 18:24:27,403 INFO [train.py:1198] (0/4) Epoch 5, batch 500, loss[loss=0.3283, ctc_loss=0.2412, cr_loss=0.4353, over 16884.00 frames. ], tot_loss[loss=0.3041, ctc_loss=0.2226, cr_loss=0.4072, over 3090807.49 frames. 
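Two checkpoint kinds appear in this run: epoch-N.pt at each epoch boundary (epoch-4.pt earlier) and checkpoint-<batch>.pt every save_every_n=4000 training batches, so checkpoint-16000.pt above marks global batch 16000; keep_last_k=30 bounds how many batch checkpoints survive. A minimal sketch of that cadence; the function and the saved dict are stand-ins, not checkpoint.py's actual signature:

from pathlib import Path
import torch

def maybe_save(exp_dir: Path, model, batch_idx_train: int,
               save_every_n: int = 4000, keep_last_k: int = 30) -> None:
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save({"model": model.state_dict(),
                "batch_idx_train": batch_idx_train},
               exp_dir / f"checkpoint-{batch_idx_train}.pt")
    # Prune older batch checkpoints, keeping the newest keep_last_k.
    kept = sorted(exp_dir.glob("checkpoint-*.pt"),
                  key=lambda p: int(p.stem.split("-")[1]))
    for old in kept[:-keep_last_k]:
        old.unlink()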
], batch size: 58, lr: 2.37e-02, grad_scale: 32.0 2024-09-22 18:25:00,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=75152.0, ans=0.0 2024-09-22 18:25:13,995 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 1.528e+02 1.760e+02 2.096e+02 4.266e+02, threshold=3.519e+02, percent-clipped=4.0 2024-09-22 18:25:17,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=75198.66666666667, ans=0.125 2024-09-22 18:25:25,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=75198.66666666667, ans=0.125 2024-09-22 18:25:36,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=75245.33333333333, ans=0.0 2024-09-22 18:25:48,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=75245.33333333333, ans=0.125 2024-09-22 18:25:49,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2024-09-22 18:25:51,521 INFO [train.py:1198] (0/4) Epoch 5, batch 550, loss[loss=0.3031, ctc_loss=0.221, cr_loss=0.4104, over 16988.00 frames. ], tot_loss[loss=0.3038, ctc_loss=0.2223, cr_loss=0.4073, over 3161799.66 frames. ], batch size: 51, lr: 2.37e-02, grad_scale: 32.0 2024-09-22 18:25:58,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=75292.0, ans=0.125 2024-09-22 18:26:17,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2024-09-22 18:26:39,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=75432.0, ans=0.0 2024-09-22 18:26:54,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=75478.66666666667, ans=0.125 2024-09-22 18:27:10,588 INFO [train.py:1198] (0/4) Epoch 5, batch 600, loss[loss=0.3051, ctc_loss=0.2209, cr_loss=0.4211, over 17247.00 frames. ], tot_loss[loss=0.305, ctc_loss=0.2231, cr_loss=0.4093, over 3211321.45 frames. ], batch size: 50, lr: 2.37e-02, grad_scale: 32.0 2024-09-22 18:27:15,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=75525.33333333333, ans=0.1 2024-09-22 18:27:41,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. 
limit=15.0 2024-09-22 18:27:49,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=75618.66666666667, ans=0.125 2024-09-22 18:27:57,743 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.285e+02 1.538e+02 1.800e+02 2.212e+02 3.356e+02, threshold=3.600e+02, percent-clipped=0.0 2024-09-22 18:28:09,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=75665.33333333333, ans=0.07 2024-09-22 18:28:26,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=75712.0, ans=0.0 2024-09-22 18:28:35,293 INFO [train.py:1198] (0/4) Epoch 5, batch 650, loss[loss=0.3453, ctc_loss=0.2541, cr_loss=0.4561, over 17091.00 frames. ], tot_loss[loss=0.3053, ctc_loss=0.2233, cr_loss=0.4102, over 3242983.28 frames. ], batch size: 49, lr: 2.36e-02, grad_scale: 32.0 2024-09-22 18:28:59,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=75805.33333333333, ans=0.0 2024-09-22 18:29:24,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=75898.66666666667, ans=0.125 2024-09-22 18:29:43,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=75945.33333333333, ans=0.125 2024-09-22 18:29:54,260 INFO [train.py:1198] (0/4) Epoch 5, batch 700, loss[loss=0.2453, ctc_loss=0.1738, cr_loss=0.3576, over 17178.00 frames. ], tot_loss[loss=0.3041, ctc_loss=0.2223, cr_loss=0.4093, over 3274924.31 frames. ], batch size: 41, lr: 2.36e-02, grad_scale: 32.0 2024-09-22 18:30:17,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=76038.66666666667, ans=0.0 2024-09-22 18:30:25,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0 2024-09-22 18:30:30,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=76085.33333333333, ans=0.0 2024-09-22 18:30:43,603 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.226e+02 1.521e+02 1.705e+02 2.105e+02 2.881e+02, threshold=3.410e+02, percent-clipped=0.0 2024-09-22 18:30:54,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=76132.0, ans=0.04949747468305833 2024-09-22 18:31:18,064 INFO [train.py:1198] (0/4) Epoch 5, batch 750, loss[loss=0.3165, ctc_loss=0.2331, cr_loss=0.4171, over 16712.00 frames. ], tot_loss[loss=0.3049, ctc_loss=0.2228, cr_loss=0.4106, over 3293290.61 frames. ], batch size: 61, lr: 2.36e-02, grad_scale: 32.0 2024-09-22 18:31:43,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. 
limit=15.0 2024-09-22 18:31:53,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=76318.66666666667, ans=0.0 2024-09-22 18:32:15,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=76365.33333333333, ans=0.125 2024-09-22 18:32:37,282 INFO [train.py:1198] (0/4) Epoch 5, batch 800, loss[loss=0.2791, ctc_loss=0.2018, cr_loss=0.3865, over 17031.00 frames. ], tot_loss[loss=0.3047, ctc_loss=0.2226, cr_loss=0.4107, over 3317856.64 frames. ], batch size: 44, lr: 2.36e-02, grad_scale: 32.0 2024-09-22 18:32:39,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=76458.66666666667, ans=0.2 2024-09-22 18:33:19,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=76552.0, ans=0.5 2024-09-22 18:33:19,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2024-09-22 18:33:26,744 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.316e+02 1.675e+02 1.893e+02 2.259e+02 3.192e+02, threshold=3.786e+02, percent-clipped=0.0 2024-09-22 18:33:27,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.00 vs. limit=6.0 2024-09-22 18:33:35,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=76598.66666666667, ans=0.125 2024-09-22 18:33:36,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=76598.66666666667, ans=0.125 2024-09-22 18:33:38,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=76598.66666666667, ans=0.0 2024-09-22 18:33:55,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=76645.33333333333, ans=0.2 2024-09-22 18:34:01,758 INFO [train.py:1198] (0/4) Epoch 5, batch 850, loss[loss=0.3542, ctc_loss=0.2623, cr_loss=0.4595, over 17348.00 frames. ], tot_loss[loss=0.3049, ctc_loss=0.2227, cr_loss=0.4107, over 3332395.56 frames. ], batch size: 48, lr: 2.35e-02, grad_scale: 32.0 2024-09-22 18:34:10,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=76692.0, ans=0.125 2024-09-22 18:34:11,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=76692.0, ans=0.125 2024-09-22 18:34:29,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=76738.66666666667, ans=0.125 2024-09-22 18:35:02,528 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:35:19,919 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.46 vs. limit=22.5 2024-09-22 18:35:23,902 INFO [train.py:1198] (0/4) Epoch 5, batch 900, loss[loss=0.3119, ctc_loss=0.2272, cr_loss=0.4237, over 17205.00 frames. ], tot_loss[loss=0.3049, ctc_loss=0.2226, cr_loss=0.4115, over 3345296.49 frames. 
], batch size: 47, lr: 2.35e-02, grad_scale: 32.0 2024-09-22 18:35:46,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-09-22 18:36:10,877 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.238e+02 1.424e+02 1.574e+02 1.806e+02 2.908e+02, threshold=3.147e+02, percent-clipped=0.0 2024-09-22 18:36:46,128 INFO [train.py:1198] (0/4) Epoch 5, batch 950, loss[loss=0.2562, ctc_loss=0.1816, cr_loss=0.3731, over 16345.00 frames. ], tot_loss[loss=0.3041, ctc_loss=0.2221, cr_loss=0.4102, over 3334572.88 frames. ], batch size: 36, lr: 2.35e-02, grad_scale: 32.0 2024-09-22 18:36:49,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=77158.66666666667, ans=0.0 2024-09-22 18:37:11,609 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:37:17,912 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 18:37:32,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77298.66666666667, ans=0.1 2024-09-22 18:37:38,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=77298.66666666667, ans=0.5 2024-09-22 18:37:44,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=22.5 2024-09-22 18:37:51,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=77345.33333333333, ans=0.125 2024-09-22 18:38:06,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-09-22 18:38:10,580 INFO [train.py:1198] (0/4) Epoch 5, batch 1000, loss[loss=0.345, ctc_loss=0.254, cr_loss=0.4549, over 17002.00 frames. ], tot_loss[loss=0.3026, ctc_loss=0.2208, cr_loss=0.4089, over 3340666.82 frames. ], batch size: 51, lr: 2.34e-02, grad_scale: 32.0 2024-09-22 18:38:13,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=77392.0, ans=0.0 2024-09-22 18:38:36,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=77438.66666666667, ans=0.125 2024-09-22 18:38:41,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=15.0 2024-09-22 18:38:55,016 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.192e+02 1.650e+02 1.785e+02 2.141e+02 3.125e+02, threshold=3.569e+02, percent-clipped=0.0 2024-09-22 18:39:23,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=77578.66666666667, ans=0.125 2024-09-22 18:39:29,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2024-09-22 18:39:30,041 INFO [train.py:1198] (0/4) Epoch 5, batch 1050, loss[loss=0.2813, ctc_loss=0.2049, cr_loss=0.3818, over 16942.00 frames. 
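The scaling.py:1024 Whitening lines compare a per-module statistic ("metric") against a scheduled ceiling ("limit"). The metric is 1.0 when the module's activations have covariance proportional to the identity (fully white) and grows with anisotropy; values near or above the limit are the interesting ones, since the Whiten module then nudges gradients toward whiter features. One estimator with exactly that normalization, mirroring the idea rather than icefall's exact code:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1.0, with equality
    iff the empirical covariance is a multiple of the identity."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]        # (C, C) covariance
    mean_sq = (cov ** 2).mean()           # mean squared entry
    mean_diag = cov.diagonal().mean()     # mean variance
    return mean_sq * cov.shape[0] / (mean_diag ** 2 + 1e-20)

# White noise sits near the floor of 1.0, far below limits like 15.0:
print(whitening_metric(torch.randn(10000, 256)))  # ~1.0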
], tot_loss[loss=0.3023, ctc_loss=0.2207, cr_loss=0.4083, over 3350588.16 frames. ], batch size: 42, lr: 2.34e-02, grad_scale: 32.0 2024-09-22 18:39:30,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77625.33333333333, ans=0.1 2024-09-22 18:39:40,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.69 vs. limit=15.0 2024-09-22 18:40:54,539 INFO [train.py:1198] (0/4) Epoch 5, batch 1100, loss[loss=0.3223, ctc_loss=0.2363, cr_loss=0.4301, over 17140.00 frames. ], tot_loss[loss=0.302, ctc_loss=0.2205, cr_loss=0.4072, over 3350599.07 frames. ], batch size: 48, lr: 2.34e-02, grad_scale: 16.0 2024-09-22 18:41:05,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=77858.66666666667, ans=0.125 2024-09-22 18:41:07,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0 2024-09-22 18:41:34,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=77952.0, ans=0.1 2024-09-22 18:41:40,427 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.256e+02 1.520e+02 1.761e+02 2.071e+02 3.558e+02, threshold=3.523e+02, percent-clipped=0.0 2024-09-22 18:41:50,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=77998.66666666667, ans=0.125 2024-09-22 18:41:56,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=78045.33333333333, ans=0.2 2024-09-22 18:42:06,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2024-09-22 18:42:13,887 INFO [train.py:1198] (0/4) Epoch 5, batch 1150, loss[loss=0.2897, ctc_loss=0.2068, cr_loss=0.4147, over 17037.00 frames. ], tot_loss[loss=0.3027, ctc_loss=0.221, cr_loss=0.4085, over 3351902.70 frames. ], batch size: 52, lr: 2.34e-02, grad_scale: 16.0 2024-09-22 18:42:21,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=16.30 vs. limit=15.0 2024-09-22 18:43:09,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=78232.0, ans=0.125 2024-09-22 18:43:21,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-09-22 18:43:22,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=78278.66666666667, ans=0.0 2024-09-22 18:43:22,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=78278.66666666667, ans=0.2 2024-09-22 18:43:25,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=78278.66666666667, ans=0.2 2024-09-22 18:43:36,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.25 vs. 
limit=15.0 2024-09-22 18:43:37,776 INFO [train.py:1198] (0/4) Epoch 5, batch 1200, loss[loss=0.3596, ctc_loss=0.2647, cr_loss=0.4747, over 17006.00 frames. ], tot_loss[loss=0.3022, ctc_loss=0.2205, cr_loss=0.4089, over 3358519.77 frames. ], batch size: 51, lr: 2.33e-02, grad_scale: 32.0 2024-09-22 18:43:50,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78325.33333333333, ans=0.1 2024-09-22 18:44:00,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=78372.0, ans=0.125 2024-09-22 18:44:00,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2024-09-22 18:44:24,262 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.563e+02 1.694e+02 1.943e+02 2.938e+02, threshold=3.387e+02, percent-clipped=0.0 2024-09-22 18:44:30,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=78465.33333333333, ans=0.0 2024-09-22 18:44:40,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=78512.0, ans=0.0 2024-09-22 18:44:57,617 INFO [train.py:1198] (0/4) Epoch 5, batch 1250, loss[loss=0.3319, ctc_loss=0.2472, cr_loss=0.4236, over 17222.00 frames. ], tot_loss[loss=0.3037, ctc_loss=0.2217, cr_loss=0.4099, over 3359351.26 frames. ], batch size: 55, lr: 2.33e-02, grad_scale: 32.0 2024-09-22 18:45:13,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-09-22 18:45:29,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0 2024-09-22 18:45:42,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=78652.0, ans=0.2 2024-09-22 18:45:46,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0 2024-09-22 18:46:02,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-09-22 18:46:11,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=78745.33333333333, ans=0.125 2024-09-22 18:46:22,194 INFO [train.py:1198] (0/4) Epoch 5, batch 1300, loss[loss=0.2683, ctc_loss=0.199, cr_loss=0.3461, over 16744.00 frames. ], tot_loss[loss=0.3027, ctc_loss=0.221, cr_loss=0.4086, over 3355305.18 frames. ], batch size: 37, lr: 2.33e-02, grad_scale: 32.0 2024-09-22 18:46:22,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=78792.0, ans=0.125 2024-09-22 18:46:27,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. 
limit=15.0 2024-09-22 18:46:40,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=78838.66666666667, ans=0.2 2024-09-22 18:46:46,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=78838.66666666667, ans=0.2 2024-09-22 18:47:08,165 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.258e+02 1.495e+02 1.790e+02 2.182e+02 4.439e+02, threshold=3.579e+02, percent-clipped=1.0 2024-09-22 18:47:32,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=22.5 2024-09-22 18:47:43,985 INFO [train.py:1198] (0/4) Epoch 5, batch 1350, loss[loss=0.3093, ctc_loss=0.2304, cr_loss=0.3943, over 17341.00 frames. ], tot_loss[loss=0.3013, ctc_loss=0.22, cr_loss=0.4066, over 3355831.50 frames. ], batch size: 48, lr: 2.32e-02, grad_scale: 32.0 2024-09-22 18:48:10,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=79072.0, ans=0.125 2024-09-22 18:48:24,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=79118.66666666667, ans=0.125 2024-09-22 18:48:42,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=79165.33333333333, ans=0.125 2024-09-22 18:49:05,833 INFO [train.py:1198] (0/4) Epoch 5, batch 1400, loss[loss=0.3016, ctc_loss=0.2203, cr_loss=0.4065, over 17213.00 frames. ], tot_loss[loss=0.3002, ctc_loss=0.2191, cr_loss=0.4057, over 3361119.85 frames. ], batch size: 47, lr: 2.32e-02, grad_scale: 32.0 2024-09-22 18:49:39,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79352.0, ans=0.1 2024-09-22 18:49:54,449 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.244e+02 1.617e+02 1.825e+02 2.247e+02 3.972e+02, threshold=3.649e+02, percent-clipped=1.0 2024-09-22 18:49:56,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=79398.66666666667, ans=0.04949747468305833 2024-09-22 18:50:01,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=79398.66666666667, ans=0.1 2024-09-22 18:50:30,041 INFO [train.py:1198] (0/4) Epoch 5, batch 1450, loss[loss=0.3303, ctc_loss=0.2451, cr_loss=0.4262, over 16907.00 frames. ], tot_loss[loss=0.2998, ctc_loss=0.2187, cr_loss=0.4055, over 3359411.73 frames. ], batch size: 58, lr: 2.32e-02, grad_scale: 32.0 2024-09-22 18:50:41,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=79492.0, ans=0.09899494936611666 2024-09-22 18:50:44,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=79538.66666666667, ans=0.125 2024-09-22 18:50:47,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=79538.66666666667, ans=0.125 2024-09-22 18:51:02,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. 
limit=12.0 2024-09-22 18:51:28,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2024-09-22 18:51:43,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=79678.66666666667, ans=0.025 2024-09-22 18:51:45,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=79678.66666666667, ans=0.0 2024-09-22 18:51:49,956 INFO [train.py:1198] (0/4) Epoch 5, batch 1500, loss[loss=0.3432, ctc_loss=0.2523, cr_loss=0.4544, over 17213.00 frames. ], tot_loss[loss=0.3014, ctc_loss=0.2199, cr_loss=0.4071, over 3361800.88 frames. ], batch size: 55, lr: 2.32e-02, grad_scale: 32.0 2024-09-22 18:52:09,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=79772.0, ans=0.07 2024-09-22 18:52:30,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=79818.66666666667, ans=0.05 2024-09-22 18:52:40,772 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.573e+02 1.905e+02 2.453e+02 3.573e+02, threshold=3.810e+02, percent-clipped=0.0 2024-09-22 18:53:14,049 INFO [train.py:1198] (0/4) Epoch 5, batch 1550, loss[loss=0.3064, ctc_loss=0.2217, cr_loss=0.4233, over 17085.00 frames. ], tot_loss[loss=0.3018, ctc_loss=0.2204, cr_loss=0.4071, over 3352697.92 frames. ], batch size: 46, lr: 2.31e-02, grad_scale: 32.0 2024-09-22 18:53:22,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=79958.66666666667, ans=0.0 2024-09-22 18:53:38,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=80005.33333333333, ans=0.125 2024-09-22 18:53:46,883 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-22 18:54:33,900 INFO [train.py:1198] (0/4) Epoch 5, batch 1600, loss[loss=0.3162, ctc_loss=0.2295, cr_loss=0.4333, over 16978.00 frames. ], tot_loss[loss=0.3016, ctc_loss=0.2202, cr_loss=0.4066, over 3351713.09 frames. ], batch size: 53, lr: 2.31e-02, grad_scale: 32.0 2024-09-22 18:55:00,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=80238.66666666667, ans=0.125 2024-09-22 18:55:03,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=80238.66666666667, ans=0.5 2024-09-22 18:55:13,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.49 vs. limit=10.0 2024-09-22 18:55:22,232 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.274e+02 1.455e+02 1.591e+02 1.855e+02 3.051e+02, threshold=3.183e+02, percent-clipped=0.0 2024-09-22 18:55:37,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=80332.0, ans=0.0 2024-09-22 18:55:58,302 INFO [train.py:1198] (0/4) Epoch 5, batch 1650, loss[loss=0.3266, ctc_loss=0.2391, cr_loss=0.4375, over 17296.00 frames. 
], tot_loss[loss=0.302, ctc_loss=0.2206, cr_loss=0.407, over 3356925.33 frames. ], batch size: 51, lr: 2.31e-02, grad_scale: 32.0 2024-09-22 18:56:01,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=80425.33333333333, ans=0.1 2024-09-22 18:56:17,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=80472.0, ans=0.0 2024-09-22 18:57:09,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=80612.0, ans=0.07 2024-09-22 18:57:11,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80612.0, ans=0.125 2024-09-22 18:57:19,864 INFO [train.py:1198] (0/4) Epoch 5, batch 1700, loss[loss=0.3052, ctc_loss=0.2271, cr_loss=0.3907, over 15894.00 frames. ], tot_loss[loss=0.3025, ctc_loss=0.221, cr_loss=0.4072, over 3338164.80 frames. ], batch size: 74, lr: 2.30e-02, grad_scale: 32.0 2024-09-22 18:57:27,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=80658.66666666667, ans=0.125 2024-09-22 18:57:43,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=80705.33333333333, ans=0.2 2024-09-22 18:57:46,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=80705.33333333333, ans=10.0 2024-09-22 18:58:08,258 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.290e+02 1.556e+02 1.832e+02 2.128e+02 4.427e+02, threshold=3.664e+02, percent-clipped=3.0 2024-09-22 18:58:41,928 INFO [train.py:1198] (0/4) Epoch 5, batch 1750, loss[loss=0.2983, ctc_loss=0.22, cr_loss=0.3914, over 17298.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2201, cr_loss=0.408, over 3346809.73 frames. ], batch size: 49, lr: 2.30e-02, grad_scale: 32.0 2024-09-22 18:58:48,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=80892.0, ans=0.0 2024-09-22 18:58:48,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=80892.0, ans=0.125 2024-09-22 18:58:56,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.16 vs. limit=22.5 2024-09-22 18:58:59,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=80938.66666666667, ans=0.125 2024-09-22 18:59:28,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=81032.0, ans=0.0 2024-09-22 18:59:53,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=81078.66666666667, ans=0.125 2024-09-22 18:59:53,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2024-09-22 18:59:54,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=81078.66666666667, ans=10.0 2024-09-22 19:00:03,800 INFO [train.py:1198] (0/4) Epoch 5, batch 1800, loss[loss=0.2899, ctc_loss=0.2116, cr_loss=0.3915, over 17283.00 frames. 
2024-09-22 19:00:03,800 INFO [train.py:1198] (0/4) Epoch 5, batch 1800, loss[loss=0.2899, ctc_loss=0.2116, cr_loss=0.3915, over 17283.00 frames. ], tot_loss[loss=0.3014, ctc_loss=0.2199, cr_loss=0.4077, over 3350290.77 frames. ], batch size: 42, lr: 2.30e-02, grad_scale: 32.0
2024-09-22 19:00:04,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5
2024-09-22 19:00:26,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=81172.0, ans=0.125
2024-09-22 19:00:35,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2024-09-22 19:00:41,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=81218.66666666667, ans=0.0
2024-09-22 19:00:52,231 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.571e+02 1.779e+02 2.119e+02 4.116e+02, threshold=3.559e+02, percent-clipped=1.0
2024-09-22 19:00:54,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=81265.33333333333, ans=0.125
2024-09-22 19:01:03,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=81265.33333333333, ans=0.125
2024-09-22 19:01:19,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=81312.0, ans=0.0
2024-09-22 19:01:24,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0
2024-09-22 19:01:25,612 INFO [train.py:1198] (0/4) Epoch 5, batch 1850, loss[loss=0.225, ctc_loss=0.1614, cr_loss=0.3183, over 17022.00 frames. ], tot_loss[loss=0.3028, ctc_loss=0.2211, cr_loss=0.4087, over 3336907.16 frames. ], batch size: 39, lr: 2.30e-02, grad_scale: 32.0
2024-09-22 19:01:29,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81358.66666666667, ans=0.1
2024-09-22 19:01:54,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=81405.33333333333, ans=0.0
2024-09-22 19:02:04,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81452.0, ans=0.1
2024-09-22 19:02:07,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=81452.0, ans=0.2
2024-09-22 19:02:25,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=81498.66666666667, ans=0.0
2024-09-22 19:02:25,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=81498.66666666667, ans=0.07
2024-09-22 19:02:50,455 INFO [train.py:1198] (0/4) Epoch 5, batch 1900, loss[loss=0.3394, ctc_loss=0.2476, cr_loss=0.4588, over 16502.00 frames. ], tot_loss[loss=0.3024, ctc_loss=0.2206, cr_loss=0.4088, over 3347664.99 frames. ], batch size: 66, lr: 2.29e-02, grad_scale: 32.0
2024-09-22 19:03:27,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=81685.33333333333, ans=0.0
2024-09-22 19:03:36,548 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.294e+02 1.506e+02 1.844e+02 2.270e+02 3.771e+02, threshold=3.688e+02, percent-clipped=2.0
2024-09-22 19:03:59,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=81778.66666666667, ans=0.025
2024-09-22 19:04:03,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=81778.66666666667, ans=0.0
2024-09-22 19:04:06,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=81778.66666666667, ans=0.0
2024-09-22 19:04:06,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=81778.66666666667, ans=0.0
2024-09-22 19:04:10,003 INFO [train.py:1198] (0/4) Epoch 5, batch 1950, loss[loss=0.2406, ctc_loss=0.1734, cr_loss=0.3356, over 17021.00 frames. ], tot_loss[loss=0.3019, ctc_loss=0.2203, cr_loss=0.4079, over 3352360.96 frames. ], batch size: 39, lr: 2.29e-02, grad_scale: 32.0
2024-09-22 19:04:24,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=81872.0, ans=0.0
2024-09-22 19:04:49,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=81918.66666666667, ans=0.0
2024-09-22 19:04:57,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=81918.66666666667, ans=0.125
2024-09-22 19:05:17,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0
2024-09-22 19:05:33,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=82058.66666666667, ans=0.0
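The scaling.py:214 entries trace ScheduledFloat values: module hyperparameters (skip rates, balancer probabilities, dropout) whose value `ans` is a function of `batch_count`. A sketch of a piecewise-linear schedule with that behaviour; the breakpoints in the example are made up for illustration, each module in the recipe defines its own:

def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    # Linearly interpolate (batch_count, value) breakpoints, clamping at both ends.
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# A skip rate that decays from 0.5 to 0.0 over the first 4000 batches and then
# stays at 0.0 (hypothetical numbers), consistent with ans=0.0 at batch_count
# around 80000 in the entries above:
assert scheduled_float(81778.67, [(0.0, 0.5), (4000.0, 0.0)]) == 0.0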
2024-09-22 19:05:34,771 INFO [train.py:1198] (0/4) Epoch 5, batch 2000, loss[loss=0.3077, ctc_loss=0.2224, cr_loss=0.4262, over 17185.00 frames. ], tot_loss[loss=0.302, ctc_loss=0.2205, cr_loss=0.4073, over 3352942.33 frames. ], batch size: 45, lr: 2.29e-02, grad_scale: 32.0
2024-09-22 19:05:38,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=82058.66666666667, ans=0.125
2024-09-22 19:05:54,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=82105.33333333333, ans=0.125
2024-09-22 19:06:21,341 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.273e+02 1.502e+02 1.753e+02 2.170e+02 3.000e+02, threshold=3.507e+02, percent-clipped=0.0
2024-09-22 19:06:43,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=82245.33333333333, ans=0.0
2024-09-22 19:06:45,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=82245.33333333333, ans=0.025
2024-09-22 19:06:45,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=82245.33333333333, ans=0.125
2024-09-22 19:06:53,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82292.0, ans=0.1
2024-09-22 19:06:54,494 INFO [train.py:1198] (0/4) Epoch 5, batch 2050, loss[loss=0.2674, ctc_loss=0.1956, cr_loss=0.3592, over 17094.00 frames. ], tot_loss[loss=0.3015, ctc_loss=0.2203, cr_loss=0.4062, over 3347061.36 frames. ], batch size: 43, lr: 2.28e-02, grad_scale: 32.0
2024-09-22 19:07:05,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=82292.0, ans=0.125
2024-09-22 19:07:08,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=82338.66666666667, ans=0.125
2024-09-22 19:07:09,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82338.66666666667, ans=0.1
2024-09-22 19:07:17,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=82338.66666666667, ans=0.125
2024-09-22 19:07:32,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.46 vs. limit=15.0
2024-09-22 19:08:08,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0
2024-09-22 19:08:18,629 INFO [train.py:1198] (0/4) Epoch 5, batch 2100, loss[loss=0.3729, ctc_loss=0.2897, cr_loss=0.4161, over 11495.00 frames. ], tot_loss[loss=0.3026, ctc_loss=0.221, cr_loss=0.4079, over 3344492.51 frames. ], batch size: 123, lr: 2.28e-02, grad_scale: 32.0
2024-09-22 19:08:26,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=82525.33333333333, ans=0.0
2024-09-22 19:08:41,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=82572.0, ans=0.0
2024-09-22 19:08:42,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=82572.0, ans=0.125
2024-09-22 19:09:04,411 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.583e+02 1.843e+02 2.179e+02 3.617e+02, threshold=3.686e+02, percent-clipped=2.0
2024-09-22 19:09:26,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82712.0, ans=0.1
2024-09-22 19:09:31,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=82712.0, ans=0.125
2024-09-22 19:09:40,264 INFO [train.py:1198] (0/4) Epoch 5, batch 2150, loss[loss=0.3151, ctc_loss=0.2258, cr_loss=0.4468, over 17292.00 frames. ], tot_loss[loss=0.3018, ctc_loss=0.2202, cr_loss=0.4081, over 3355140.89 frames. ], batch size: 51, lr: 2.28e-02, grad_scale: 32.0
2024-09-22 19:10:00,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0
2024-09-22 19:10:27,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=82852.0, ans=0.125
2024-09-22 19:10:40,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=82898.66666666667, ans=0.0
2024-09-22 19:10:54,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=82945.33333333333, ans=0.125
2024-09-22 19:11:02,150 INFO [train.py:1198] (0/4) Epoch 5, batch 2200, loss[loss=0.3258, ctc_loss=0.241, cr_loss=0.4239, over 17007.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2201, cr_loss=0.4082, over 3354114.92 frames. ], batch size: 56, lr: 2.28e-02, grad_scale: 32.0
2024-09-22 19:11:10,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=82992.0, ans=0.1
2024-09-22 19:11:41,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=83085.33333333333, ans=0.125
2024-09-22 19:11:48,592 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.270e+02 1.603e+02 1.776e+02 2.386e+02 3.569e+02, threshold=3.552e+02, percent-clipped=0.0
2024-09-22 19:11:55,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=83132.0, ans=0.125
2024-09-22 19:11:55,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=83132.0, ans=0.125
2024-09-22 19:12:03,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=83132.0, ans=0.125
2024-09-22 19:12:24,735 INFO [train.py:1198] (0/4) Epoch 5, batch 2250, loss[loss=0.2719, ctc_loss=0.1968, cr_loss=0.3756, over 16723.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.22, cr_loss=0.4083, over 3354095.55 frames. ], batch size: 37, lr: 2.27e-02, grad_scale: 32.0
2024-09-22 19:12:25,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=83225.33333333333, ans=0.125
2024-09-22 19:12:29,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=83225.33333333333, ans=0.05
2024-09-22 19:12:46,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83272.0, ans=0.1
2024-09-22 19:12:46,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=83272.0, ans=0.0
2024-09-22 19:13:37,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=83412.0, ans=0.125
2024-09-22 19:13:46,567 INFO [train.py:1198] (0/4) Epoch 5, batch 2300, loss[loss=0.2516, ctc_loss=0.1826, cr_loss=0.3451, over 17115.00 frames. ], tot_loss[loss=0.3014, ctc_loss=0.2198, cr_loss=0.4081, over 3354854.11 frames. ], batch size: 40, lr: 2.27e-02, grad_scale: 32.0
2024-09-22 19:13:51,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=83458.66666666667, ans=0.0
2024-09-22 19:13:53,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=83458.66666666667, ans=0.125
2024-09-22 19:14:34,650 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.240e+02 1.515e+02 1.751e+02 2.046e+02 3.052e+02, threshold=3.503e+02, percent-clipped=0.0
2024-09-22 19:14:41,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=83598.66666666667, ans=0.0
2024-09-22 19:14:44,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=83598.66666666667, ans=0.125
2024-09-22 19:15:07,863 INFO [train.py:1198] (0/4) Epoch 5, batch 2350, loss[loss=0.2921, ctc_loss=0.213, cr_loss=0.3957, over 17074.00 frames. ], tot_loss[loss=0.3008, ctc_loss=0.2193, cr_loss=0.4071, over 3349148.93 frames. ], batch size: 46, lr: 2.27e-02, grad_scale: 32.0
2024-09-22 19:15:11,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=83692.0, ans=0.125
2024-09-22 19:15:23,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=83692.0, ans=0.2
2024-09-22 19:15:50,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=83785.33333333333, ans=0.0
2024-09-22 19:16:30,676 INFO [train.py:1198] (0/4) Epoch 5, batch 2400, loss[loss=0.2838, ctc_loss=0.2059, cr_loss=0.39, over 17029.00 frames. ], tot_loss[loss=0.2994, ctc_loss=0.2182, cr_loss=0.4059, over 3356169.02 frames. ], batch size: 44, lr: 2.27e-02, grad_scale: 32.0
2024-09-22 19:16:31,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=83925.33333333333, ans=0.025
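The scaling.py:1024 lines (including the next entry) compare a per-module whitening metric against a limit. A plausible definition of such a metric, not necessarily the exact formula in scaling.py: a statistic that equals 1.0 when the channel covariance is isotropic (already "white") and grows with the eigenvalue spread, so exceeding the limit signals activations that have collapsed along some directions:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns D * tr(C^2) / tr(C)^2, which is
    # >= 1 with equality iff the covariance C is a multiple of the identity.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    d = cov.shape[0]
    return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

feats = torch.randn(1000, 192)        # roughly white, so the metric is near 1
print(float(whitening_metric(feats)))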
2024-09-22 19:16:42,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0
2024-09-22 19:16:43,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=83925.33333333333, ans=0.125
2024-09-22 19:17:10,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=84018.66666666667, ans=0.125
2024-09-22 19:17:19,656 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.205e+02 1.495e+02 1.658e+02 1.967e+02 2.763e+02, threshold=3.315e+02, percent-clipped=0.0
2024-09-22 19:17:23,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=84065.33333333333, ans=0.025
2024-09-22 19:17:36,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=84065.33333333333, ans=0.125
2024-09-22 19:17:55,429 INFO [train.py:1198] (0/4) Epoch 5, batch 2450, loss[loss=0.2895, ctc_loss=0.2096, cr_loss=0.3996, over 17220.00 frames. ], tot_loss[loss=0.2998, ctc_loss=0.2186, cr_loss=0.4061, over 3351068.08 frames. ], batch size: 50, lr: 2.26e-02, grad_scale: 32.0
2024-09-22 19:18:06,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=84158.66666666667, ans=0.05
2024-09-22 19:18:43,612 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 19:18:45,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.54 vs. limit=10.0
2024-09-22 19:18:50,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=84298.66666666667, ans=0.5
2024-09-22 19:18:53,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=84298.66666666667, ans=0.125
2024-09-22 19:19:04,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=84345.33333333333, ans=0.2
2024-09-22 19:19:15,419 INFO [train.py:1198] (0/4) Epoch 5, batch 2500, loss[loss=0.2932, ctc_loss=0.2113, cr_loss=0.4093, over 17284.00 frames. ], tot_loss[loss=0.3001, ctc_loss=0.2189, cr_loss=0.4061, over 3356537.03 frames. ], batch size: 49, lr: 2.26e-02, grad_scale: 32.0
2024-09-22 19:19:40,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=84438.66666666667, ans=0.0
2024-09-22 19:20:04,530 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.268e+02 1.488e+02 1.690e+02 1.912e+02 3.424e+02, threshold=3.381e+02, percent-clipped=1.0
2024-09-22 19:20:08,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=84532.0, ans=0.125
2024-09-22 19:20:13,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=84532.0, ans=0.1
2024-09-22 19:20:26,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=84578.66666666667, ans=0.2
2024-09-22 19:20:40,873 INFO [train.py:1198] (0/4) Epoch 5, batch 2550, loss[loss=0.3005, ctc_loss=0.2176, cr_loss=0.4144, over 17154.00 frames. ], tot_loss[loss=0.2997, ctc_loss=0.2185, cr_loss=0.4062, over 3358031.60 frames. ], batch size: 48, lr: 2.26e-02, grad_scale: 32.0
2024-09-22 19:20:45,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=84625.33333333333, ans=0.0
2024-09-22 19:22:03,011 INFO [train.py:1198] (0/4) Epoch 5, batch 2600, loss[loss=0.3108, ctc_loss=0.2244, cr_loss=0.432, over 17292.00 frames. ], tot_loss[loss=0.2985, ctc_loss=0.2174, cr_loss=0.4058, over 3371605.94 frames. ], batch size: 49, lr: 2.25e-02, grad_scale: 32.0
2024-09-22 19:22:04,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=22.5
2024-09-22 19:22:09,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=84858.66666666667, ans=0.125
2024-09-22 19:22:15,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=84858.66666666667, ans=0.0
2024-09-22 19:22:42,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=84952.0, ans=0.125
2024-09-22 19:22:51,551 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.221e+02 1.583e+02 1.835e+02 2.132e+02 3.981e+02, threshold=3.669e+02, percent-clipped=1.0
2024-09-22 19:23:24,864 INFO [train.py:1198] (0/4) Epoch 5, batch 2650, loss[loss=0.3186, ctc_loss=0.2294, cr_loss=0.446, over 17040.00 frames. ], tot_loss[loss=0.2983, ctc_loss=0.2171, cr_loss=0.4058, over 3366156.64 frames. ], batch size: 52, lr: 2.25e-02, grad_scale: 32.0
2024-09-22 19:23:44,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5
2024-09-22 19:23:48,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=85138.66666666667, ans=0.2
2024-09-22 19:23:56,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=85185.33333333333, ans=0.125
2024-09-22 19:23:58,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=85185.33333333333, ans=10.0
2024-09-22 19:24:00,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=85185.33333333333, ans=0.07
2024-09-22 19:24:02,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=85185.33333333333, ans=0.125
2024-09-22 19:24:06,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=85185.33333333333, ans=0.025
2024-09-22 19:24:46,096 INFO [train.py:1198] (0/4) Epoch 5, batch 2700, loss[loss=0.3391, ctc_loss=0.2521, cr_loss=0.4353, over 16082.00 frames. ], tot_loss[loss=0.2983, ctc_loss=0.2171, cr_loss=0.406, over 3360617.48 frames. ], batch size: 74, lr: 2.25e-02, grad_scale: 32.0
2024-09-22 19:25:08,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85372.0, ans=0.1
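The lr column decays slowly with batch count inside the epoch (2.32e-02 down to 2.25e-02 across this stretch). A sketch of an Eden-style schedule with that shape, assuming the power-law form used in icefall's Zipformer recipes; warmup and reference-duration rescaling are omitted, so the constants and exact values here are illustrative rather than a reproduction of the logged numbers:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Two multiplicative power-law discounts: one in batches, one in epochs.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Later batches and later epochs both shrink the LR monotonically, which is
# why the logged lr creeps down within an epoch and drops harder at each
# epoch boundary:
assert eden_lr(0.045, 85000, 5.0) < eden_lr(0.045, 80000, 5.0)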
2024-09-22 19:25:12,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0
2024-09-22 19:25:19,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85418.66666666667, ans=0.1
2024-09-22 19:25:34,610 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.500e+02 1.639e+02 1.809e+02 2.394e+02, threshold=3.278e+02, percent-clipped=0.0
2024-09-22 19:25:49,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=85465.33333333333, ans=0.0
2024-09-22 19:25:49,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0
2024-09-22 19:26:08,132 INFO [train.py:1198] (0/4) Epoch 5, batch 2750, loss[loss=0.2573, ctc_loss=0.1871, cr_loss=0.351, over 17047.00 frames. ], tot_loss[loss=0.2981, ctc_loss=0.2168, cr_loss=0.4067, over 3365293.66 frames. ], batch size: 39, lr: 2.25e-02, grad_scale: 32.0
2024-09-22 19:26:09,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=85558.66666666667, ans=0.125
2024-09-22 19:26:17,992 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-22 19:26:36,719 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 19:26:39,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=85652.0, ans=0.09899494936611666
2024-09-22 19:26:41,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=85652.0, ans=0.0
2024-09-22 19:26:54,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=85652.0, ans=0.0
2024-09-22 19:27:05,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=85698.66666666667, ans=0.0
2024-09-22 19:27:12,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.00 vs. limit=10.0
2024-09-22 19:27:31,980 INFO [train.py:1198] (0/4) Epoch 5, batch 2800, loss[loss=0.3738, ctc_loss=0.2838, cr_loss=0.4499, over 11903.00 frames. ], tot_loss[loss=0.2985, ctc_loss=0.2173, cr_loss=0.4062, over 3354067.35 frames. ], batch size: 123, lr: 2.24e-02, grad_scale: 32.0
2024-09-22 19:27:40,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=85792.0, ans=0.1
2024-09-22 19:27:51,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85838.66666666667, ans=0.1
2024-09-22 19:27:56,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0
2024-09-22 19:27:59,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=85838.66666666667, ans=0.2
2024-09-22 19:28:05,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=85885.33333333333, ans=0.0
2024-09-22 19:28:18,241 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.220e+02 1.484e+02 1.665e+02 1.912e+02 3.153e+02, threshold=3.329e+02, percent-clipped=0.0
2024-09-22 19:28:28,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0
2024-09-22 19:28:32,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=85932.0, ans=0.125
2024-09-22 19:28:51,699 INFO [train.py:1198] (0/4) Epoch 5, batch 2850, loss[loss=0.278, ctc_loss=0.2043, cr_loss=0.3686, over 17201.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2178, cr_loss=0.4066, over 3352768.65 frames. ], batch size: 47, lr: 2.24e-02, grad_scale: 32.0
2024-09-22 19:28:56,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=86025.33333333333, ans=0.125
2024-09-22 19:29:04,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86025.33333333333, ans=0.0
2024-09-22 19:29:09,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0
2024-09-22 19:29:20,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=86072.0, ans=0.0
2024-09-22 19:29:40,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86165.33333333333, ans=0.1
2024-09-22 19:29:42,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=86165.33333333333, ans=0.2
2024-09-22 19:29:48,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=86165.33333333333, ans=0.125
2024-09-22 19:30:11,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=86212.0, ans=0.125
2024-09-22 19:30:16,021 INFO [train.py:1198] (0/4) Epoch 5, batch 2900, loss[loss=0.3078, ctc_loss=0.2249, cr_loss=0.4143, over 16901.00 frames. ], tot_loss[loss=0.2981, ctc_loss=0.217, cr_loss=0.4052, over 3360017.31 frames. ], batch size: 58, lr: 2.24e-02, grad_scale: 32.0
2024-09-22 19:30:23,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0
2024-09-22 19:30:46,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=86352.0, ans=0.125
2024-09-22 19:30:46,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=86352.0, ans=0.125
2024-09-22 19:31:01,934 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.291e+02 1.588e+02 1.741e+02 2.202e+02 4.410e+02, threshold=3.483e+02, percent-clipped=1.0
2024-09-22 19:31:11,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=86398.66666666667, ans=0.0
2024-09-22 19:31:35,117 INFO [train.py:1198] (0/4) Epoch 5, batch 2950, loss[loss=0.2981, ctc_loss=0.2128, cr_loss=0.4266, over 17073.00 frames. ], tot_loss[loss=0.2993, ctc_loss=0.2181, cr_loss=0.4063, over 3348957.20 frames. ], batch size: 46, lr: 2.24e-02, grad_scale: 32.0
2024-09-22 19:31:36,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=86492.0, ans=0.125
2024-09-22 19:31:41,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0
2024-09-22 19:31:47,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=86492.0, ans=0.125
2024-09-22 19:31:53,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=86538.66666666667, ans=0.125
2024-09-22 19:32:08,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=86585.33333333333, ans=0.09899494936611666
2024-09-22 19:32:26,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=86632.0, ans=0.0
2024-09-22 19:32:27,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.35 vs. limit=6.0
2024-09-22 19:32:31,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=86632.0, ans=0.2
2024-09-22 19:32:45,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86678.66666666667, ans=0.1
2024-09-22 19:32:59,036 INFO [train.py:1198] (0/4) Epoch 5, batch 3000, loss[loss=0.2703, ctc_loss=0.1969, cr_loss=0.3669, over 17060.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2181, cr_loss=0.4053, over 3346501.69 frames. ], batch size: 39, lr: 2.23e-02, grad_scale: 32.0
2024-09-22 19:32:59,037 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-22 19:33:07,810 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4937, 3.7327, 4.1176, 4.1624], device='cuda:0')
2024-09-22 19:33:14,595 INFO [train.py:1230] (0/4) Epoch 5, validation: loss=0.06642, ctc_loss=0.06642, cr_loss=7.381e-15, over 944034.00 frames.
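Both tot_loss (over roughly 3.35M frames) and the validation loss just above (over 944034.00 frames) read as frame-weighted averages: each batch contributes loss times its number of frames. A minimal tracker with that semantics; the class name is illustrative (icefall keeps similar statistics in a MetricsTracker, whose exact interface may differ):

class FrameWeightedLoss:
    def __init__(self) -> None:
        self.loss_sum = 0.0   # running sum of loss * num_frames
        self.frames = 0.0     # total frames seen

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = FrameWeightedLoss()
tracker.update(0.06642, 944034.0)  # the validation entry above
print(tracker.value)               # 0.06642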
2024-09-22 19:33:14,596 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-22 19:33:19,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=86725.33333333333, ans=0.125
2024-09-22 19:33:25,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=86725.33333333333, ans=0.1
2024-09-22 19:33:27,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0
2024-09-22 19:33:30,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=86772.0, ans=0.2
2024-09-22 19:33:39,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86772.0, ans=0.1
2024-09-22 19:33:46,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=86818.66666666667, ans=0.0
2024-09-22 19:34:00,034 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.256e+02 1.524e+02 1.804e+02 2.236e+02 6.139e+02, threshold=3.607e+02, percent-clipped=4.0
2024-09-22 19:34:01,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=86865.33333333333, ans=0.125
2024-09-22 19:34:08,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=86865.33333333333, ans=0.125
2024-09-22 19:34:11,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=86865.33333333333, ans=0.125
2024-09-22 19:34:17,733 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 19:34:25,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=86912.0, ans=0.07
2024-09-22 19:34:35,395 INFO [train.py:1198] (0/4) Epoch 5, batch 3050, loss[loss=0.3055, ctc_loss=0.223, cr_loss=0.4128, over 17289.00 frames. ], tot_loss[loss=0.2982, ctc_loss=0.2173, cr_loss=0.4048, over 3345167.82 frames. ], batch size: 49, lr: 2.23e-02, grad_scale: 32.0
2024-09-22 19:34:38,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=86958.66666666667, ans=0.0
2024-09-22 19:34:51,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=87005.33333333333, ans=0.0
2024-09-22 19:35:08,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=87052.0, ans=0.07
2024-09-22 19:35:14,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0
2024-09-22 19:35:23,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=87098.66666666667, ans=0.125
2024-09-22 19:35:35,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.26 vs. limit=15.0
2024-09-22 19:35:53,028 INFO [train.py:1198] (0/4) Epoch 5, batch 3100, loss[loss=0.2598, ctc_loss=0.1883, cr_loss=0.3574, over 16979.00 frames. ], tot_loss[loss=0.2982, ctc_loss=0.2171, cr_loss=0.4054, over 3354254.40 frames. ], batch size: 42, lr: 2.23e-02, grad_scale: 64.0
2024-09-22 19:35:59,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=87192.0, ans=0.2
2024-09-22 19:36:05,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=87192.0, ans=0.0
2024-09-22 19:36:07,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=87238.66666666667, ans=0.2
2024-09-22 19:36:10,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=87238.66666666667, ans=0.125
2024-09-22 19:36:15,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=87238.66666666667, ans=0.125
2024-09-22 19:36:36,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=87285.33333333333, ans=0.95
2024-09-22 19:36:37,948 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.213e+02 1.555e+02 1.774e+02 2.101e+02 3.567e+02, threshold=3.549e+02, percent-clipped=0.0
2024-09-22 19:36:41,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=87332.0, ans=0.0
2024-09-22 19:37:06,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=87378.66666666667, ans=0.0
2024-09-22 19:37:13,092 INFO [train.py:1198] (0/4) Epoch 5, batch 3150, loss[loss=0.3006, ctc_loss=0.2138, cr_loss=0.4338, over 17026.00 frames. ], tot_loss[loss=0.2976, ctc_loss=0.2166, cr_loss=0.4053, over 3355482.23 frames. ], batch size: 44, lr: 2.23e-02, grad_scale: 64.0
2024-09-22 19:38:05,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=87565.33333333333, ans=0.2
2024-09-22 19:38:31,403 INFO [train.py:1198] (0/4) Epoch 5, batch 3200, loss[loss=0.3626, ctc_loss=0.2785, cr_loss=0.4203, over 12475.00 frames. ], tot_loss[loss=0.297, ctc_loss=0.216, cr_loss=0.4052, over 3363733.93 frames. ], batch size: 123, lr: 2.22e-02, grad_scale: 32.0
2024-09-22 19:38:44,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0
2024-09-22 19:38:56,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=87705.33333333333, ans=0.125
2024-09-22 19:38:59,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87705.33333333333, ans=0.1
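The grad_scale field doubles from 32.0 to 64.0 and falls back to 32.0 between batches 3100 and 3200 above, which is the dynamic loss scaling of fp16 training: the scaler grows the scale after a run of overflow-free steps and halves it when a step produces inf/nan gradients. A minimal sketch with PyTorch's GradScaler; the model and optimizer are placeholders, not the recipe's training loop:

import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # doubles after a run of
                                                     # clean steps, halves on overflow
for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(torch.randn(8, 80, device="cuda")).square().mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skips the update if grads contain inf/nan
    scaler.update()                # adjusts the scale, i.e. the logged grad_scale
    print(scaler.get_scale())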
2024-09-22 19:39:14,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=15.0
2024-09-22 19:39:18,460 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 1.520e+02 1.734e+02 1.974e+02 3.517e+02, threshold=3.467e+02, percent-clipped=0.0
2024-09-22 19:39:49,595 INFO [train.py:1198] (0/4) Epoch 5, batch 3250, loss[loss=0.2652, ctc_loss=0.1917, cr_loss=0.3677, over 17164.00 frames. ], tot_loss[loss=0.2976, ctc_loss=0.2164, cr_loss=0.4058, over 3359727.50 frames. ], batch size: 45, lr: 2.22e-02, grad_scale: 32.0
2024-09-22 19:39:49,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=87892.0, ans=0.125
2024-09-22 19:40:28,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=87985.33333333333, ans=0.0
2024-09-22 19:41:05,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0
2024-09-22 19:41:09,580 INFO [train.py:1198] (0/4) Epoch 5, batch 3300, loss[loss=0.2747, ctc_loss=0.1964, cr_loss=0.3919, over 17319.00 frames. ], tot_loss[loss=0.2973, ctc_loss=0.2162, cr_loss=0.4055, over 3357592.80 frames. ], batch size: 51, lr: 2.22e-02, grad_scale: 32.0
2024-09-22 19:41:17,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=88125.33333333333, ans=0.0
2024-09-22 19:41:31,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=88172.0, ans=0.2
2024-09-22 19:41:34,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=88172.0, ans=0.09899494936611666
2024-09-22 19:41:43,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=88218.66666666667, ans=0.125
2024-09-22 19:41:52,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=88218.66666666667, ans=0.0
2024-09-22 19:41:57,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0
2024-09-22 19:41:58,205 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.266e+02 1.544e+02 1.772e+02 2.234e+02 4.094e+02, threshold=3.543e+02, percent-clipped=4.0
2024-09-22 19:42:07,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0
2024-09-22 19:42:17,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=22.5
2024-09-22 19:42:18,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88312.0, ans=0.1
2024-09-22 19:42:21,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=88312.0, ans=0.125
2024-09-22 19:42:26,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=88312.0, ans=0.05
2024-09-22 19:42:29,375 INFO [train.py:1198] (0/4) Epoch 5, batch 3350, loss[loss=0.2424, ctc_loss=0.1739, cr_loss=0.3426, over 17244.00 frames. ], tot_loss[loss=0.2985, ctc_loss=0.2172, cr_loss=0.4064, over 3357964.60 frames. ], batch size: 42, lr: 2.22e-02, grad_scale: 32.0
2024-09-22 19:42:51,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=88405.33333333333, ans=0.2
2024-09-22 19:42:58,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=15.0
2024-09-22 19:43:03,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=88452.0, ans=0.0
2024-09-22 19:43:41,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88545.33333333333, ans=0.1
2024-09-22 19:43:47,477 INFO [train.py:1198] (0/4) Epoch 5, batch 3400, loss[loss=0.2979, ctc_loss=0.2158, cr_loss=0.4103, over 17348.00 frames. ], tot_loss[loss=0.2978, ctc_loss=0.2166, cr_loss=0.4059, over 3361374.10 frames. ], batch size: 48, lr: 2.21e-02, grad_scale: 32.0
2024-09-22 19:43:49,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=88592.0, ans=0.0
2024-09-22 19:43:52,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=88592.0, ans=0.05
2024-09-22 19:44:11,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=88638.66666666667, ans=0.0
2024-09-22 19:44:16,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=88638.66666666667, ans=0.125
2024-09-22 19:44:18,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=88685.33333333333, ans=0.1
2024-09-22 19:44:23,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=88685.33333333333, ans=0.0
2024-09-22 19:44:23,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=88685.33333333333, ans=0.125
2024-09-22 19:44:34,721 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.286e+02 1.562e+02 1.776e+02 2.153e+02 3.268e+02, threshold=3.552e+02, percent-clipped=0.0
2024-09-22 19:44:36,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=88732.0, ans=0.0
2024-09-22 19:44:36,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=88732.0, ans=0.125
2024-09-22 19:44:45,598 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 19:44:50,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=88778.66666666667, ans=0.125
2024-09-22 19:45:05,686 INFO [train.py:1198] (0/4) Epoch 5, batch 3450, loss[loss=0.306, ctc_loss=0.2236, cr_loss=0.4116, over 17224.00 frames. ], tot_loss[loss=0.2991, ctc_loss=0.2176, cr_loss=0.4073, over 3355594.45 frames. ], batch size: 50, lr: 2.21e-02, grad_scale: 32.0
2024-09-22 19:45:20,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=88825.33333333333, ans=0.1
2024-09-22 19:45:36,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0
2024-09-22 19:45:58,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=15.0
2024-09-22 19:46:24,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0
2024-09-22 19:46:25,705 INFO [train.py:1198] (0/4) Epoch 5, batch 3500, loss[loss=0.2834, ctc_loss=0.2038, cr_loss=0.3978, over 17233.00 frames. ], tot_loss[loss=0.2992, ctc_loss=0.2178, cr_loss=0.4071, over 3342651.09 frames. ], batch size: 44, lr: 2.21e-02, grad_scale: 32.0
2024-09-22 19:46:34,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0
2024-09-22 19:46:40,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0
2024-09-22 19:47:14,315 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.466e+02 1.595e+02 1.829e+02 3.245e+02, threshold=3.189e+02, percent-clipped=0.0
2024-09-22 19:47:16,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=89198.66666666667, ans=0.2
2024-09-22 19:47:26,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=89198.66666666667, ans=0.125
2024-09-22 19:47:34,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=89245.33333333333, ans=0.0
2024-09-22 19:47:45,032 INFO [train.py:1198] (0/4) Epoch 5, batch 3550, loss[loss=0.295, ctc_loss=0.2154, cr_loss=0.3978, over 16780.00 frames. ], tot_loss[loss=0.2996, ctc_loss=0.218, cr_loss=0.408, over 3344427.78 frames. ], batch size: 61, lr: 2.21e-02, grad_scale: 32.0
2024-09-22 19:48:04,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=89338.66666666667, ans=0.125
2024-09-22 19:48:18,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=89385.33333333333, ans=0.025
2024-09-22 19:48:22,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=89385.33333333333, ans=0.125
2024-09-22 19:48:22,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5
2024-09-22 19:48:32,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=89432.0, ans=15.0
2024-09-22 19:48:35,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89432.0, ans=0.1
2024-09-22 19:48:36,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=89432.0, ans=0.125
2024-09-22 19:49:02,837 INFO [train.py:1198] (0/4) Epoch 5, batch 3600, loss[loss=0.29, ctc_loss=0.2069, cr_loss=0.4154, over 17262.00 frames. ], tot_loss[loss=0.2984, ctc_loss=0.217, cr_loss=0.4071, over 3345300.73 frames. ], batch size: 44, lr: 2.20e-02, grad_scale: 32.0
2024-09-22 19:49:17,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.66 vs. limit=10.0
2024-09-22 19:49:25,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.57 vs. limit=22.5
2024-09-22 19:49:31,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89572.0, ans=0.1
2024-09-22 19:49:42,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=89618.66666666667, ans=0.125
2024-09-22 19:49:49,381 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.445e+02 1.594e+02 1.731e+02 2.971e+02, threshold=3.187e+02, percent-clipped=0.0
2024-09-22 19:50:02,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=89665.33333333333, ans=0.09899494936611666
2024-09-22 19:50:17,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0
2024-09-22 19:50:19,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=89758.66666666667, ans=0.025
2024-09-22 19:50:22,568 INFO [train.py:1198] (0/4) Epoch 5, batch 3650, loss[loss=0.3291, ctc_loss=0.2334, cr_loss=0.4785, over 17207.00 frames. ], tot_loss[loss=0.2979, ctc_loss=0.2165, cr_loss=0.407, over 3343419.68 frames. ], batch size: 55, lr: 2.20e-02, grad_scale: 32.0
2024-09-22 19:50:39,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=89805.33333333333, ans=0.025
2024-09-22 19:51:16,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0
2024-09-22 19:51:26,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=89945.33333333333, ans=0.0
2024-09-22 19:51:33,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=89945.33333333333, ans=0.125
2024-09-22 19:51:38,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=89945.33333333333, ans=0.125
2024-09-22 19:51:40,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.10 vs. limit=22.5
2024-09-22 19:51:42,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5
2024-09-22 19:51:43,157 INFO [train.py:1198] (0/4) Epoch 5, batch 3700, loss[loss=0.3514, ctc_loss=0.2694, cr_loss=0.4104, over 12057.00 frames. ], tot_loss[loss=0.298, ctc_loss=0.2166, cr_loss=0.4068, over 3344381.11 frames. ], batch size: 123, lr: 2.20e-02, grad_scale: 32.0
2024-09-22 19:51:43,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=89992.0, ans=0.05
2024-09-22 19:51:51,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=89992.0, ans=0.2
2024-09-22 19:52:09,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90038.66666666667, ans=0.1
2024-09-22 19:52:20,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90085.33333333333, ans=0.1
2024-09-22 19:52:24,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.80 vs. limit=22.5
2024-09-22 19:52:29,541 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.256e+02 1.562e+02 1.758e+02 2.028e+02 3.638e+02, threshold=3.517e+02, percent-clipped=1.0
2024-09-22 19:52:32,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0
2024-09-22 19:52:34,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=90132.0, ans=0.0
2024-09-22 19:52:36,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5
2024-09-22 19:53:01,152 INFO [train.py:1198] (0/4) Epoch 5, batch 3750, loss[loss=0.3135, ctc_loss=0.2233, cr_loss=0.4507, over 17319.00 frames. ], tot_loss[loss=0.2974, ctc_loss=0.2162, cr_loss=0.4061, over 3346877.28 frames. ], batch size: 51, lr: 2.20e-02, grad_scale: 32.0
2024-09-22 19:53:06,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0
2024-09-22 19:53:31,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=90318.66666666667, ans=0.125
2024-09-22 19:53:50,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=90365.33333333333, ans=0.125
2024-09-22 19:54:01,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=90365.33333333333, ans=0.0
2024-09-22 19:54:06,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=90412.0, ans=0.125
2024-09-22 19:54:19,783 INFO [train.py:1198] (0/4) Epoch 5, batch 3800, loss[loss=0.3119, ctc_loss=0.2281, cr_loss=0.419, over 17028.00 frames. ], tot_loss[loss=0.2988, ctc_loss=0.2176, cr_loss=0.4063, over 3323513.91 frames. ], batch size: 56, lr: 2.19e-02, grad_scale: 32.0
2024-09-22 19:54:37,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=90505.33333333333, ans=0.025
2024-09-22 19:54:47,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=90505.33333333333, ans=0.07
2024-09-22 19:54:48,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=90505.33333333333, ans=0.0
2024-09-22 19:55:06,884 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.208e+02 1.578e+02 1.852e+02 2.152e+02 4.120e+02, threshold=3.704e+02, percent-clipped=2.0
2024-09-22 19:55:34,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0
2024-09-22 19:55:38,742 INFO [train.py:1198] (0/4) Epoch 5, batch 3850, loss[loss=0.3666, ctc_loss=0.2816, cr_loss=0.4253, over 11808.00 frames. ], tot_loss[loss=0.3004, ctc_loss=0.2192, cr_loss=0.4057, over 3284956.51 frames. ], batch size: 124, lr: 2.19e-02, grad_scale: 32.0
2024-09-22 19:55:46,776 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 19:55:51,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=90692.0, ans=15.0
2024-09-22 19:56:21,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90785.33333333333, ans=0.1
2024-09-22 19:56:22,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5
2024-09-22 19:56:39,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=90878.66666666667, ans=0.125
2024-09-22 19:56:48,350 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-5.pt
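At the end of epoch 5 the trainer writes epoch-5.pt via checkpoint.py. icefall has its own save_checkpoint helper; the dictionary below sketches the typical contents of such a checkpoint, with keys that are an assumption rather than quoted from that file:

import torch

def save_epoch_checkpoint(path, model, optimizer, scheduler, scaler, epoch):
    # Everything needed to resume training from the epoch boundary.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict() if scheduler is not None else None,
            "grad_scaler": scaler.state_dict() if scaler is not None else None,
            "epoch": epoch,
        },
        path,
    )

# e.g. save_epoch_checkpoint(".../epoch-5.pt", model, optimizer,
#                            scheduler, scaler, epoch=5)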
], batch size: 46, lr: 2.04e-02, grad_scale: 32.0 2024-09-22 19:57:39,905 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 19:57:55,111 INFO [train.py:1230] (0/4) Epoch 6, validation: loss=0.06886, ctc_loss=0.06886, cr_loss=9.986e-15, over 944034.00 frames. 2024-09-22 19:57:55,111 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 19:58:11,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=90953.33333333333, ans=0.125 2024-09-22 19:58:21,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2024-09-22 19:58:26,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.42 vs. limit=10.0 2024-09-22 19:58:29,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=91000.0, ans=0.125 2024-09-22 19:58:29,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2024-09-22 19:58:37,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=91000.0, ans=0.0 2024-09-22 19:58:51,239 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.268e+02 1.570e+02 1.853e+02 2.174e+02 4.194e+02, threshold=3.706e+02, percent-clipped=2.0 2024-09-22 19:59:16,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=91093.33333333333, ans=0.025 2024-09-22 19:59:19,095 INFO [train.py:1198] (0/4) Epoch 6, batch 50, loss[loss=0.2544, ctc_loss=0.1807, cr_loss=0.3682, over 17288.00 frames. ], tot_loss[loss=0.3001, ctc_loss=0.2183, cr_loss=0.4089, over 756150.43 frames. ], batch size: 42, lr: 2.04e-02, grad_scale: 32.0 2024-09-22 19:59:20,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91140.0, ans=0.125 2024-09-22 19:59:32,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=91140.0, ans=0.125 2024-09-22 19:59:32,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=91140.0, ans=0.125 2024-09-22 19:59:43,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=91186.66666666667, ans=0.125 2024-09-22 20:00:06,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-09-22 20:00:41,135 INFO [train.py:1198] (0/4) Epoch 6, batch 100, loss[loss=0.3391, ctc_loss=0.2521, cr_loss=0.435, over 17013.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.215, cr_loss=0.4094, over 1341923.66 frames. 
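The epoch-6 validation entry above reports cr_loss=9.986e-15, i.e. numerically zero: the consistency-regularization term only produces a signal when two differently time-masked copies of the batch are compared, and no masking is applied at validation time, so the validation loss reduces to the CTC term. The training totals are consistent with a weighted sum using the run's cr_loss_scale=0.2 (for example 0.2166 + 0.2 x 0.4068 = 0.298 at epoch 5, batch 3700). A minimal sketch of that combination; the function name and tensor arguments are hypothetical, not icefall's actual code:

    import torch

    def combine_losses(ctc_loss: torch.Tensor,
                       cr_loss: torch.Tensor,
                       ctc_loss_scale: float = 1.0,
                       cr_loss_scale: float = 0.2) -> torch.Tensor:
        # Weighted sum matching ctc_loss_scale=1.0 / cr_loss_scale=0.2 from
        # the run parameters; at validation cr_loss is ~0 because the two
        # "views" of the batch are identical (no time masking).
        return ctc_loss_scale * ctc_loss + cr_loss_scale * cr_loss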
2024-09-22 20:00:41,135 INFO [train.py:1198] (0/4) Epoch 6, batch 100, loss[loss=0.3391, ctc_loss=0.2521, cr_loss=0.435, over 17013.00 frames. ], tot_loss[loss=0.2969, ctc_loss=0.215, cr_loss=0.4094, over 1341923.66 frames. ], batch size: 53, lr: 2.04e-02, grad_scale: 32.0
2024-09-22 20:01:01,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=91420.0, ans=0.125
2024-09-22 20:01:18,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91466.66666666667, ans=0.1
2024-09-22 20:01:26,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=91466.66666666667, ans=0.0
2024-09-22 20:01:28,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=22.5
2024-09-22 20:01:35,327 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.210e+02 1.399e+02 1.624e+02 1.941e+02 3.446e+02, threshold=3.247e+02, percent-clipped=0.0
2024-09-22 20:02:00,849 INFO [train.py:1198] (0/4) Epoch 6, batch 150, loss[loss=0.2839, ctc_loss=0.2068, cr_loss=0.3854, over 17147.00 frames. ], tot_loss[loss=0.2954, ctc_loss=0.2142, cr_loss=0.4061, over 1781220.56 frames. ], batch size: 48, lr: 2.04e-02, grad_scale: 32.0
2024-09-22 20:02:04,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=91606.66666666667, ans=0.0
2024-09-22 20:02:17,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=91653.33333333333, ans=0.125
2024-09-22 20:03:00,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0
2024-09-22 20:03:04,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=91746.66666666667, ans=0.125
2024-09-22 20:03:13,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91793.33333333333, ans=0.1
2024-09-22 20:03:25,936 INFO [train.py:1198] (0/4) Epoch 6, batch 200, loss[loss=0.2981, ctc_loss=0.2174, cr_loss=0.4032, over 17207.00 frames. ], tot_loss[loss=0.2946, ctc_loss=0.2135, cr_loss=0.4055, over 2140601.42 frames. ], batch size: 50, lr: 2.03e-02, grad_scale: 32.0
2024-09-22 20:03:31,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=91840.0, ans=0.0
2024-09-22 20:03:34,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=91840.0, ans=0.125
2024-09-22 20:04:09,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=91933.33333333333, ans=0.2
2024-09-22 20:04:17,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=91980.0, ans=10.0
2024-09-22 20:04:25,624 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.298e+02 1.556e+02 1.826e+02 2.254e+02 3.362e+02, threshold=3.652e+02, percent-clipped=2.0
2024-09-22 20:04:27,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91980.0, ans=0.125
2024-09-22 20:04:46,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=92026.66666666667, ans=0.125
2024-09-22 20:04:51,204 INFO [train.py:1198] (0/4) Epoch 6, batch 250, loss[loss=0.2936, ctc_loss=0.2134, cr_loss=0.4009, over 17297.00 frames. ], tot_loss[loss=0.2945, ctc_loss=0.2133, cr_loss=0.4057, over 2407741.03 frames. ], batch size: 49, lr: 2.03e-02, grad_scale: 32.0
2024-09-22 20:05:04,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92073.33333333333, ans=0.1
2024-09-22 20:05:28,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5
2024-09-22 20:05:48,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=92213.33333333333, ans=0.0
2024-09-22 20:05:54,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92260.0, ans=0.1
2024-09-22 20:06:10,522 INFO [train.py:1198] (0/4) Epoch 6, batch 300, loss[loss=0.3178, ctc_loss=0.2376, cr_loss=0.4007, over 16723.00 frames. ], tot_loss[loss=0.2931, ctc_loss=0.2122, cr_loss=0.4048, over 2619798.51 frames. ], batch size: 61, lr: 2.03e-02, grad_scale: 32.0
2024-09-22 20:06:20,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92306.66666666667, ans=0.125
2024-09-22 20:06:39,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0
2024-09-22 20:07:04,352 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.190e+02 1.501e+02 1.679e+02 1.995e+02 3.588e+02, threshold=3.358e+02, percent-clipped=0.0
2024-09-22 20:07:07,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=92446.66666666667, ans=0.125
2024-09-22 20:07:22,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=92493.33333333333, ans=0.125
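The recurring Whitening lines are diagnostics from scaling.py: each whitened module compares a statistic of its output's covariance against a scheduled limit (the "metric=... vs. limit=..." pairs) and only intervenes in the backward pass when the metric exceeds the limit. A rough reconstruction of such a metric, normalized so that a perfectly white (isotropic) covariance yields 1.0; the exact formula here is an assumption for illustration, not the precise scaling.py implementation:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns ~1.0 when each group's
        # covariance is isotropic ("white") and grows when a few
        # directions dominate -- an assumed form of the logged metric.
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x.transpose(0, 1)                      # (groups, frames, chans_per_group)
        cov = torch.matmul(x.transpose(1, 2), x)   # per-group covariance, unnormalized
        d = cov.shape[-1]
        trace_c = cov.diagonal(dim1=1, dim2=2).sum(-1)
        trace_c2 = (cov * cov).sum(dim=(1, 2))
        return (d * trace_c2 / trace_c.clamp(min=1e-20) ** 2).mean()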
2024-09-22 20:07:24,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0
2024-09-22 20:07:32,319 INFO [train.py:1198] (0/4) Epoch 6, batch 350, loss[loss=0.297, ctc_loss=0.2111, cr_loss=0.4292, over 17215.00 frames. ], tot_loss[loss=0.2929, ctc_loss=0.2121, cr_loss=0.4044, over 2777770.92 frames. ], batch size: 47, lr: 2.03e-02, grad_scale: 32.0
2024-09-22 20:07:43,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=92540.0, ans=0.125
2024-09-22 20:08:44,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=22.5
2024-09-22 20:08:46,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0
2024-09-22 20:08:55,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=92773.33333333333, ans=0.2
2024-09-22 20:08:57,316 INFO [train.py:1198] (0/4) Epoch 6, batch 400, loss[loss=0.2498, ctc_loss=0.1746, cr_loss=0.3761, over 17073.00 frames. ], tot_loss[loss=0.2931, ctc_loss=0.2123, cr_loss=0.404, over 2901127.56 frames. ], batch size: 46, lr: 2.02e-02, grad_scale: 32.0
2024-09-22 20:09:19,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92820.0, ans=0.1
2024-09-22 20:09:39,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0
2024-09-22 20:09:51,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=92913.33333333333, ans=0.0
2024-09-22 20:09:54,060 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.481e+02 1.649e+02 1.890e+02 2.985e+02, threshold=3.299e+02, percent-clipped=0.0
2024-09-22 20:10:12,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0
2024-09-22 20:10:19,709 INFO [train.py:1198] (0/4) Epoch 6, batch 450, loss[loss=0.3397, ctc_loss=0.257, cr_loss=0.4132, over 12255.00 frames. ], tot_loss[loss=0.2934, ctc_loss=0.2125, cr_loss=0.4042, over 2996463.91 frames. ], batch size: 123, lr: 2.02e-02, grad_scale: 32.0
2024-09-22 20:10:23,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=93006.66666666667, ans=0.125
2024-09-22 20:10:23,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=93006.66666666667, ans=0.0
2024-09-22 20:10:53,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=93100.0, ans=0.2
2024-09-22 20:11:39,001 INFO [train.py:1198] (0/4) Epoch 6, batch 500, loss[loss=0.2626, ctc_loss=0.1891, cr_loss=0.3676, over 16968.00 frames. ], tot_loss[loss=0.2917, ctc_loss=0.2112, cr_loss=0.4023, over 3077464.96 frames. ], batch size: 42, lr: 2.02e-02, grad_scale: 32.0
2024-09-22 20:11:40,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=93240.0, ans=0.025
2024-09-22 20:11:48,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93240.0, ans=0.1
2024-09-22 20:11:50,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=93240.0, ans=0.125
2024-09-22 20:12:08,843 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-20000.pt
2024-09-22 20:12:28,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=93333.33333333333, ans=0.025
2024-09-22 20:12:31,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=93380.0, ans=0.125
2024-09-22 20:12:37,312 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.459e+02 1.650e+02 1.980e+02 2.967e+02, threshold=3.301e+02, percent-clipped=0.0
2024-09-22 20:12:52,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=93426.66666666667, ans=0.0
2024-09-22 20:12:56,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=93426.66666666667, ans=0.125
2024-09-22 20:12:57,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=93426.66666666667, ans=0.125
2024-09-22 20:13:05,203 INFO [train.py:1198] (0/4) Epoch 6, batch 550, loss[loss=0.3099, ctc_loss=0.2259, cr_loss=0.4198, over 17141.00 frames. ], tot_loss[loss=0.291, ctc_loss=0.2106, cr_loss=0.4022, over 3145033.73 frames. ], batch size: 48, lr: 2.02e-02, grad_scale: 32.0
2024-09-22 20:13:26,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=93520.0, ans=0.0
2024-09-22 20:13:27,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=93520.0, ans=0.2
2024-09-22 20:13:43,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=93566.66666666667, ans=0.95
2024-09-22 20:13:55,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=93613.33333333333, ans=0.0
2024-09-22 20:14:29,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=93706.66666666667, ans=0.125
2024-09-22 20:14:30,622 INFO [train.py:1198] (0/4) Epoch 6, batch 600, loss[loss=0.2972, ctc_loss=0.2129, cr_loss=0.4215, over 17205.00 frames. ], tot_loss[loss=0.2915, ctc_loss=0.2108, cr_loss=0.4033, over 3194480.82 frames. ], batch size: 47, lr: 2.02e-02, grad_scale: 32.0
2024-09-22 20:14:46,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=93753.33333333333, ans=0.0
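The "Saving checkpoint to .../checkpoint-20000.pt" entry above is the batch-count checkpoint implied by save_every_n=4000 in the run parameters, alongside the per-epoch epoch-5.pt written at the epoch boundary. A hedged sketch of that bookkeeping; the function and argument names are made up for illustration, not icefall's checkpoint API:

    import torch

    def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                              exp_dir: str, save_every_n: int = 4000) -> None:
        # Rolling batch-level checkpoint, mirroring the logged
        # "checkpoint-20000.pt" save (sketch only).
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            torch.save(
                {"model": model.state_dict(),
                 "optimizer": optimizer.state_dict(),
                 "batch_idx_train": batch_idx_train},
                f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
            )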
2024-09-22 20:14:50,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. limit=10.0
2024-09-22 20:15:02,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=93800.0, ans=0.0
2024-09-22 20:15:15,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=93800.0, ans=0.125
2024-09-22 20:15:24,958 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.461e+02 1.595e+02 1.846e+02 3.407e+02, threshold=3.191e+02, percent-clipped=1.0
2024-09-22 20:15:41,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=93893.33333333333, ans=0.07
2024-09-22 20:15:44,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=93893.33333333333, ans=0.125
2024-09-22 20:15:50,711 INFO [train.py:1198] (0/4) Epoch 6, batch 650, loss[loss=0.2728, ctc_loss=0.197, cr_loss=0.3789, over 17109.00 frames. ], tot_loss[loss=0.2912, ctc_loss=0.2106, cr_loss=0.4032, over 3229058.54 frames. ], batch size: 49, lr: 2.01e-02, grad_scale: 32.0
2024-09-22 20:15:51,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0
2024-09-22 20:16:06,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=93986.66666666667, ans=0.025
2024-09-22 20:16:20,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=94033.33333333333, ans=0.125
2024-09-22 20:16:40,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=94080.0, ans=0.125
2024-09-22 20:17:09,773 INFO [train.py:1198] (0/4) Epoch 6, batch 700, loss[loss=0.2736, ctc_loss=0.1913, cr_loss=0.4114, over 16970.00 frames. ], tot_loss[loss=0.2912, ctc_loss=0.2106, cr_loss=0.4032, over 3256654.11 frames. ], batch size: 42, lr: 2.01e-02, grad_scale: 32.0
2024-09-22 20:18:09,123 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.224e+02 1.435e+02 1.629e+02 1.890e+02 2.825e+02, threshold=3.258e+02, percent-clipped=0.0
2024-09-22 20:18:15,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=94313.33333333333, ans=0.95
2024-09-22 20:18:20,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=94360.0, ans=0.0
2024-09-22 20:18:28,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=94360.0, ans=0.0
2024-09-22 20:18:30,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=94360.0, ans=0.05
2024-09-22 20:18:34,804 INFO [train.py:1198] (0/4) Epoch 6, batch 750, loss[loss=0.2743, ctc_loss=0.2003, cr_loss=0.3702, over 17333.00 frames. ], tot_loss[loss=0.2931, ctc_loss=0.2122, cr_loss=0.4045, over 3266241.26 frames. ], batch size: 48, lr: 2.01e-02, grad_scale: 32.0
2024-09-22 20:19:15,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=94500.0, ans=0.2
2024-09-22 20:19:33,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=94546.66666666667, ans=0.0
2024-09-22 20:19:35,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0
2024-09-22 20:19:38,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=94546.66666666667, ans=0.125
2024-09-22 20:19:58,981 INFO [train.py:1198] (0/4) Epoch 6, batch 800, loss[loss=0.2917, ctc_loss=0.2148, cr_loss=0.3844, over 17242.00 frames. ], tot_loss[loss=0.2945, ctc_loss=0.2133, cr_loss=0.4058, over 3287700.97 frames. ], batch size: 50, lr: 2.01e-02, grad_scale: 32.0
2024-09-22 20:20:09,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=94640.0, ans=22.5
2024-09-22 20:20:29,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=94733.33333333333, ans=0.0
2024-09-22 20:20:32,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=94733.33333333333, ans=0.0
2024-09-22 20:20:33,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=94733.33333333333, ans=0.02
2024-09-22 20:20:39,815 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=15.0
2024-09-22 20:20:42,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.47 vs. limit=10.0
2024-09-22 20:20:53,106 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.473e+02 1.596e+02 1.875e+02 3.402e+02, threshold=3.192e+02, percent-clipped=2.0
2024-09-22 20:21:07,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=94826.66666666667, ans=0.125
2024-09-22 20:21:18,627 INFO [train.py:1198] (0/4) Epoch 6, batch 850, loss[loss=0.305, ctc_loss=0.2148, cr_loss=0.451, over 17019.00 frames. ], tot_loss[loss=0.2932, ctc_loss=0.2121, cr_loss=0.4053, over 3305425.07 frames. ], batch size: 56, lr: 2.00e-02, grad_scale: 32.0
2024-09-22 20:21:27,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=94873.33333333333, ans=0.125
2024-09-22 20:21:36,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94920.0, ans=0.1
2024-09-22 20:21:46,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=94920.0, ans=0.125
2024-09-22 20:21:48,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=94920.0, ans=0.2
2024-09-22 20:22:00,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=94966.66666666667, ans=0.125
2024-09-22 20:22:02,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94966.66666666667, ans=0.1
2024-09-22 20:22:38,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0
2024-09-22 20:22:43,772 INFO [train.py:1198] (0/4) Epoch 6, batch 900, loss[loss=0.234, ctc_loss=0.1667, cr_loss=0.3365, over 17084.00 frames. ], tot_loss[loss=0.2927, ctc_loss=0.2116, cr_loss=0.4051, over 3323725.14 frames. ], batch size: 43, lr: 2.00e-02, grad_scale: 32.0
2024-09-22 20:22:50,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=95106.66666666667, ans=0.125
2024-09-22 20:22:58,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.75 vs. limit=6.0
2024-09-22 20:23:14,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=95200.0, ans=0.125
2024-09-22 20:23:17,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=95200.0, ans=0.125
2024-09-22 20:23:24,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=95200.0, ans=0.125
2024-09-22 20:23:31,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=95246.66666666667, ans=0.0
2024-09-22 20:23:32,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0
2024-09-22 20:23:37,721 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.204e+02 1.462e+02 1.630e+02 1.916e+02 2.984e+02, threshold=3.259e+02, percent-clipped=0.0
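In each WARNING line from optim.py, the five numbers after "grad-norm quartiles" read as the min/25%/median/75%/max of recently observed gradient norms, and the logged threshold is consistently Clipping_scale times the median (e.g. 2.0 * 1.630e+02 = 3.26e+02, matching threshold=3.259e+02 just above); percent-clipped is the share of recent batches whose norm exceeded that threshold. A sketch of the threshold computation under that reading; the helper name is hypothetical:

    import torch

    def clipping_threshold(recent_grad_norms: torch.Tensor,
                           clipping_scale: float = 2.0) -> float:
        # Five quantiles as printed in the log (min, 25%, median, 75%, max);
        # the clipping threshold is clipping_scale times the median.
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return clipping_scale * q[2].item()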
2024-09-22 20:23:40,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0
2024-09-22 20:23:51,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=95293.33333333333, ans=0.1
2024-09-22 20:23:59,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=95293.33333333333, ans=0.125
2024-09-22 20:23:59,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=95293.33333333333, ans=0.125
2024-09-22 20:24:05,923 INFO [train.py:1198] (0/4) Epoch 6, batch 950, loss[loss=0.3413, ctc_loss=0.2483, cr_loss=0.4653, over 17027.00 frames. ], tot_loss[loss=0.2918, ctc_loss=0.2109, cr_loss=0.4047, over 3335054.93 frames. ], batch size: 56, lr: 2.00e-02, grad_scale: 32.0
2024-09-22 20:24:21,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95340.0, ans=0.1
2024-09-22 20:24:21,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=95340.0, ans=0.1
2024-09-22 20:24:42,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95433.33333333333, ans=0.1
2024-09-22 20:25:17,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95526.66666666667, ans=0.1
2024-09-22 20:25:20,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=95526.66666666667, ans=0.0
2024-09-22 20:25:28,488 INFO [train.py:1198] (0/4) Epoch 6, batch 1000, loss[loss=0.2915, ctc_loss=0.2109, cr_loss=0.4029, over 16998.00 frames. ], tot_loss[loss=0.2916, ctc_loss=0.2107, cr_loss=0.4047, over 3347384.16 frames. ], batch size: 53, lr: 2.00e-02, grad_scale: 32.0
2024-09-22 20:25:38,612 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-22 20:25:46,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=95620.0, ans=0.125
2024-09-22 20:25:54,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=95620.0, ans=0.025
2024-09-22 20:26:21,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5
2024-09-22 20:26:22,800 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.223e+02 1.403e+02 1.529e+02 1.831e+02 2.517e+02, threshold=3.058e+02, percent-clipped=0.0
2024-09-22 20:26:34,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=95760.0, ans=0.2
2024-09-22 20:26:48,477 INFO [train.py:1198] (0/4) Epoch 6, batch 1050, loss[loss=0.2431, ctc_loss=0.1705, cr_loss=0.3628, over 17269.00 frames. ], tot_loss[loss=0.2906, ctc_loss=0.2098, cr_loss=0.4039, over 3348816.12 frames. ], batch size: 42, lr: 2.00e-02, grad_scale: 32.0
2024-09-22 20:26:58,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=95806.66666666667, ans=0.125
2024-09-22 20:26:59,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=95806.66666666667, ans=0.025
2024-09-22 20:27:09,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0
2024-09-22 20:27:49,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=95946.66666666667, ans=0.125
2024-09-22 20:27:54,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=95946.66666666667, ans=0.125
2024-09-22 20:28:06,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=95993.33333333333, ans=0.5
2024-09-22 20:28:13,100 INFO [train.py:1198] (0/4) Epoch 6, batch 1100, loss[loss=0.2968, ctc_loss=0.2183, cr_loss=0.3924, over 17323.00 frames. ], tot_loss[loss=0.2905, ctc_loss=0.2097, cr_loss=0.4037, over 3352620.94 frames. ], batch size: 51, lr: 1.99e-02, grad_scale: 32.0
2024-09-22 20:28:21,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=96040.0, ans=0.125
2024-09-22 20:29:12,627 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.277e+02 1.448e+02 1.600e+02 1.818e+02 3.191e+02, threshold=3.201e+02, percent-clipped=3.0
2024-09-22 20:29:22,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=96226.66666666667, ans=0.125
2024-09-22 20:29:22,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=96226.66666666667, ans=0.125
2024-09-22 20:29:38,029 INFO [train.py:1198] (0/4) Epoch 6, batch 1150, loss[loss=0.3004, ctc_loss=0.218, cr_loss=0.4118, over 16780.00 frames. ], tot_loss[loss=0.2905, ctc_loss=0.2098, cr_loss=0.4038, over 3357581.94 frames. ], batch size: 61, lr: 1.99e-02, grad_scale: 32.0
2024-09-22 20:29:41,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=96273.33333333333, ans=0.125
2024-09-22 20:29:44,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=96273.33333333333, ans=0.125
2024-09-22 20:30:00,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=96320.0, ans=0.125
2024-09-22 20:30:11,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=96366.66666666667, ans=0.125
2024-09-22 20:30:25,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=96413.33333333333, ans=0.125
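The ScheduledFloat lines record module hyperparameters (dropout rates, balancer probabilities, skip rates, whitening limits) whose values are functions of batch_count, so regularization strength can be annealed as training progresses. A generic piecewise-linear schedule in the same spirit follows; the breakpoints in the example are invented, and icefall's ScheduledFloat has more machinery than this sketch:

    def scheduled_float(batch_count: float, schedule) -> float:
        # schedule: [(batch_count, value), ...] breakpoints, e.g.
        # [(0.0, 0.3), (20000.0, 0.1)] (invented numbers). Linear
        # interpolation between breakpoints, clamped at both ends.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return schedule[-1][1]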
2024-09-22 20:30:35,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0
2024-09-22 20:30:51,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=96460.0, ans=0.0
2024-09-22 20:30:51,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=96460.0, ans=0.2
2024-09-22 20:30:57,290 INFO [train.py:1198] (0/4) Epoch 6, batch 1200, loss[loss=0.2934, ctc_loss=0.2131, cr_loss=0.4015, over 17251.00 frames. ], tot_loss[loss=0.2904, ctc_loss=0.2097, cr_loss=0.4037, over 3363945.83 frames. ], batch size: 44, lr: 1.99e-02, grad_scale: 32.0
2024-09-22 20:31:00,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=96506.66666666667, ans=0.125
2024-09-22 20:31:07,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=96506.66666666667, ans=0.2
2024-09-22 20:31:21,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0
2024-09-22 20:31:50,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=96646.66666666667, ans=0.05
2024-09-22 20:31:51,564 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.470e+02 1.636e+02 2.013e+02 4.309e+02, threshold=3.271e+02, percent-clipped=1.0
2024-09-22 20:31:55,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96646.66666666667, ans=0.1
2024-09-22 20:32:12,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96693.33333333333, ans=0.1
2024-09-22 20:32:17,038 INFO [train.py:1198] (0/4) Epoch 6, batch 1250, loss[loss=0.2577, ctc_loss=0.1867, cr_loss=0.355, over 17129.00 frames. ], tot_loss[loss=0.2896, ctc_loss=0.209, cr_loss=0.4033, over 3369731.26 frames. ], batch size: 40, lr: 1.99e-02, grad_scale: 32.0
2024-09-22 20:32:18,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=96740.0, ans=0.125
2024-09-22 20:32:20,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=96740.0, ans=0.0
2024-09-22 20:32:41,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=96786.66666666667, ans=0.125
2024-09-22 20:33:01,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.55 vs. limit=15.0
2024-09-22 20:33:02,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=96833.33333333333, ans=0.0
2024-09-22 20:33:02,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96833.33333333333, ans=0.1
2024-09-22 20:33:43,910 INFO [train.py:1198] (0/4) Epoch 6, batch 1300, loss[loss=0.2565, ctc_loss=0.1838, cr_loss=0.3637, over 17255.00 frames. ], tot_loss[loss=0.2896, ctc_loss=0.2089, cr_loss=0.4034, over 3368557.57 frames. ], batch size: 44, lr: 1.99e-02, grad_scale: 32.0
2024-09-22 20:34:36,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=97113.33333333333, ans=0.0
2024-09-22 20:34:39,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=97113.33333333333, ans=0.1
2024-09-22 20:34:42,526 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.454e+02 1.645e+02 1.943e+02 2.545e+02, threshold=3.291e+02, percent-clipped=0.0
2024-09-22 20:34:42,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=97113.33333333333, ans=0.025
2024-09-22 20:34:52,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=97160.0, ans=0.0
2024-09-22 20:35:06,836 INFO [train.py:1198] (0/4) Epoch 6, batch 1350, loss[loss=0.2596, ctc_loss=0.1831, cr_loss=0.3823, over 17099.00 frames. ], tot_loss[loss=0.2887, ctc_loss=0.2082, cr_loss=0.4028, over 3366471.09 frames. ], batch size: 43, lr: 1.98e-02, grad_scale: 32.0
2024-09-22 20:35:30,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=97253.33333333333, ans=0.025
2024-09-22 20:35:56,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5
2024-09-22 20:35:57,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5
2024-09-22 20:36:13,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=97393.33333333333, ans=0.125
2024-09-22 20:36:25,756 INFO [train.py:1198] (0/4) Epoch 6, batch 1400, loss[loss=0.2998, ctc_loss=0.2149, cr_loss=0.4243, over 17212.00 frames. ], tot_loss[loss=0.2888, ctc_loss=0.2082, cr_loss=0.4029, over 3359800.90 frames. ], batch size: 47, lr: 1.98e-02, grad_scale: 32.0
2024-09-22 20:36:26,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=97440.0, ans=0.0
2024-09-22 20:37:01,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97533.33333333333, ans=0.1
2024-09-22 20:37:24,530 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.201e+02 1.419e+02 1.584e+02 1.972e+02 3.775e+02, threshold=3.168e+02, percent-clipped=1.0
2024-09-22 20:37:26,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=97580.0, ans=0.0
2024-09-22 20:37:34,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=97626.66666666667, ans=0.1
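The cr_loss term appearing in every training entry is the consistency-regularization part of CR-CTC: each batch is forwarded twice with independent time masking (this run uses time_mask_ratio=2.5), and the two CTC posterior sequences are pulled toward each other. The sketch below shows one common formulation, a symmetric KL with a stop-gradient on the "teacher" side; it illustrates the idea, and is not claimed to be the exact loss in this recipe:

    import torch
    import torch.nn.functional as F

    def consistency_loss(log_probs_a: torch.Tensor,
                         log_probs_b: torch.Tensor) -> torch.Tensor:
        # log_probs_*: (T, N, V) CTC log-posteriors from two independently
        # time-masked copies of the same batch. Each direction treats the
        # other copy's (detached) distribution as the target.
        kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                         log_target=True, reduction="none").sum(-1)
        kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                         log_target=True, reduction="none").sum(-1)
        return 0.5 * (kl_ab + kl_ba).sum()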
2024-09-22 20:37:48,279 INFO [train.py:1198] (0/4) Epoch 6, batch 1450, loss[loss=0.2305, ctc_loss=0.159, cr_loss=0.3576, over 17116.00 frames. ], tot_loss[loss=0.287, ctc_loss=0.2068, cr_loss=0.4009, over 3366029.80 frames. ], batch size: 40, lr: 1.98e-02, grad_scale: 32.0
2024-09-22 20:37:51,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=97673.33333333333, ans=0.025
2024-09-22 20:37:58,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5
2024-09-22 20:38:07,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=97720.0, ans=0.125
2024-09-22 20:38:41,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=97813.33333333333, ans=0.125
2024-09-22 20:38:45,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=97813.33333333333, ans=0.0
2024-09-22 20:39:12,833 INFO [train.py:1198] (0/4) Epoch 6, batch 1500, loss[loss=0.2463, ctc_loss=0.179, cr_loss=0.3368, over 17079.00 frames. ], tot_loss[loss=0.287, ctc_loss=0.2067, cr_loss=0.4014, over 3369515.95 frames. ], batch size: 43, lr: 1.98e-02, grad_scale: 32.0
2024-09-22 20:39:17,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=15.0
2024-09-22 20:39:19,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=97906.66666666667, ans=0.125
2024-09-22 20:40:09,174 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.237e+02 1.478e+02 1.585e+02 1.791e+02 2.453e+02, threshold=3.170e+02, percent-clipped=0.0
2024-09-22 20:40:11,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0
2024-09-22 20:40:32,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.00 vs. limit=5.0
2024-09-22 20:40:33,020 INFO [train.py:1198] (0/4) Epoch 6, batch 1550, loss[loss=0.2779, ctc_loss=0.1989, cr_loss=0.3948, over 17317.00 frames. ], tot_loss[loss=0.2891, ctc_loss=0.2085, cr_loss=0.4031, over 3351621.26 frames. ], batch size: 51, lr: 1.98e-02, grad_scale: 32.0
2024-09-22 20:40:57,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=98186.66666666667, ans=0.0
2024-09-22 20:41:21,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=98280.0, ans=0.0
2024-09-22 20:41:36,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=98326.66666666667, ans=0.125
2024-09-22 20:41:41,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=98326.66666666667, ans=0.0
2024-09-22 20:41:51,964 INFO [train.py:1198] (0/4) Epoch 6, batch 1600, loss[loss=0.2677, ctc_loss=0.1892, cr_loss=0.3926, over 16682.00 frames. ], tot_loss[loss=0.2895, ctc_loss=0.2089, cr_loss=0.403, over 3345734.94 frames. ], batch size: 37, lr: 1.97e-02, grad_scale: 32.0
2024-09-22 20:42:48,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=98513.33333333333, ans=0.0
2024-09-22 20:42:52,417 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.223e+02 1.461e+02 1.657e+02 2.022e+02 3.350e+02, threshold=3.314e+02, percent-clipped=2.0
2024-09-22 20:42:56,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=98513.33333333333, ans=0.05
2024-09-22 20:43:02,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=98560.0, ans=0.2
2024-09-22 20:43:16,607 INFO [train.py:1198] (0/4) Epoch 6, batch 1650, loss[loss=0.2351, ctc_loss=0.1664, cr_loss=0.3434, over 17189.00 frames. ], tot_loss[loss=0.2884, ctc_loss=0.2078, cr_loss=0.4029, over 3351849.86 frames. ], batch size: 41, lr: 1.97e-02, grad_scale: 32.0
2024-09-22 20:43:20,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=98606.66666666667, ans=0.125
2024-09-22 20:43:33,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=98653.33333333333, ans=0.0
2024-09-22 20:43:56,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=98700.0, ans=0.125
2024-09-22 20:44:08,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98746.66666666667, ans=0.1
2024-09-22 20:44:27,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98793.33333333333, ans=0.1
2024-09-22 20:44:40,255 INFO [train.py:1198] (0/4) Epoch 6, batch 1700, loss[loss=0.3127, ctc_loss=0.2308, cr_loss=0.4096, over 17010.00 frames. ], tot_loss[loss=0.2879, ctc_loss=0.2074, cr_loss=0.4028, over 3352313.33 frames. ], batch size: 51, lr: 1.97e-02, grad_scale: 32.0
2024-09-22 20:45:20,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=98933.33333333333, ans=0.125
2024-09-22 20:45:20,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98933.33333333333, ans=0.125
2024-09-22 20:45:36,072 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.207e+02 1.362e+02 1.512e+02 1.797e+02 3.142e+02, threshold=3.023e+02, percent-clipped=0.0
2024-09-22 20:45:55,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=99026.66666666667, ans=0.0
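The lr field decays smoothly within the epoch (2.04e-02 at epoch 6, batch 0, down to 1.97e-02 by batch 1700) and also drops at epoch boundaries, consistent with an Eden-style scheduler driven by the run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. The following is an approximate reconstruction from memory; the scheduler's exact batch/epoch bookkeeping (e.g. ref_duration-based batch counting) is not visible in the log, so this should not be expected to reproduce the logged values exactly:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Decay in both batch and epoch; defaults taken from the run
        # parameters. Approximate form only -- treat as an assumption.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor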
2024-09-22 20:46:00,121 INFO [train.py:1198] (0/4) Epoch 6, batch 1750, loss[loss=0.2806, ctc_loss=0.2033, cr_loss=0.3864, over 17095.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2071, cr_loss=0.4019, over 3358987.16 frames. ], batch size: 49, lr: 1.97e-02, grad_scale: 32.0
2024-09-22 20:46:10,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=99073.33333333333, ans=0.125
2024-09-22 20:46:24,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=99120.0, ans=0.0
2024-09-22 20:46:30,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=99166.66666666667, ans=0.0
2024-09-22 20:46:35,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=99166.66666666667, ans=0.125
2024-09-22 20:46:40,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=99166.66666666667, ans=0.125
2024-09-22 20:46:49,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=99213.33333333333, ans=0.125
2024-09-22 20:46:55,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=99213.33333333333, ans=0.125
2024-09-22 20:47:13,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.04 vs. limit=6.0
2024-09-22 20:47:17,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0
2024-09-22 20:47:24,703 INFO [train.py:1198] (0/4) Epoch 6, batch 1800, loss[loss=0.2764, ctc_loss=0.2007, cr_loss=0.3783, over 16936.00 frames. ], tot_loss[loss=0.2853, ctc_loss=0.2053, cr_loss=0.3998, over 3368785.86 frames. ], batch size: 42, lr: 1.96e-02, grad_scale: 32.0
2024-09-22 20:47:44,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=99353.33333333333, ans=0.0
2024-09-22 20:47:47,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=99353.33333333333, ans=0.0
2024-09-22 20:47:56,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=99400.0, ans=0.125
2024-09-22 20:48:02,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0
2024-09-22 20:48:22,609 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.233e+02 1.401e+02 1.510e+02 1.768e+02 2.829e+02, threshold=3.019e+02, percent-clipped=0.0
2024-09-22 20:48:22,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=99446.66666666667, ans=0.125
2024-09-22 20:48:27,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=99446.66666666667, ans=0.125
2024-09-22 20:48:34,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=99493.33333333333, ans=0.2
2024-09-22 20:48:34,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=99493.33333333333, ans=0.125
2024-09-22 20:48:41,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=99493.33333333333, ans=0.025
2024-09-22 20:48:46,315 INFO [train.py:1198] (0/4) Epoch 6, batch 1850, loss[loss=0.3003, ctc_loss=0.2133, cr_loss=0.4351, over 17023.00 frames. ], tot_loss[loss=0.285, ctc_loss=0.2051, cr_loss=0.3998, over 3370126.62 frames. ], batch size: 44, lr: 1.96e-02, grad_scale: 32.0
2024-09-22 20:48:48,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0
2024-09-22 20:49:32,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=99633.33333333333, ans=0.125
2024-09-22 20:50:08,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0
2024-09-22 20:50:09,095 INFO [train.py:1198] (0/4) Epoch 6, batch 1900, loss[loss=0.2309, ctc_loss=0.1624, cr_loss=0.3423, over 16707.00 frames. ], tot_loss[loss=0.2869, ctc_loss=0.2064, cr_loss=0.4023, over 3370963.80 frames. ], batch size: 37, lr: 1.96e-02, grad_scale: 32.0
2024-09-22 20:50:14,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=99773.33333333333, ans=0.2
2024-09-22 20:50:14,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0
2024-09-22 20:51:05,355 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.450e+02 1.646e+02 2.008e+02 3.199e+02, threshold=3.291e+02, percent-clipped=1.0
2024-09-22 20:51:29,423 INFO [train.py:1198] (0/4) Epoch 6, batch 1950, loss[loss=0.3572, ctc_loss=0.2723, cr_loss=0.4249, over 12210.00 frames. ], tot_loss[loss=0.2882, ctc_loss=0.2076, cr_loss=0.4032, over 3358588.15 frames. ], batch size: 123, lr: 1.96e-02, grad_scale: 32.0
2024-09-22 20:52:17,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=100100.0, ans=0.125
2024-09-22 20:52:45,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100193.33333333333, ans=0.1
2024-09-22 20:52:46,015 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-22 20:52:53,474 INFO [train.py:1198] (0/4) Epoch 6, batch 2000, loss[loss=0.2551, ctc_loss=0.1776, cr_loss=0.3878, over 16980.00 frames. ], tot_loss[loss=0.2882, ctc_loss=0.2076, cr_loss=0.4029, over 3362397.38 frames. ], batch size: 42, lr: 1.96e-02, grad_scale: 32.0
2024-09-22 20:53:23,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100286.66666666667, ans=0.1
2024-09-22 20:53:54,725 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.289e+02 1.463e+02 1.652e+02 2.000e+02 4.397e+02, threshold=3.304e+02, percent-clipped=3.0
2024-09-22 20:54:01,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100426.66666666667, ans=0.1
2024-09-22 20:54:18,664 INFO [train.py:1198] (0/4) Epoch 6, batch 2050, loss[loss=0.3434, ctc_loss=0.251, cr_loss=0.4624, over 15863.00 frames. ], tot_loss[loss=0.2892, ctc_loss=0.2084, cr_loss=0.4039, over 3356201.37 frames. ], batch size: 74, lr: 1.95e-02, grad_scale: 32.0
2024-09-22 20:54:38,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=100520.0, ans=0.0
2024-09-22 20:55:08,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=100613.33333333333, ans=0.125
2024-09-22 20:55:12,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.64 vs. limit=22.5
2024-09-22 20:55:19,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=100613.33333333333, ans=0.125
2024-09-22 20:55:19,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=100613.33333333333, ans=0.125
2024-09-22 20:55:22,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=100660.0, ans=0.0
2024-09-22 20:55:30,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=100660.0, ans=0.025
2024-09-22 20:55:30,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=100660.0, ans=0.04949747468305833
2024-09-22 20:55:35,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=22.5
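The grad_scale: 32.0 field in the training entries is the current AMP loss scale (the run header showed Use AMP=True with dtype=torch.float16). A minimal sketch of the standard torch.cuda.amp pattern that produces such a scale; the model/batch interface here is an assumption for illustration, not the recipe's actual training loop:

    import torch

    scaler = torch.cuda.amp.GradScaler()  # scaler.get_scale() is the logged grad_scale

    def train_step(model, batch, optimizer):
        # Standard mixed-precision step: forward under autocast, scale the
        # loss before backward, then unscale/step/update via the scaler.
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)  # assumed to return the combined loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # unscales gradients; skips the step on inf/NaN
        scaler.update()          # adjusts the scale, e.g. the logged 32.0
        return loss.detach()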
2024-09-22 20:55:37,973 INFO [train.py:1198] (0/4) Epoch 6, batch 2100, loss[loss=0.3056, ctc_loss=0.2196, cr_loss=0.4301, over 17092.00 frames. ], tot_loss[loss=0.289, ctc_loss=0.2082, cr_loss=0.404, over 3369262.84 frames. ], batch size: 49, lr: 1.95e-02, grad_scale: 32.0
2024-09-22 20:55:43,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=100706.66666666667, ans=0.125
2024-09-22 20:56:11,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100800.0, ans=0.1
2024-09-22 20:56:13,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=100800.0, ans=0.0
2024-09-22 20:56:19,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=100800.0, ans=0.0
2024-09-22 20:56:32,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=100846.66666666667, ans=0.015
2024-09-22 20:56:33,955 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.234e+02 1.439e+02 1.570e+02 1.892e+02 4.315e+02, threshold=3.139e+02, percent-clipped=1.0
2024-09-22 20:56:58,028 INFO [train.py:1198] (0/4) Epoch 6, batch 2150, loss[loss=0.3067, ctc_loss=0.2217, cr_loss=0.4248, over 17354.00 frames. ], tot_loss[loss=0.2877, ctc_loss=0.2072, cr_loss=0.4021, over 3374967.14 frames. ], batch size: 48, lr: 1.95e-02, grad_scale: 32.0
2024-09-22 20:57:21,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100986.66666666667, ans=0.1
2024-09-22 20:57:50,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0
2024-09-22 20:58:03,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=101080.0, ans=0.0
2024-09-22 20:58:20,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=101126.66666666667, ans=0.125
2024-09-22 20:58:21,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=12.0
2024-09-22 20:58:25,373 INFO [train.py:1198] (0/4) Epoch 6, batch 2200, loss[loss=0.2952, ctc_loss=0.2144, cr_loss=0.404, over 17111.00 frames. ], tot_loss[loss=0.2872, ctc_loss=0.2067, cr_loss=0.4025, over 3380013.93 frames. ], batch size: 49, lr: 1.95e-02, grad_scale: 32.0
2024-09-22 20:58:44,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=22.5
2024-09-22 20:58:55,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0
2024-09-22 20:58:58,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=101266.66666666667, ans=0.125
2024-09-22 20:58:58,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=101266.66666666667, ans=0.0
2024-09-22 20:59:16,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=101313.33333333333, ans=0.2
2024-09-22 20:59:20,164 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 20:59:22,925 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.199e+02 1.501e+02 1.682e+02 2.031e+02 3.137e+02, threshold=3.364e+02, percent-clipped=0.0
2024-09-22 20:59:46,928 INFO [train.py:1198] (0/4) Epoch 6, batch 2250, loss[loss=0.2642, ctc_loss=0.1874, cr_loss=0.3841, over 17134.00 frames. ], tot_loss[loss=0.2875, ctc_loss=0.2069, cr_loss=0.4028, over 3381612.73 frames. ], batch size: 40, lr: 1.95e-02, grad_scale: 32.0
2024-09-22 20:59:53,717 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 21:00:03,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0
2024-09-22 21:00:14,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=101453.33333333333, ans=0.125
2024-09-22 21:00:47,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=101546.66666666667, ans=0.025
2024-09-22 21:01:06,135 INFO [train.py:1198] (0/4) Epoch 6, batch 2300, loss[loss=0.2742, ctc_loss=0.195, cr_loss=0.3958, over 17291.00 frames. ], tot_loss[loss=0.2871, ctc_loss=0.2065, cr_loss=0.4027, over 3380723.10 frames. ], batch size: 46, lr: 1.94e-02, grad_scale: 32.0
2024-09-22 21:02:06,859 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.444e+02 1.711e+02 1.913e+02 2.754e+02, threshold=3.422e+02, percent-clipped=0.0
2024-09-22 21:02:30,423 INFO [train.py:1198] (0/4) Epoch 6, batch 2350, loss[loss=0.2763, ctc_loss=0.1953, cr_loss=0.4048, over 17152.00 frames. ], tot_loss[loss=0.287, ctc_loss=0.2066, cr_loss=0.4023, over 3374571.43 frames. ], batch size: 48, lr: 1.94e-02, grad_scale: 32.0
2024-09-22 21:02:57,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=101920.0, ans=0.0
2024-09-22 21:03:06,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=101966.66666666667, ans=0.2
2024-09-22 21:03:53,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=102106.66666666667, ans=0.0
2024-09-22 21:03:55,324 INFO [train.py:1198] (0/4) Epoch 6, batch 2400, loss[loss=0.3208, ctc_loss=0.2429, cr_loss=0.3895, over 14922.00 frames. ], tot_loss[loss=0.2865, ctc_loss=0.2062, cr_loss=0.4016, over 3372485.09 frames. ], batch size: 88, lr: 1.94e-02, grad_scale: 32.0
2024-09-22 21:03:57,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=102106.66666666667, ans=0.025
2024-09-22 21:03:57,364 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 21:03:58,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=102106.66666666667, ans=0.0
2024-09-22 21:04:23,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=102153.33333333333, ans=10.0
2024-09-22 21:04:45,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=102246.66666666667, ans=0.125
2024-09-22 21:04:50,185 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.414e+02 1.609e+02 1.881e+02 2.919e+02, threshold=3.217e+02, percent-clipped=0.0
2024-09-22 21:04:52,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0
2024-09-22 21:04:59,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=102293.33333333333, ans=12.0
2024-09-22 21:05:14,227 INFO [train.py:1198] (0/4) Epoch 6, batch 2450, loss[loss=0.3247, ctc_loss=0.2395, cr_loss=0.426, over 16072.00 frames. ], tot_loss[loss=0.2869, ctc_loss=0.2065, cr_loss=0.4023, over 3369339.62 frames. ], batch size: 74, lr: 1.94e-02, grad_scale: 32.0
2024-09-22 21:05:22,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102340.0, ans=0.1
2024-09-22 21:05:34,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=22.5
2024-09-22 21:05:47,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=102433.33333333333, ans=0.025
2024-09-22 21:06:14,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=102480.0, ans=0.0
2024-09-22 21:06:26,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=102526.66666666667, ans=0.0
], batch size: 47, lr: 1.94e-02, grad_scale: 32.0 2024-09-22 21:07:03,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=102620.0, ans=0.125 2024-09-22 21:07:03,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=102620.0, ans=0.0 2024-09-22 21:07:04,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=102620.0, ans=0.0 2024-09-22 21:07:34,921 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.228e+02 1.454e+02 1.581e+02 1.860e+02 2.771e+02, threshold=3.162e+02, percent-clipped=0.0 2024-09-22 21:08:01,447 INFO [train.py:1198] (0/4) Epoch 6, batch 2550, loss[loss=0.2294, ctc_loss=0.1603, cr_loss=0.3458, over 17271.00 frames. ], tot_loss[loss=0.2862, ctc_loss=0.2061, cr_loss=0.4009, over 3355625.99 frames. ], batch size: 42, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:08:33,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=102853.33333333333, ans=0.125 2024-09-22 21:08:33,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=102853.33333333333, ans=0.125 2024-09-22 21:08:33,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-09-22 21:08:46,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=12.0 2024-09-22 21:08:51,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2024-09-22 21:08:54,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=12.0 2024-09-22 21:09:04,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=102946.66666666667, ans=0.2 2024-09-22 21:09:19,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=102993.33333333333, ans=0.125 2024-09-22 21:09:22,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=103040.0, ans=0.125 2024-09-22 21:09:23,606 INFO [train.py:1198] (0/4) Epoch 6, batch 2600, loss[loss=0.3065, ctc_loss=0.2248, cr_loss=0.4083, over 17151.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.2044, cr_loss=0.3991, over 3368114.88 frames. 
], batch size: 48, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:10:18,671 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.459e+02 1.706e+02 2.072e+02 3.287e+02, threshold=3.412e+02, percent-clipped=1.0 2024-09-22 21:10:31,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=103226.66666666667, ans=0.1 2024-09-22 21:10:41,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=103273.33333333333, ans=0.0 2024-09-22 21:10:42,362 INFO [train.py:1198] (0/4) Epoch 6, batch 2650, loss[loss=0.3192, ctc_loss=0.2338, cr_loss=0.4271, over 16583.00 frames. ], tot_loss[loss=0.2835, ctc_loss=0.2037, cr_loss=0.3987, over 3370484.19 frames. ], batch size: 66, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:10:45,943 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:10:46,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=103273.33333333333, ans=0.125 2024-09-22 21:11:31,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-09-22 21:11:47,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=103460.0, ans=0.125 2024-09-22 21:12:02,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103460.0, ans=0.1 2024-09-22 21:12:06,889 INFO [train.py:1198] (0/4) Epoch 6, batch 2700, loss[loss=0.2667, ctc_loss=0.1881, cr_loss=0.3933, over 16945.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.2038, cr_loss=0.3989, over 3360788.68 frames. ], batch size: 42, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:12:15,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=103506.66666666667, ans=0.0 2024-09-22 21:13:05,402 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 1.442e+02 1.580e+02 1.786e+02 2.700e+02, threshold=3.159e+02, percent-clipped=0.0 2024-09-22 21:13:31,581 INFO [train.py:1198] (0/4) Epoch 6, batch 2750, loss[loss=0.3196, ctc_loss=0.2328, cr_loss=0.4342, over 17016.00 frames. ], tot_loss[loss=0.2827, ctc_loss=0.203, cr_loss=0.3985, over 3370434.14 frames. ], batch size: 53, lr: 1.93e-02, grad_scale: 32.0 2024-09-22 21:13:35,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5 2024-09-22 21:13:58,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=103786.66666666667, ans=0.125 2024-09-22 21:14:11,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-09-22 21:14:23,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. 
limit=6.0 2024-09-22 21:14:35,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=103926.66666666667, ans=0.1 2024-09-22 21:14:51,043 INFO [train.py:1198] (0/4) Epoch 6, batch 2800, loss[loss=0.2363, ctc_loss=0.1637, cr_loss=0.3629, over 16937.00 frames. ], tot_loss[loss=0.2824, ctc_loss=0.2028, cr_loss=0.3979, over 3368248.87 frames. ], batch size: 42, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:14:51,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=103973.33333333333, ans=0.04949747468305833 2024-09-22 21:15:02,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=103973.33333333333, ans=0.125 2024-09-22 21:15:26,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=104066.66666666667, ans=0.125 2024-09-22 21:15:46,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.09 vs. limit=15.0 2024-09-22 21:15:46,539 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.391e+02 1.557e+02 1.769e+02 3.866e+02, threshold=3.114e+02, percent-clipped=1.0 2024-09-22 21:15:46,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=104113.33333333333, ans=0.0 2024-09-22 21:15:49,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.41 vs. limit=10.0 2024-09-22 21:16:10,247 INFO [train.py:1198] (0/4) Epoch 6, batch 2850, loss[loss=0.2694, ctc_loss=0.1953, cr_loss=0.3706, over 17145.00 frames. ], tot_loss[loss=0.2835, ctc_loss=0.2037, cr_loss=0.3988, over 3363288.79 frames. ], batch size: 48, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:16:47,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=104300.0, ans=0.04949747468305833 2024-09-22 21:16:51,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=104300.0, ans=0.125 2024-09-22 21:16:51,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0 2024-09-22 21:16:59,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-09-22 21:17:35,031 INFO [train.py:1198] (0/4) Epoch 6, batch 2900, loss[loss=0.2266, ctc_loss=0.1579, cr_loss=0.3433, over 16314.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.2037, cr_loss=0.3991, over 3365079.04 frames. 
], batch size: 36, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:17:41,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=104440.0, ans=0.035 2024-09-22 21:17:43,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=104440.0, ans=0.125 2024-09-22 21:18:08,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=104533.33333333333, ans=0.025 2024-09-22 21:18:27,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0 2024-09-22 21:18:35,985 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.209e+02 1.508e+02 1.666e+02 1.915e+02 2.988e+02, threshold=3.332e+02, percent-clipped=0.0 2024-09-22 21:18:44,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2024-09-22 21:18:59,989 INFO [train.py:1198] (0/4) Epoch 6, batch 2950, loss[loss=0.2336, ctc_loss=0.1659, cr_loss=0.3383, over 17112.00 frames. ], tot_loss[loss=0.2842, ctc_loss=0.2042, cr_loss=0.3999, over 3366443.53 frames. ], batch size: 40, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:19:00,459 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:19:00,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=104673.33333333333, ans=0.2 2024-09-22 21:19:56,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.18 vs. limit=22.5 2024-09-22 21:19:57,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=104813.33333333333, ans=0.125 2024-09-22 21:20:19,065 INFO [train.py:1198] (0/4) Epoch 6, batch 3000, loss[loss=0.241, ctc_loss=0.1711, cr_loss=0.3497, over 17041.00 frames. ], tot_loss[loss=0.2837, ctc_loss=0.2039, cr_loss=0.3988, over 3361819.95 frames. ], batch size: 39, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:20:19,066 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 21:20:26,554 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0167, 4.8317, 4.6583, 4.2620], device='cuda:0') 2024-09-22 21:20:29,671 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.7925, 2.3826, 3.6427, 3.5016], device='cuda:0') 2024-09-22 21:20:34,503 INFO [train.py:1230] (0/4) Epoch 6, validation: loss=0.06097, ctc_loss=0.06097, cr_loss=6.736e-15, over 944034.00 frames. 
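The ScheduledFloat entries that dominate this log (scaling.py:214) trace regularization hyperparameters (dropout_p, conv_skip_rate, pos_emb_skip_rate, balancer probabilities, and so on) that appear to be scheduled as piecewise-linear functions of batch_count, each entry reporting the schedule's current output as ans=... at the current batch_count; by batch_count ~ 1e5 most of the skip rates above have annealed to 0.0. A minimal sketch of evaluating such a schedule, with illustrative breakpoints rather than values taken from this run:

    import bisect

    class PiecewiseLinearSchedule:
        """Value that varies piecewise-linearly with the training batch count.

        A sketch in the spirit of the ScheduledFloat values traced above;
        the real icefall class carries more machinery, this shows only the
        interpolation.
        """

        def __init__(self, *points):
            # points: (batch_count, value) pairs in increasing batch_count order.
            self.xs = [float(x) for x, _ in points]
            self.ys = [float(y) for _, y in points]

        def __call__(self, batch_count):
            # Clamp outside the breakpoint range, interpolate linearly inside it.
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count) - 1
            t = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
            return self.ys[i] + t * (self.ys[i + 1] - self.ys[i])

    # Illustrative breakpoints: a skip rate annealed from 0.5 to 0.0 over the
    # first 20k batches (an assumption, not this run's actual schedule).
    conv_skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (20000.0, 0.0))
    print(conv_skip_rate(100800.0))  # 0.0, consistent with the ans=0.0 entries above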
2024-09-22 21:20:34,504 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 21:20:34,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=104906.66666666667, ans=0.2 2024-09-22 21:20:53,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=104953.33333333333, ans=0.05 2024-09-22 21:20:59,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104953.33333333333, ans=0.1 2024-09-22 21:21:21,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=105046.66666666667, ans=0.2 2024-09-22 21:21:29,319 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.397e+02 1.557e+02 1.721e+02 3.093e+02, threshold=3.113e+02, percent-clipped=0.0 2024-09-22 21:21:34,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105046.66666666667, ans=0.1 2024-09-22 21:21:37,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=105093.33333333333, ans=0.125 2024-09-22 21:21:52,865 INFO [train.py:1198] (0/4) Epoch 6, batch 3050, loss[loss=0.2888, ctc_loss=0.21, cr_loss=0.394, over 16943.00 frames. ], tot_loss[loss=0.2839, ctc_loss=0.2042, cr_loss=0.3989, over 3360565.66 frames. ], batch size: 58, lr: 1.92e-02, grad_scale: 32.0 2024-09-22 21:22:01,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=105140.0, ans=0.0 2024-09-22 21:22:15,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=105186.66666666667, ans=0.125 2024-09-22 21:23:12,836 INFO [train.py:1198] (0/4) Epoch 6, batch 3100, loss[loss=0.3167, ctc_loss=0.2261, cr_loss=0.4532, over 16056.00 frames. ], tot_loss[loss=0.2829, ctc_loss=0.2033, cr_loss=0.3983, over 3365438.76 frames. ], batch size: 74, lr: 1.91e-02, grad_scale: 32.0 2024-09-22 21:23:23,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=105373.33333333333, ans=0.07 2024-09-22 21:23:30,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=105420.0, ans=0.2 2024-09-22 21:23:31,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=105420.0, ans=0.125 2024-09-22 21:24:09,900 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.436e+02 1.623e+02 1.870e+02 2.887e+02, threshold=3.246e+02, percent-clipped=0.0 2024-09-22 21:24:19,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105560.0, ans=0.1 2024-09-22 21:24:33,411 INFO [train.py:1198] (0/4) Epoch 6, batch 3150, loss[loss=0.2881, ctc_loss=0.2073, cr_loss=0.4038, over 17356.00 frames. ], tot_loss[loss=0.2834, ctc_loss=0.2036, cr_loss=0.3989, over 3367325.75 frames. 
], batch size: 48, lr: 1.91e-02, grad_scale: 32.0 2024-09-22 21:24:49,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105653.33333333333, ans=0.1 2024-09-22 21:25:32,795 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:25:44,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=105793.33333333333, ans=0.0 2024-09-22 21:25:53,520 INFO [train.py:1198] (0/4) Epoch 6, batch 3200, loss[loss=0.3063, ctc_loss=0.2159, cr_loss=0.4522, over 17157.00 frames. ], tot_loss[loss=0.2838, ctc_loss=0.2039, cr_loss=0.3995, over 3363258.31 frames. ], batch size: 48, lr: 1.91e-02, grad_scale: 32.0 2024-09-22 21:25:55,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=105840.0, ans=0.125 2024-09-22 21:26:04,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=105840.0, ans=0.125 2024-09-22 21:26:12,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=105886.66666666667, ans=0.125 2024-09-22 21:26:15,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=105886.66666666667, ans=0.0 2024-09-22 21:26:31,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=105933.33333333333, ans=15.0 2024-09-22 21:26:34,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=105933.33333333333, ans=0.125 2024-09-22 21:26:50,292 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.272e+02 1.472e+02 1.715e+02 1.976e+02 3.064e+02, threshold=3.429e+02, percent-clipped=0.0 2024-09-22 21:27:12,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=106073.33333333333, ans=0.0 2024-09-22 21:27:13,661 INFO [train.py:1198] (0/4) Epoch 6, batch 3250, loss[loss=0.2358, ctc_loss=0.1656, cr_loss=0.351, over 17093.00 frames. ], tot_loss[loss=0.2833, ctc_loss=0.2034, cr_loss=0.3994, over 3362997.32 frames. ], batch size: 43, lr: 1.91e-02, grad_scale: 32.0 2024-09-22 21:27:16,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=106073.33333333333, ans=0.125 2024-09-22 21:27:23,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=106073.33333333333, ans=0.125 2024-09-22 21:27:32,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=106120.0, ans=0.125 2024-09-22 21:28:31,722 INFO [train.py:1198] (0/4) Epoch 6, batch 3300, loss[loss=0.2801, ctc_loss=0.1958, cr_loss=0.4216, over 17226.00 frames. ], tot_loss[loss=0.2834, ctc_loss=0.2035, cr_loss=0.3992, over 3367324.02 frames. 
], batch size: 55, lr: 1.91e-02, grad_scale: 64.0 2024-09-22 21:28:52,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=106353.33333333333, ans=0.125 2024-09-22 21:29:07,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=106400.0, ans=0.125 2024-09-22 21:29:15,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=106400.0, ans=10.0 2024-09-22 21:29:15,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=106400.0, ans=0.0 2024-09-22 21:29:18,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=106446.66666666667, ans=0.125 2024-09-22 21:29:22,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.32 vs. limit=15.0 2024-09-22 21:29:26,378 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.242e+02 1.523e+02 1.757e+02 2.023e+02 3.259e+02, threshold=3.514e+02, percent-clipped=0.0 2024-09-22 21:29:39,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.36 vs. limit=10.0 2024-09-22 21:29:49,749 INFO [train.py:1198] (0/4) Epoch 6, batch 3350, loss[loss=0.2669, ctc_loss=0.1867, cr_loss=0.4014, over 17264.00 frames. ], tot_loss[loss=0.2832, ctc_loss=0.2032, cr_loss=0.3996, over 3368814.61 frames. ], batch size: 44, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:29:53,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=106540.0, ans=0.125 2024-09-22 21:30:01,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106540.0, ans=0.1 2024-09-22 21:30:26,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=106633.33333333333, ans=0.125 2024-09-22 21:30:29,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=106633.33333333333, ans=0.035 2024-09-22 21:30:40,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=106680.0, ans=0.2 2024-09-22 21:30:49,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=106680.0, ans=0.125 2024-09-22 21:31:00,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=106726.66666666667, ans=0.2 2024-09-22 21:31:08,300 INFO [train.py:1198] (0/4) Epoch 6, batch 3400, loss[loss=0.2568, ctc_loss=0.1798, cr_loss=0.3848, over 17084.00 frames. ], tot_loss[loss=0.2822, ctc_loss=0.2024, cr_loss=0.3988, over 3372707.97 frames. ], batch size: 43, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:31:10,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=106773.33333333333, ans=0.0 2024-09-22 21:31:31,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. 
limit=10.0 2024-09-22 21:31:35,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0 2024-09-22 21:32:04,336 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 1.439e+02 1.572e+02 1.824e+02 3.611e+02, threshold=3.144e+02, percent-clipped=1.0 2024-09-22 21:32:25,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=107006.66666666667, ans=0.0 2024-09-22 21:32:26,357 INFO [train.py:1198] (0/4) Epoch 6, batch 3450, loss[loss=0.3061, ctc_loss=0.2225, cr_loss=0.418, over 17002.00 frames. ], tot_loss[loss=0.2825, ctc_loss=0.2027, cr_loss=0.3991, over 3375792.49 frames. ], batch size: 53, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:33:03,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=107100.0, ans=0.0 2024-09-22 21:33:15,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.74 vs. limit=22.5 2024-09-22 21:33:24,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=107146.66666666667, ans=0.0 2024-09-22 21:33:29,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=107193.33333333333, ans=0.1 2024-09-22 21:33:46,060 INFO [train.py:1198] (0/4) Epoch 6, batch 3500, loss[loss=0.2959, ctc_loss=0.2083, cr_loss=0.4381, over 17009.00 frames. ], tot_loss[loss=0.283, ctc_loss=0.2031, cr_loss=0.3995, over 3369651.28 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:33:46,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2024-09-22 21:33:49,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=107240.0, ans=10.0 2024-09-22 21:33:50,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-22 21:34:44,453 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.243e+02 1.542e+02 1.676e+02 1.907e+02 3.181e+02, threshold=3.352e+02, percent-clipped=1.0 2024-09-22 21:35:06,139 INFO [train.py:1198] (0/4) Epoch 6, batch 3550, loss[loss=0.2433, ctc_loss=0.1722, cr_loss=0.3554, over 17091.00 frames. ], tot_loss[loss=0.2837, ctc_loss=0.2038, cr_loss=0.3995, over 3371837.49 frames. 
], batch size: 43, lr: 1.90e-02, grad_scale: 32.0 2024-09-22 21:35:18,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=107473.33333333333, ans=0.125 2024-09-22 21:35:18,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=107473.33333333333, ans=0.0 2024-09-22 21:35:44,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=107566.66666666667, ans=0.125 2024-09-22 21:35:45,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=107566.66666666667, ans=0.5 2024-09-22 21:35:52,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2024-09-22 21:36:22,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=107660.0, ans=0.025 2024-09-22 21:36:28,360 INFO [train.py:1198] (0/4) Epoch 6, batch 3600, loss[loss=0.2779, ctc_loss=0.2028, cr_loss=0.3751, over 17305.00 frames. ], tot_loss[loss=0.2836, ctc_loss=0.2035, cr_loss=0.4004, over 3375681.21 frames. ], batch size: 49, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:36:32,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2024-09-22 21:36:33,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=107706.66666666667, ans=0.125 2024-09-22 21:36:41,260 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:37:09,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=107800.0, ans=0.09899494936611666 2024-09-22 21:37:10,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=107800.0, ans=0.125 2024-09-22 21:37:24,564 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.276e+02 1.473e+02 1.668e+02 1.932e+02 3.410e+02, threshold=3.336e+02, percent-clipped=1.0 2024-09-22 21:37:26,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107846.66666666667, ans=0.1 2024-09-22 21:37:46,257 INFO [train.py:1198] (0/4) Epoch 6, batch 3650, loss[loss=0.2439, ctc_loss=0.1713, cr_loss=0.3629, over 17206.00 frames. ], tot_loss[loss=0.2845, ctc_loss=0.2041, cr_loss=0.4018, over 3372969.50 frames. 
], batch size: 41, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:37:51,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=107940.0, ans=0.125 2024-09-22 21:38:03,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=107986.66666666667, ans=0.025 2024-09-22 21:38:11,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=107986.66666666667, ans=0.2 2024-09-22 21:38:26,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=108033.33333333333, ans=0.05 2024-09-22 21:38:37,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=108080.0, ans=0.0 2024-09-22 21:39:04,820 INFO [train.py:1198] (0/4) Epoch 6, batch 3700, loss[loss=0.2974, ctc_loss=0.2083, cr_loss=0.4455, over 17032.00 frames. ], tot_loss[loss=0.2846, ctc_loss=0.2043, cr_loss=0.4015, over 3373219.77 frames. ], batch size: 52, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:39:17,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=108173.33333333333, ans=0.025 2024-09-22 21:39:46,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2024-09-22 21:39:58,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=108313.33333333333, ans=0.1 2024-09-22 21:40:01,664 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.455e+02 1.690e+02 2.004e+02 3.040e+02, threshold=3.380e+02, percent-clipped=0.0 2024-09-22 21:40:23,106 INFO [train.py:1198] (0/4) Epoch 6, batch 3750, loss[loss=0.2913, ctc_loss=0.2122, cr_loss=0.3958, over 16234.00 frames. ], tot_loss[loss=0.285, ctc_loss=0.2048, cr_loss=0.4012, over 3345860.80 frames. ], batch size: 36, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:40:24,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0 2024-09-22 21:40:33,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-09-22 21:40:50,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=108453.33333333333, ans=0.0 2024-09-22 21:41:04,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.01 vs. 
limit=15.0 2024-09-22 21:41:14,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108546.66666666667, ans=0.1 2024-09-22 21:41:21,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=108546.66666666667, ans=0.125 2024-09-22 21:41:26,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=108593.33333333333, ans=0.125 2024-09-22 21:41:32,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=108593.33333333333, ans=0.2 2024-09-22 21:41:42,110 INFO [train.py:1198] (0/4) Epoch 6, batch 3800, loss[loss=0.3197, ctc_loss=0.2375, cr_loss=0.4107, over 16947.00 frames. ], tot_loss[loss=0.2859, ctc_loss=0.2056, cr_loss=0.4019, over 3328907.16 frames. ], batch size: 58, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:42:01,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=108686.66666666667, ans=0.125 2024-09-22 21:42:30,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=108780.0, ans=0.0 2024-09-22 21:42:33,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108780.0, ans=0.1 2024-09-22 21:42:39,194 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.201e+02 1.543e+02 1.830e+02 2.234e+02 3.927e+02, threshold=3.660e+02, percent-clipped=2.0 2024-09-22 21:43:00,881 INFO [train.py:1198] (0/4) Epoch 6, batch 3850, loss[loss=0.3208, ctc_loss=0.239, cr_loss=0.4089, over 15159.00 frames. ], tot_loss[loss=0.2859, ctc_loss=0.2057, cr_loss=0.4014, over 3305326.16 frames. ], batch size: 89, lr: 1.89e-02, grad_scale: 32.0 2024-09-22 21:43:12,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=108873.33333333333, ans=0.125 2024-09-22 21:43:17,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108920.0, ans=0.1 2024-09-22 21:43:36,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=108966.66666666667, ans=0.0 2024-09-22 21:44:10,568 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-6.pt 2024-09-22 21:45:03,018 INFO [train.py:1198] (0/4) Epoch 7, batch 0, loss[loss=0.2484, ctc_loss=0.1797, cr_loss=0.3437, over 17267.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1797, cr_loss=0.3437, over 17267.00 frames. ], batch size: 42, lr: 1.77e-02, grad_scale: 32.0 2024-09-22 21:45:03,019 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-22 21:45:18,430 INFO [train.py:1230] (0/4) Epoch 7, validation: loss=0.06283, ctc_loss=0.06283, cr_loss=7.028e-15, over 944034.00 frames. 
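In the recurring optim.py:487 warnings, the five grad-norm figures read as the minimum, first quartile, median, third quartile and maximum of recently observed gradient norms, and in every entry here the reported threshold equals Clipping_scale times the median (for instance 2.0 * 1.830e+02 = 3.660e+02 in the Epoch 6, batch 3800 warning above), with percent-clipped presumably the share of recent steps whose gradient norm exceeded that threshold. A minimal sketch of that bookkeeping under those assumptions; the class and method names are illustrative, not icefall's actual optimizer API:

    from collections import deque
    import statistics

    class GradNormClipper:
        """Track recent gradient norms and clip against a median-based threshold."""

        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total grad norms
            self.clipped = 0
            self.steps = 0

        def update(self, grad_norm):
            """Record one step's gradient norm; return the scale to apply to grads."""
            self.norms.append(grad_norm)
            self.steps += 1
            threshold = self.clipping_scale * statistics.median(self.norms)
            if grad_norm > threshold:
                self.clipped += 1
                return threshold / grad_norm  # shrink gradients down to the threshold
            return 1.0

        def report(self):
            # Mirrors the WARNING format above; the real log likely resets the
            # percent-clipped counter at each reporting interval.
            q1, med, q3 = statistics.quantiles(self.norms, n=4)
            pct = 100.0 * self.clipped / max(self.steps, 1)
            return (f"grad-norm quartiles {min(self.norms):.3e} {q1:.3e} "
                    f"{med:.3e} {q3:.3e} {max(self.norms):.3e}, "
                    f"threshold={self.clipping_scale * med:.3e}, "
                    f"percent-clipped={pct:.1f}")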
2024-09-22 21:45:18,430 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-22 21:45:21,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=109088.0, ans=0.125 2024-09-22 21:45:57,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=109181.33333333333, ans=0.09899494936611666 2024-09-22 21:46:06,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=109228.0, ans=0.125 2024-09-22 21:46:24,181 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.230e+02 1.560e+02 1.911e+02 2.609e+02 3.890e+02, threshold=3.822e+02, percent-clipped=2.0 2024-09-22 21:46:39,894 INFO [train.py:1198] (0/4) Epoch 7, batch 50, loss[loss=0.267, ctc_loss=0.1896, cr_loss=0.3867, over 17015.00 frames. ], tot_loss[loss=0.2807, ctc_loss=0.2011, cr_loss=0.398, over 765247.17 frames. ], batch size: 53, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:47:21,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2024-09-22 21:48:04,580 INFO [train.py:1198] (0/4) Epoch 7, batch 100, loss[loss=0.2624, ctc_loss=0.1891, cr_loss=0.3667, over 17084.00 frames. ], tot_loss[loss=0.2805, ctc_loss=0.2009, cr_loss=0.3982, over 1341120.88 frames. ], batch size: 43, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:48:17,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=109554.66666666667, ans=0.2 2024-09-22 21:48:22,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=109601.33333333333, ans=0.125 2024-09-22 21:48:30,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=109601.33333333333, ans=0.0 2024-09-22 21:48:41,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=109648.0, ans=0.0 2024-09-22 21:49:07,938 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.339e+02 1.498e+02 1.759e+02 2.443e+02, threshold=2.996e+02, percent-clipped=0.0 2024-09-22 21:49:26,881 INFO [train.py:1198] (0/4) Epoch 7, batch 150, loss[loss=0.3117, ctc_loss=0.2234, cr_loss=0.4411, over 16578.00 frames. ], tot_loss[loss=0.2829, ctc_loss=0.2027, cr_loss=0.4011, over 1785997.87 frames. ], batch size: 66, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:49:28,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=109788.0, ans=0.125 2024-09-22 21:49:36,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109788.0, ans=0.1 2024-09-22 21:49:49,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=109834.66666666667, ans=0.0 2024-09-22 21:50:02,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=109881.33333333333, ans=0.05 2024-09-22 21:50:25,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. 
limit=12.0 2024-09-22 21:50:45,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=109974.66666666667, ans=15.0 2024-09-22 21:50:49,453 INFO [train.py:1198] (0/4) Epoch 7, batch 200, loss[loss=0.3036, ctc_loss=0.2161, cr_loss=0.4376, over 17218.00 frames. ], tot_loss[loss=0.2822, ctc_loss=0.2021, cr_loss=0.4007, over 2133330.47 frames. ], batch size: 47, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:50:57,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2024-09-22 21:51:48,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110161.33333333333, ans=0.1 2024-09-22 21:51:53,495 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.194e+02 1.411e+02 1.617e+02 1.825e+02 4.000e+02, threshold=3.234e+02, percent-clipped=2.0 2024-09-22 21:51:56,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=110208.0, ans=0.125 2024-09-22 21:52:12,072 INFO [train.py:1198] (0/4) Epoch 7, batch 250, loss[loss=0.2934, ctc_loss=0.2074, cr_loss=0.4299, over 16972.00 frames. ], tot_loss[loss=0.2821, ctc_loss=0.2021, cr_loss=0.3997, over 2393822.78 frames. ], batch size: 53, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:52:28,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=110301.33333333333, ans=0.125 2024-09-22 21:52:49,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=110348.0, ans=0.1 2024-09-22 21:53:12,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=110394.66666666667, ans=0.125 2024-09-22 21:53:19,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=110441.33333333333, ans=0.125 2024-09-22 21:53:20,980 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:53:33,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=110488.0, ans=0.125 2024-09-22 21:53:34,853 INFO [train.py:1198] (0/4) Epoch 7, batch 300, loss[loss=0.2391, ctc_loss=0.1654, cr_loss=0.3682, over 16693.00 frames. ], tot_loss[loss=0.2787, ctc_loss=0.1994, cr_loss=0.3967, over 2610032.28 frames. ], batch size: 37, lr: 1.76e-02, grad_scale: 32.0 2024-09-22 21:53:35,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.10 vs. 
limit=12.0 2024-09-22 21:53:36,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=110488.0, ans=0.125 2024-09-22 21:54:34,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=110628.0, ans=0.0 2024-09-22 21:54:41,019 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 1.446e+02 1.620e+02 1.814e+02 2.683e+02, threshold=3.241e+02, percent-clipped=0.0 2024-09-22 21:54:49,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=110674.66666666667, ans=0.125 2024-09-22 21:54:56,993 INFO [train.py:1198] (0/4) Epoch 7, batch 350, loss[loss=0.2633, ctc_loss=0.1868, cr_loss=0.3822, over 17295.00 frames. ], tot_loss[loss=0.2803, ctc_loss=0.2004, cr_loss=0.3994, over 2776130.57 frames. ], batch size: 46, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 21:54:57,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=110721.33333333333, ans=0.07 2024-09-22 21:55:05,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=110721.33333333333, ans=0.05 2024-09-22 21:55:06,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=110721.33333333333, ans=0.09899494936611666 2024-09-22 21:55:27,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=110814.66666666667, ans=0.1 2024-09-22 21:55:43,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=110814.66666666667, ans=0.125 2024-09-22 21:55:49,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110861.33333333333, ans=0.1 2024-09-22 21:55:54,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. limit=15.0 2024-09-22 21:56:14,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=110908.0, ans=0.125 2024-09-22 21:56:16,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=110908.0, ans=0.2 2024-09-22 21:56:16,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=110908.0, ans=0.125 2024-09-22 21:56:18,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-09-22 21:56:19,159 INFO [train.py:1198] (0/4) Epoch 7, batch 400, loss[loss=0.2854, ctc_loss=0.2019, cr_loss=0.4175, over 17302.00 frames. ], tot_loss[loss=0.2788, ctc_loss=0.1991, cr_loss=0.3983, over 2905523.88 frames. 
], batch size: 51, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 21:56:30,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=110954.66666666667, ans=0.0 2024-09-22 21:56:43,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=111001.33333333333, ans=0.125 2024-09-22 21:56:46,640 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 21:57:14,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=111094.66666666667, ans=0.125 2024-09-22 21:57:25,761 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.408e+02 1.555e+02 1.806e+02 2.695e+02, threshold=3.109e+02, percent-clipped=0.0 2024-09-22 21:57:41,729 INFO [train.py:1198] (0/4) Epoch 7, batch 450, loss[loss=0.2908, ctc_loss=0.2038, cr_loss=0.4347, over 16914.00 frames. ], tot_loss[loss=0.2801, ctc_loss=0.2002, cr_loss=0.3995, over 3004351.14 frames. ], batch size: 58, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 21:57:49,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111188.0, ans=0.125 2024-09-22 21:58:03,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=111234.66666666667, ans=0.125 2024-09-22 21:58:11,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=111234.66666666667, ans=0.125 2024-09-22 21:58:59,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=111374.66666666667, ans=0.025 2024-09-22 21:59:03,789 INFO [train.py:1198] (0/4) Epoch 7, batch 500, loss[loss=0.3124, ctc_loss=0.2206, cr_loss=0.4586, over 16886.00 frames. ], tot_loss[loss=0.2796, ctc_loss=0.1999, cr_loss=0.3984, over 3074915.45 frames. ], batch size: 58, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 21:59:06,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-22 21:59:07,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=111421.33333333333, ans=15.0 2024-09-22 21:59:14,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=111421.33333333333, ans=0.2 2024-09-22 21:59:16,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=111421.33333333333, ans=0.0 2024-09-22 21:59:53,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.54 vs. 
limit=6.0 2024-09-22 21:59:57,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=111561.33333333333, ans=0.0 2024-09-22 22:00:00,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=111561.33333333333, ans=0.0 2024-09-22 22:00:09,624 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.405e+02 1.626e+02 1.844e+02 3.754e+02, threshold=3.253e+02, percent-clipped=1.0 2024-09-22 22:00:25,421 INFO [train.py:1198] (0/4) Epoch 7, batch 550, loss[loss=0.2539, ctc_loss=0.1808, cr_loss=0.3658, over 17201.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.1982, cr_loss=0.3953, over 3140782.82 frames. ], batch size: 41, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 22:00:30,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=111654.66666666667, ans=0.125 2024-09-22 22:00:57,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.72 vs. limit=6.0 2024-09-22 22:01:08,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=111748.0, ans=0.2 2024-09-22 22:01:12,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.39 vs. limit=15.0 2024-09-22 22:01:15,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0 2024-09-22 22:01:38,788 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:01:40,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=111841.33333333333, ans=0.0 2024-09-22 22:01:47,902 INFO [train.py:1198] (0/4) Epoch 7, batch 600, loss[loss=0.2426, ctc_loss=0.1721, cr_loss=0.3523, over 16950.00 frames. ], tot_loss[loss=0.2784, ctc_loss=0.199, cr_loss=0.397, over 3190419.62 frames. ], batch size: 42, lr: 1.75e-02, grad_scale: 32.0 2024-09-22 22:01:55,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=111888.0, ans=0.125 2024-09-22 22:02:24,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-09-22 22:02:27,183 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-24000.pt 2024-09-22 22:02:37,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=111981.33333333333, ans=0.0 2024-09-22 22:02:48,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. 
limit=6.0 2024-09-22 22:02:51,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112028.0, ans=0.1 2024-09-22 22:02:54,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=112028.0, ans=0.125 2024-09-22 22:02:58,792 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.416e+02 1.598e+02 2.018e+02 3.504e+02, threshold=3.196e+02, percent-clipped=2.0 2024-09-22 22:03:14,673 INFO [train.py:1198] (0/4) Epoch 7, batch 650, loss[loss=0.2491, ctc_loss=0.1789, cr_loss=0.3509, over 17071.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1991, cr_loss=0.3971, over 3225414.96 frames. ], batch size: 43, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:03:29,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-09-22 22:03:42,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112168.0, ans=0.1 2024-09-22 22:03:46,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=112214.66666666667, ans=0.09899494936611666 2024-09-22 22:03:56,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=112214.66666666667, ans=0.125 2024-09-22 22:04:01,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=112261.33333333333, ans=0.125 2024-09-22 22:04:14,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=112261.33333333333, ans=0.125 2024-09-22 22:04:24,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=112308.0, ans=0.125 2024-09-22 22:04:27,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=112308.0, ans=0.2 2024-09-22 22:04:36,943 INFO [train.py:1198] (0/4) Epoch 7, batch 700, loss[loss=0.244, ctc_loss=0.1708, cr_loss=0.3663, over 17043.00 frames. ], tot_loss[loss=0.2794, ctc_loss=0.1997, cr_loss=0.3984, over 3260180.76 frames. 
], batch size: 39, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:04:37,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=112354.66666666667, ans=0.2 2024-09-22 22:04:43,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=112354.66666666667, ans=0.2 2024-09-22 22:05:08,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=112448.0, ans=0.2 2024-09-22 22:05:28,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=112494.66666666667, ans=0.05 2024-09-22 22:05:33,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=112494.66666666667, ans=0.1 2024-09-22 22:05:42,323 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.430e+02 1.586e+02 1.856e+02 3.477e+02, threshold=3.173e+02, percent-clipped=1.0 2024-09-22 22:05:50,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=112541.33333333333, ans=0.125 2024-09-22 22:05:53,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=112541.33333333333, ans=0.0 2024-09-22 22:05:55,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=112541.33333333333, ans=0.025 2024-09-22 22:05:58,155 INFO [train.py:1198] (0/4) Epoch 7, batch 750, loss[loss=0.2995, ctc_loss=0.2125, cr_loss=0.4348, over 16731.00 frames. ], tot_loss[loss=0.2777, ctc_loss=0.1983, cr_loss=0.3973, over 3287717.46 frames. ], batch size: 61, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:06:26,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=112634.66666666667, ans=0.125 2024-09-22 22:06:55,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-09-22 22:07:04,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=112774.66666666667, ans=0.025 2024-09-22 22:07:19,627 INFO [train.py:1198] (0/4) Epoch 7, batch 800, loss[loss=0.2977, ctc_loss=0.217, cr_loss=0.4031, over 17030.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1995, cr_loss=0.3983, over 3300352.69 frames. ], batch size: 51, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:07:41,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=112868.0, ans=0.0 2024-09-22 22:08:04,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=112914.66666666667, ans=0.025 2024-09-22 22:08:08,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=112961.33333333333, ans=0.2 2024-09-22 22:08:26,195 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.385e+02 1.523e+02 1.724e+02 2.705e+02, threshold=3.046e+02, percent-clipped=0.0 2024-09-22 22:08:41,932 INFO [train.py:1198] (0/4) Epoch 7, batch 850, loss[loss=0.3037, ctc_loss=0.2182, cr_loss=0.4274, over 17304.00 frames. 
2024-09-22 22:08:50,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=113054.66666666667, ans=0.0 2024-09-22 22:08:55,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=113054.66666666667, ans=0.04949747468305833 2024-09-22 22:09:02,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=113101.33333333333, ans=0.125 2024-09-22 22:09:27,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=113148.0, ans=0.0 2024-09-22 22:09:35,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=113194.66666666667, ans=0.125 2024-09-22 22:09:46,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=113241.33333333333, ans=0.125 2024-09-22 22:09:46,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=113241.33333333333, ans=0.2 2024-09-22 22:09:54,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113241.33333333333, ans=0.1 2024-09-22 22:10:03,663 INFO [train.py:1198] (0/4) Epoch 7, batch 900, loss[loss=0.3146, ctc_loss=0.2238, cr_loss=0.4539, over 17236.00 frames. ], tot_loss[loss=0.2792, ctc_loss=0.1996, cr_loss=0.3982, over 3332342.58 frames. ], batch size: 50, lr: 1.74e-02, grad_scale: 32.0 2024-09-22 22:10:12,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113288.0, ans=0.1 2024-09-22 22:10:17,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=113288.0, ans=0.0 2024-09-22 22:10:23,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=113334.66666666667, ans=0.0 2024-09-22 22:10:24,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.07 vs. limit=15.0
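
The Whitening messages report a per-module statistic ("metric") against a limit. A plausible reading (an assumption; the exact formula lives in scaling.py and is not shown in this log): the metric measures how uneven the eigenvalue spectrum of the module's output channel covariance is, so a value near 1.0 means well-whitened activations, larger values mean energy concentrating in few directions, and the whitener only intervenes once the metric exceeds the limit. A minimal sketch:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # Assumed form: mean squared eigenvalue of the channel covariance
        # divided by the squared mean eigenvalue. Equals 1.0 for an
        # isotropic covariance and approaches num_channels when a single
        # direction dominates.
        x = x.reshape(-1, x.shape[-1]).float()
        x = x - x.mean(dim=0, keepdim=True)
        cov = x.t() @ x / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    x = torch.randn(1000, 384)        # roughly white activations
    print(whitening_metric(x))        # close to 1.0, far below a limit such as 15.0
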
2024-09-22 22:10:54,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=113428.0, ans=0.0 2024-09-22 22:10:58,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=113428.0, ans=0.125 2024-09-22 22:11:03,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=113428.0, ans=0.0 2024-09-22 22:11:03,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=113428.0, ans=0.0 2024-09-22 22:11:04,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=113428.0, ans=0.0 2024-09-22 22:11:06,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=113428.0, ans=0.125 2024-09-22 22:11:09,266 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.239e+02 1.440e+02 1.570e+02 1.832e+02 2.236e+02, threshold=3.140e+02, percent-clipped=0.0 2024-09-22 22:11:25,182 INFO [train.py:1198] (0/4) Epoch 7, batch 950, loss[loss=0.2577, ctc_loss=0.1827, cr_loss=0.3749, over 17013.00 frames. ], tot_loss[loss=0.2793, ctc_loss=0.1996, cr_loss=0.3986, over 3338772.83 frames. ], batch size: 44, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:12:06,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=113614.66666666667, ans=0.125 2024-09-22 22:12:41,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=113708.0, ans=0.125 2024-09-22 22:12:50,311 INFO [train.py:1198] (0/4) Epoch 7, batch 1000, loss[loss=0.3092, ctc_loss=0.2231, cr_loss=0.4305, over 17016.00 frames. ], tot_loss[loss=0.278, ctc_loss=0.1985, cr_loss=0.3971, over 3353326.79 frames. ], batch size: 51, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:12:58,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=113754.66666666667, ans=0.0 2024-09-22 22:13:15,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=113801.33333333333, ans=0.125 2024-09-22 22:13:25,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=113848.0, ans=0.125 2024-09-22 22:13:27,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.92 vs. limit=12.0
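
The WARNING lines from optim.py summarize the recent distribution of gradient norms: the five numbers read naturally as min/25%/median/75%/max, followed by the clipping threshold in force and the percentage of recent batches that were clipped. The exact rule is internal to optim.py and not shown here; the sketch below only mirrors the reported quantities, and the choice of threshold = Clipping_scale times the median is an assumption:

    import torch

    def clip_with_report(params, history, clipping_scale=2.0, window=128):
        # Track the total grad norm, derive a threshold from recent
        # history, clip to it, and print quartiles in the same shape as
        # the optim.py WARNING above (all assumptions, not icefall code).
        grads = [p.grad for p in params if p.grad is not None]
        total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        history.append(total.item())
        win = sorted(history[-window:])
        q = [win[int(f * (len(win) - 1))] for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = clipping_scale * q[2]   # assumed: scale times the median
        if total > threshold:
            for g in grads:
                g.mul_(threshold / total)
        print("grad-norm quartiles " + " ".join(f"{v:.3e}" for v in q)
              + f", threshold={threshold:.3e}")
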
2024-09-22 22:13:44,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=113894.66666666667, ans=0.125 2024-09-22 22:13:52,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=113941.33333333333, ans=0.125 2024-09-22 22:13:53,768 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.387e+02 1.549e+02 1.823e+02 4.640e+02, threshold=3.099e+02, percent-clipped=1.0 2024-09-22 22:14:04,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=113941.33333333333, ans=0.2 2024-09-22 22:14:08,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2024-09-22 22:14:12,273 INFO [train.py:1198] (0/4) Epoch 7, batch 1050, loss[loss=0.3351, ctc_loss=0.2474, cr_loss=0.4384, over 15143.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.1981, cr_loss=0.396, over 3351176.35 frames. ], batch size: 89, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:14:14,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2024-09-22 22:14:20,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=113988.0, ans=0.0 2024-09-22 22:14:54,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-09-22 22:15:19,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=114174.66666666667, ans=0.5 2024-09-22 22:15:30,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.32 vs. limit=15.0 2024-09-22 22:15:34,439 INFO [train.py:1198] (0/4) Epoch 7, batch 1100, loss[loss=0.2447, ctc_loss=0.1694, cr_loss=0.3762, over 17271.00 frames. ], tot_loss[loss=0.2779, ctc_loss=0.1986, cr_loss=0.3968, over 3344680.04 frames. ], batch size: 42, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:15:36,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2024-09-22 22:15:37,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=114221.33333333333, ans=0.125 2024-09-22 22:16:32,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2024-09-22 22:16:37,934 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.457e+02 1.783e+02 2.141e+02 3.294e+02, threshold=3.566e+02, percent-clipped=3.0 2024-09-22 22:16:56,277 INFO [train.py:1198] (0/4) Epoch 7, batch 1150, loss[loss=0.2793, ctc_loss=0.1975, cr_loss=0.409, over 17311.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1989, cr_loss=0.3978, over 3346466.48 frames. ], batch size: 49, lr: 1.73e-02, grad_scale: 32.0
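
The ScheduledFloat messages record hyperparameters (dropout probabilities, skip rates, balancer bounds) whose current value "ans" is a function of batch_count rather than a constant. A minimal sketch, assuming piecewise-linear interpolation between breakpoints (the breakpoints below are purely illustrative; each named parameter has its own):

    class ScheduledFloat:
        # Assumed behavior: linear interpolation between sorted
        # (batch_count, value) breakpoints, clamped outside the range.
        def __init__(self, *points):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            return pts[-1][1]

    # Hypothetical schedule: dropout decaying early in training, flat after.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(113288.0))   # -> 0.1, matching the dropout_p records above
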
2024-09-22 22:17:01,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=114454.66666666667, ans=0.125 2024-09-22 22:17:21,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=114501.33333333333, ans=0.0 2024-09-22 22:17:59,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=114594.66666666667, ans=0.0 2024-09-22 22:18:00,918 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:18:07,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=114641.33333333333, ans=0.0 2024-09-22 22:18:17,892 INFO [train.py:1198] (0/4) Epoch 7, batch 1200, loss[loss=0.252, ctc_loss=0.1759, cr_loss=0.3808, over 16945.00 frames. ], tot_loss[loss=0.2794, ctc_loss=0.1996, cr_loss=0.3988, over 3354237.00 frames. ], batch size: 42, lr: 1.73e-02, grad_scale: 32.0 2024-09-22 22:18:18,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=114688.0, ans=0.125 2024-09-22 22:19:00,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=114781.33333333333, ans=0.125 2024-09-22 22:19:08,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=114828.0, ans=0.125 2024-09-22 22:19:19,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=114828.0, ans=0.125 2024-09-22 22:19:24,538 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.404e+02 1.583e+02 1.854e+02 5.575e+02, threshold=3.166e+02, percent-clipped=2.0 2024-09-22 22:19:40,343 INFO [train.py:1198] (0/4) Epoch 7, batch 1250, loss[loss=0.3137, ctc_loss=0.2258, cr_loss=0.4391, over 17219.00 frames. ], tot_loss[loss=0.2773, ctc_loss=0.198, cr_loss=0.3965, over 3355279.63 frames. ], batch size: 55, lr: 1.72e-02, grad_scale: 32.0 2024-09-22 22:19:48,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=114921.33333333333, ans=0.0 2024-09-22 22:20:11,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=114968.0, ans=0.2 2024-09-22 22:20:35,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=115061.33333333333, ans=0.0 2024-09-22 22:20:56,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=115108.0, ans=0.125 2024-09-22 22:21:01,584 INFO [train.py:1198] (0/4) Epoch 7, batch 1300, loss[loss=0.2784, ctc_loss=0.2007, cr_loss=0.3886, over 17210.00 frames. ], tot_loss[loss=0.278, ctc_loss=0.1986, cr_loss=0.3971, over 3351188.83 frames. ], batch size: 47, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:21:08,359 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0
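
In the records above, grad_scale drops from 32.0 (batch 1250) to 16.0 (batch 1300). With AMP enabled, this is the usual loss-scale dynamic of a GradScaler-style scaler: the scale is halved when inf/nan gradients are detected and grown again after a run of overflow-free steps. A minimal usage sketch of the stock PyTorch scaler (the surrounding names are illustrative, not the train.py code):

    import torch

    scaler = torch.cuda.amp.GradScaler(growth_factor=2.0, backoff_factor=0.5,
                                       growth_interval=2000)

    def training_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        # update() halves the scale on inf/nan gradients (32.0 -> 16.0,
        # as above) and doubles it after growth_interval clean steps.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
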
2024-09-22 22:21:17,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=115201.33333333333, ans=0.1 2024-09-22 22:21:28,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=115201.33333333333, ans=0.125 2024-09-22 22:21:51,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=115294.66666666667, ans=0.0 2024-09-22 22:22:09,482 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.485e+02 1.678e+02 2.013e+02 3.139e+02, threshold=3.356e+02, percent-clipped=0.0 2024-09-22 22:22:18,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=115341.33333333333, ans=0.125 2024-09-22 22:22:26,394 INFO [train.py:1198] (0/4) Epoch 7, batch 1350, loss[loss=0.3594, ctc_loss=0.2702, cr_loss=0.446, over 11531.00 frames. ], tot_loss[loss=0.2787, ctc_loss=0.1992, cr_loss=0.3974, over 3332281.56 frames. ], batch size: 123, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:22:49,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=115434.66666666667, ans=0.04949747468305833 2024-09-22 22:22:53,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=115434.66666666667, ans=0.025 2024-09-22 22:22:57,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=115481.33333333333, ans=0.0 2024-09-22 22:23:33,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=115574.66666666667, ans=0.125 2024-09-22 22:23:36,936 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:23:41,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=115574.66666666667, ans=0.07 2024-09-22 22:23:46,224 INFO [train.py:1198] (0/4) Epoch 7, batch 1400, loss[loss=0.2879, ctc_loss=0.203, cr_loss=0.4247, over 17223.00 frames. ], tot_loss[loss=0.2774, ctc_loss=0.1982, cr_loss=0.396, over 3332224.51 frames.
], batch size: 55, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:23:58,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=115621.33333333333, ans=0.125 2024-09-22 22:24:19,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=115714.66666666667, ans=0.0 2024-09-22 22:24:35,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=115761.33333333333, ans=0.125 2024-09-22 22:24:44,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=115761.33333333333, ans=0.1 2024-09-22 22:24:54,140 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.408e+02 1.629e+02 2.097e+02 4.051e+02, threshold=3.259e+02, percent-clipped=2.0 2024-09-22 22:25:04,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=115808.0, ans=0.2 2024-09-22 22:25:09,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=115854.66666666667, ans=0.125 2024-09-22 22:25:10,923 INFO [train.py:1198] (0/4) Epoch 7, batch 1450, loss[loss=0.3297, ctc_loss=0.242, cr_loss=0.4387, over 14940.00 frames. ], tot_loss[loss=0.2766, ctc_loss=0.1976, cr_loss=0.3952, over 3340848.61 frames. ], batch size: 89, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:25:44,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115948.0, ans=0.1 2024-09-22 22:25:48,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115948.0, ans=0.1 2024-09-22 22:25:55,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115948.0, ans=0.1 2024-09-22 22:26:06,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=115994.66666666667, ans=0.2 2024-09-22 22:26:08,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=115994.66666666667, ans=0.0 2024-09-22 22:26:32,311 INFO [train.py:1198] (0/4) Epoch 7, batch 1500, loss[loss=0.2282, ctc_loss=0.1556, cr_loss=0.3631, over 17103.00 frames. ], tot_loss[loss=0.2766, ctc_loss=0.1976, cr_loss=0.3953, over 3346893.16 frames. 
], batch size: 43, lr: 1.72e-02, grad_scale: 16.0 2024-09-22 22:26:37,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=116088.0, ans=0.0 2024-09-22 22:26:38,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=116088.0, ans=0.09899494936611666 2024-09-22 22:26:43,678 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:26:50,002 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:27:02,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=116181.33333333333, ans=0.125 2024-09-22 22:27:18,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-09-22 22:27:20,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-22 22:27:29,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=116228.0, ans=0.125 2024-09-22 22:27:40,531 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.248e+02 1.482e+02 1.664e+02 1.936e+02 3.285e+02, threshold=3.328e+02, percent-clipped=1.0 2024-09-22 22:27:41,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.20 vs. limit=10.0 2024-09-22 22:27:43,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=116274.66666666667, ans=0.125 2024-09-22 22:27:54,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.35 vs. limit=15.0 2024-09-22 22:27:54,708 INFO [train.py:1198] (0/4) Epoch 7, batch 1550, loss[loss=0.2153, ctc_loss=0.1527, cr_loss=0.313, over 16958.00 frames. ], tot_loss[loss=0.2758, ctc_loss=0.1967, cr_loss=0.3952, over 3342948.85 frames. ], batch size: 42, lr: 1.71e-02, grad_scale: 16.0 2024-09-22 22:27:56,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=116321.33333333333, ans=0.125 2024-09-22 22:28:26,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=116414.66666666667, ans=0.0 2024-09-22 22:28:48,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=22.5 2024-09-22 22:28:54,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=116461.33333333333, ans=0.2 2024-09-22 22:28:59,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=116508.0, ans=0.125 2024-09-22 22:29:02,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=116508.0, ans=0.0 2024-09-22 22:29:16,547 INFO [train.py:1198] (0/4) Epoch 7, batch 1600, loss[loss=0.3629, ctc_loss=0.2789, cr_loss=0.4201, over 11763.00 frames. 
], tot_loss[loss=0.2755, ctc_loss=0.1964, cr_loss=0.3954, over 3344085.92 frames. ], batch size: 123, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:30:18,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2024-09-22 22:30:24,104 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.410e+02 1.579e+02 1.948e+02 3.056e+02, threshold=3.158e+02, percent-clipped=0.0 2024-09-22 22:30:26,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=116741.33333333333, ans=0.0 2024-09-22 22:30:33,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=116741.33333333333, ans=0.0 2024-09-22 22:30:38,422 INFO [train.py:1198] (0/4) Epoch 7, batch 1650, loss[loss=0.2517, ctc_loss=0.1793, cr_loss=0.3618, over 17037.00 frames. ], tot_loss[loss=0.2765, ctc_loss=0.1973, cr_loss=0.3962, over 3348189.48 frames. ], batch size: 44, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:30:57,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=116834.66666666667, ans=0.125 2024-09-22 22:31:00,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=116834.66666666667, ans=0.0 2024-09-22 22:31:13,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=116881.33333333333, ans=0.1 2024-09-22 22:31:40,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=22.5 2024-09-22 22:31:44,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=116974.66666666667, ans=0.125 2024-09-22 22:31:53,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=116974.66666666667, ans=0.125 2024-09-22 22:31:56,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=116974.66666666667, ans=0.2 2024-09-22 22:31:59,729 INFO [train.py:1198] (0/4) Epoch 7, batch 1700, loss[loss=0.2782, ctc_loss=0.1951, cr_loss=0.4156, over 16921.00 frames. ], tot_loss[loss=0.2771, ctc_loss=0.1978, cr_loss=0.3968, over 3345729.98 frames. ], batch size: 58, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:32:16,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=117068.0, ans=0.2 2024-09-22 22:32:34,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=117114.66666666667, ans=0.125 2024-09-22 22:32:38,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=117114.66666666667, ans=0.0 2024-09-22 22:32:40,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=117114.66666666667, ans=0.04949747468305833 2024-09-22 22:32:40,793 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.89 vs. 
limit=15.0 2024-09-22 22:32:45,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0 2024-09-22 22:32:48,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=117161.33333333333, ans=0.125 2024-09-22 22:33:07,038 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.332e+02 1.436e+02 1.651e+02 2.321e+02, threshold=2.871e+02, percent-clipped=0.0 2024-09-22 22:33:08,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=117208.0, ans=0.07 2024-09-22 22:33:19,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=117254.66666666667, ans=0.0 2024-09-22 22:33:21,095 INFO [train.py:1198] (0/4) Epoch 7, batch 1750, loss[loss=0.2481, ctc_loss=0.1765, cr_loss=0.3576, over 17069.00 frames. ], tot_loss[loss=0.2775, ctc_loss=0.1981, cr_loss=0.3972, over 3340512.28 frames. ], batch size: 46, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:33:23,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=117254.66666666667, ans=0.0 2024-09-22 22:33:29,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=117254.66666666667, ans=0.125 2024-09-22 22:34:06,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=117348.0, ans=0.0 2024-09-22 22:34:13,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=117394.66666666667, ans=0.0 2024-09-22 22:34:29,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.01 vs. limit=15.0 2024-09-22 22:34:33,888 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-22 22:34:45,736 INFO [train.py:1198] (0/4) Epoch 7, batch 1800, loss[loss=0.2605, ctc_loss=0.1841, cr_loss=0.3816, over 17094.00 frames. ], tot_loss[loss=0.2783, ctc_loss=0.1987, cr_loss=0.3984, over 3339986.96 frames. 
], batch size: 43, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:34:52,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=117488.0, ans=0.0 2024-09-22 22:34:58,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=117488.0, ans=0.125 2024-09-22 22:35:17,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=117581.33333333333, ans=0.0 2024-09-22 22:35:25,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=117581.33333333333, ans=0.125 2024-09-22 22:35:27,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=117581.33333333333, ans=0.0 2024-09-22 22:35:30,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=117581.33333333333, ans=0.125 2024-09-22 22:35:50,924 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.391e+02 1.562e+02 1.960e+02 3.440e+02, threshold=3.125e+02, percent-clipped=2.0 2024-09-22 22:36:05,279 INFO [train.py:1198] (0/4) Epoch 7, batch 1850, loss[loss=0.2496, ctc_loss=0.1776, cr_loss=0.3602, over 17227.00 frames. ], tot_loss[loss=0.277, ctc_loss=0.1976, cr_loss=0.3971, over 3352642.85 frames. ], batch size: 55, lr: 1.71e-02, grad_scale: 32.0 2024-09-22 22:36:07,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117721.33333333333, ans=0.1 2024-09-22 22:36:10,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117721.33333333333, ans=0.1 2024-09-22 22:36:19,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2024-09-22 22:37:25,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=117908.0, ans=0.0 2024-09-22 22:37:25,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=117908.0, ans=0.025 2024-09-22 22:37:29,663 INFO [train.py:1198] (0/4) Epoch 7, batch 1900, loss[loss=0.287, ctc_loss=0.2043, cr_loss=0.4136, over 17234.00 frames. ], tot_loss[loss=0.2771, ctc_loss=0.1976, cr_loss=0.3973, over 3357437.26 frames. ], batch size: 55, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:37:41,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-09-22 22:37:48,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=118001.33333333333, ans=0.2 2024-09-22 22:38:01,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=118048.0, ans=0.2 2024-09-22 22:38:30,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. 
limit=6.0 2024-09-22 22:38:37,125 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.224e+02 1.462e+02 1.627e+02 1.880e+02 2.641e+02, threshold=3.255e+02, percent-clipped=0.0 2024-09-22 22:38:47,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2024-09-22 22:38:51,413 INFO [train.py:1198] (0/4) Epoch 7, batch 1950, loss[loss=0.2608, ctc_loss=0.1838, cr_loss=0.3849, over 17105.00 frames. ], tot_loss[loss=0.278, ctc_loss=0.1983, cr_loss=0.3983, over 3358273.99 frames. ], batch size: 49, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:39:20,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=118234.66666666667, ans=0.125 2024-09-22 22:39:28,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=118281.33333333333, ans=0.125 2024-09-22 22:39:34,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=118281.33333333333, ans=0.2 2024-09-22 22:39:42,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=118328.0, ans=0.125 2024-09-22 22:39:44,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.75 vs. limit=22.5 2024-09-22 22:39:56,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=118374.66666666667, ans=0.0 2024-09-22 22:40:01,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=118374.66666666667, ans=0.125 2024-09-22 22:40:13,182 INFO [train.py:1198] (0/4) Epoch 7, batch 2000, loss[loss=0.3545, ctc_loss=0.269, cr_loss=0.4274, over 12375.00 frames. ], tot_loss[loss=0.2797, ctc_loss=0.1997, cr_loss=0.4, over 3346863.94 frames. ], batch size: 125, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:40:18,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2024-09-22 22:40:19,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=118421.33333333333, ans=0.125 2024-09-22 22:40:42,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=118468.0, ans=0.125 2024-09-22 22:40:58,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.57 vs. 
limit=12.0 2024-09-22 22:41:07,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=118561.33333333333, ans=0.125 2024-09-22 22:41:08,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=118561.33333333333, ans=0.0 2024-09-22 22:41:09,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=118561.33333333333, ans=0.125 2024-09-22 22:41:15,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=118561.33333333333, ans=0.0 2024-09-22 22:41:21,330 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.217e+02 1.357e+02 1.496e+02 1.687e+02 2.654e+02, threshold=2.993e+02, percent-clipped=0.0 2024-09-22 22:41:35,547 INFO [train.py:1198] (0/4) Epoch 7, batch 2050, loss[loss=0.2658, ctc_loss=0.1843, cr_loss=0.4075, over 16957.00 frames. ], tot_loss[loss=0.2785, ctc_loss=0.1987, cr_loss=0.3989, over 3353112.49 frames. ], batch size: 42, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:41:44,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=12.0 2024-09-22 22:41:52,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118701.33333333333, ans=0.1 2024-09-22 22:41:56,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=118701.33333333333, ans=0.125 2024-09-22 22:42:27,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2024-09-22 22:42:36,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2024-09-22 22:42:37,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=118794.66666666667, ans=0.0 2024-09-22 22:42:45,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=118841.33333333333, ans=0.125 2024-09-22 22:42:50,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=118841.33333333333, ans=0.0 2024-09-22 22:42:51,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=118841.33333333333, ans=0.0 2024-09-22 22:42:58,046 INFO [train.py:1198] (0/4) Epoch 7, batch 2100, loss[loss=0.2992, ctc_loss=0.2208, cr_loss=0.3922, over 16747.00 frames. ], tot_loss[loss=0.2774, ctc_loss=0.1978, cr_loss=0.3981, over 3357969.45 frames. 
], batch size: 61, lr: 1.70e-02, grad_scale: 32.0 2024-09-22 22:43:23,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=118934.66666666667, ans=0.1 2024-09-22 22:43:23,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=118934.66666666667, ans=0.1 2024-09-22 22:43:40,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=118981.33333333333, ans=0.125 2024-09-22 22:43:48,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=119028.0, ans=0.0 2024-09-22 22:44:06,954 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.431e+02 1.548e+02 1.785e+02 2.946e+02, threshold=3.097e+02, percent-clipped=0.0 2024-09-22 22:44:19,897 INFO [train.py:1198] (0/4) Epoch 7, batch 2150, loss[loss=0.3233, ctc_loss=0.235, cr_loss=0.4414, over 14837.00 frames. ], tot_loss[loss=0.2768, ctc_loss=0.1974, cr_loss=0.3972, over 3353723.40 frames. ], batch size: 89, lr: 1.70e-02, grad_scale: 16.0 2024-09-22 22:44:20,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=119121.33333333333, ans=0.2 2024-09-22 22:44:30,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=119121.33333333333, ans=0.125 2024-09-22 22:44:41,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=119168.0, ans=0.0 2024-09-22 22:44:42,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-22 22:44:44,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=119168.0, ans=0.1 2024-09-22 22:45:07,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2024-09-22 22:45:27,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=119308.0, ans=0.125 2024-09-22 22:45:28,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119308.0, ans=0.1 2024-09-22 22:45:33,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=119308.0, ans=0.125 2024-09-22 22:45:33,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119308.0, ans=0.1 2024-09-22 22:45:41,317 INFO [train.py:1198] (0/4) Epoch 7, batch 2200, loss[loss=0.28, ctc_loss=0.1979, cr_loss=0.4102, over 17310.00 frames. ], tot_loss[loss=0.277, ctc_loss=0.1974, cr_loss=0.3979, over 3359693.86 frames. 
], batch size: 51, lr: 1.69e-02, grad_scale: 16.0 2024-09-22 22:46:13,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119401.33333333333, ans=0.125 2024-09-22 22:46:22,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=119448.0, ans=0.0 2024-09-22 22:46:30,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=119494.66666666667, ans=0.125 2024-09-22 22:46:53,571 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.357e+02 1.455e+02 1.682e+02 2.486e+02, threshold=2.909e+02, percent-clipped=0.0 2024-09-22 22:47:06,285 INFO [train.py:1198] (0/4) Epoch 7, batch 2250, loss[loss=0.2388, ctc_loss=0.17, cr_loss=0.344, over 17269.00 frames. ], tot_loss[loss=0.277, ctc_loss=0.1973, cr_loss=0.3984, over 3359579.11 frames. ], batch size: 42, lr: 1.69e-02, grad_scale: 16.0 2024-09-22 22:47:24,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-09-22 22:47:35,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=119634.66666666667, ans=0.125 2024-09-22 22:47:41,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=119681.33333333333, ans=0.125 2024-09-22 22:48:07,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=22.5 2024-09-22 22:48:13,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=12.0 2024-09-22 22:48:21,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=119774.66666666667, ans=0.0 2024-09-22 22:48:25,672 INFO [train.py:1198] (0/4) Epoch 7, batch 2300, loss[loss=0.2918, ctc_loss=0.2043, cr_loss=0.4377, over 16911.00 frames. ], tot_loss[loss=0.2766, ctc_loss=0.197, cr_loss=0.3981, over 3359143.74 frames. ], batch size: 58, lr: 1.69e-02, grad_scale: 16.0 2024-09-22 22:48:41,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=119821.33333333333, ans=0.125 2024-09-22 22:48:54,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=119868.0, ans=22.5 2024-09-22 22:48:58,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=119914.66666666667, ans=0.125 2024-09-22 22:49:23,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. 
limit=15.0 2024-09-22 22:49:31,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=119961.33333333333, ans=0.035 2024-09-22 22:49:37,566 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.210e+02 1.381e+02 1.499e+02 1.768e+02 3.628e+02, threshold=2.997e+02, percent-clipped=2.0 2024-09-22 22:49:41,857 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-09-22 22:49:47,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=120008.0, ans=0.2 2024-09-22 22:49:50,251 INFO [train.py:1198] (0/4) Epoch 7, batch 2350, loss[loss=0.2959, ctc_loss=0.21, cr_loss=0.4294, over 16760.00 frames. ], tot_loss[loss=0.2767, ctc_loss=0.1971, cr_loss=0.398, over 3359840.66 frames. ], batch size: 61, lr: 1.69e-02, grad_scale: 16.0 2024-09-22 22:50:15,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=120101.33333333333, ans=0.125 2024-09-22 22:50:19,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=120101.33333333333, ans=0.2 2024-09-22 22:50:19,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120101.33333333333, ans=0.1 2024-09-22 22:50:41,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. limit=10.0 2024-09-22 22:50:44,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=120194.66666666667, ans=0.0 2024-09-22 22:51:03,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=120241.33333333333, ans=0.125 2024-09-22 22:51:12,230 INFO [train.py:1198] (0/4) Epoch 7, batch 2400, loss[loss=0.2897, ctc_loss=0.2062, cr_loss=0.4176, over 17301.00 frames. ], tot_loss[loss=0.2763, ctc_loss=0.1968, cr_loss=0.3974, over 3360879.04 frames. ], batch size: 51, lr: 1.69e-02, grad_scale: 32.0 2024-09-22 22:51:56,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=120381.33333333333, ans=0.025 2024-09-22 22:52:21,490 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.376e+02 1.485e+02 1.691e+02 2.822e+02, threshold=2.971e+02, percent-clipped=0.0 2024-09-22 22:52:34,409 INFO [train.py:1198] (0/4) Epoch 7, batch 2450, loss[loss=0.2819, ctc_loss=0.2048, cr_loss=0.3857, over 17001.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1955, cr_loss=0.3953, over 3365848.44 frames. ], batch size: 53, lr: 1.69e-02, grad_scale: 32.0 2024-09-22 22:52:44,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=120521.33333333333, ans=0.125 2024-09-22 22:52:58,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. 
limit=12.0 2024-09-22 22:53:12,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=120614.66666666667, ans=0.125 2024-09-22 22:53:20,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120661.33333333333, ans=0.1 2024-09-22 22:53:33,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2024-09-22 22:53:36,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=120661.33333333333, ans=0.125 2024-09-22 22:53:40,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=120708.0, ans=0.125 2024-09-22 22:53:56,719 INFO [train.py:1198] (0/4) Epoch 7, batch 2500, loss[loss=0.2654, ctc_loss=0.1867, cr_loss=0.3933, over 17222.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1953, cr_loss=0.396, over 3371299.45 frames. ], batch size: 47, lr: 1.69e-02, grad_scale: 32.0 2024-09-22 22:54:05,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=120754.66666666667, ans=0.0 2024-09-22 22:54:08,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=120754.66666666667, ans=0.125 2024-09-22 22:54:13,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=120801.33333333333, ans=0.0 2024-09-22 22:54:30,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=120848.0, ans=0.0 2024-09-22 22:54:39,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=120848.0, ans=0.125 2024-09-22 22:55:05,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2024-09-22 22:55:06,269 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.238e+02 1.457e+02 1.729e+02 2.020e+02 3.233e+02, threshold=3.458e+02, percent-clipped=3.0 2024-09-22 22:55:18,957 INFO [train.py:1198] (0/4) Epoch 7, batch 2550, loss[loss=0.3078, ctc_loss=0.223, cr_loss=0.4243, over 15200.00 frames. ], tot_loss[loss=0.275, ctc_loss=0.1957, cr_loss=0.3961, over 3362760.63 frames. ], batch size: 90, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 22:55:29,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2024-09-22 22:55:46,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=121034.66666666667, ans=0.125 2024-09-22 22:55:55,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=15.0 2024-09-22 22:56:04,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. 
limit=15.0 2024-09-22 22:56:08,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-09-22 22:56:40,346 INFO [train.py:1198] (0/4) Epoch 7, batch 2600, loss[loss=0.2392, ctc_loss=0.166, cr_loss=0.366, over 17020.00 frames. ], tot_loss[loss=0.2759, ctc_loss=0.1966, cr_loss=0.3964, over 3350338.30 frames. ], batch size: 39, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 22:57:00,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=121268.0, ans=0.125 2024-09-22 22:57:05,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2024-09-22 22:57:32,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=121361.33333333333, ans=0.0 2024-09-22 22:57:35,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=121361.33333333333, ans=0.0 2024-09-22 22:57:47,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=22.5 2024-09-22 22:57:49,631 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.441e+02 1.588e+02 1.825e+02 2.905e+02, threshold=3.176e+02, percent-clipped=0.0 2024-09-22 22:57:59,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=121408.0, ans=0.2 2024-09-22 22:58:02,510 INFO [train.py:1198] (0/4) Epoch 7, batch 2650, loss[loss=0.2713, ctc_loss=0.1925, cr_loss=0.3939, over 16796.00 frames. ], tot_loss[loss=0.2748, ctc_loss=0.1958, cr_loss=0.3949, over 3358370.81 frames. ], batch size: 61, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 22:58:09,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=121454.66666666667, ans=0.0 2024-09-22 22:58:17,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=121501.33333333333, ans=0.09899494936611666 2024-09-22 22:58:19,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=22.5 2024-09-22 22:58:30,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=121501.33333333333, ans=0.125 2024-09-22 22:58:44,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=121548.0, ans=0.0 2024-09-22 22:58:52,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=121594.66666666667, ans=0.0 2024-09-22 22:59:00,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=121594.66666666667, ans=0.0 2024-09-22 22:59:21,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=121641.33333333333, ans=0.125 2024-09-22 22:59:27,168 INFO [train.py:1198] (0/4) Epoch 7, batch 2700, loss[loss=0.2274, ctc_loss=0.1568, cr_loss=0.3532, over 17035.00 frames. 
], tot_loss[loss=0.2741, ctc_loss=0.1952, cr_loss=0.3944, over 3361077.10 frames. ], batch size: 39, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 22:59:28,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=15.0 2024-09-22 22:59:45,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=121734.66666666667, ans=0.0 2024-09-22 23:00:05,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=121781.33333333333, ans=0.125 2024-09-22 23:00:27,519 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-22 23:00:33,554 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.410e+02 1.530e+02 1.699e+02 3.124e+02, threshold=3.060e+02, percent-clipped=0.0 2024-09-22 23:00:48,619 INFO [train.py:1198] (0/4) Epoch 7, batch 2750, loss[loss=0.2369, ctc_loss=0.1654, cr_loss=0.3576, over 17202.00 frames. ], tot_loss[loss=0.2736, ctc_loss=0.1947, cr_loss=0.3943, over 3362057.14 frames. ], batch size: 41, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 23:00:58,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.53 vs. limit=15.0 2024-09-22 23:01:09,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=121968.0, ans=0.0 2024-09-22 23:01:12,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=121968.0, ans=0.0 2024-09-22 23:01:12,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=121968.0, ans=0.0 2024-09-22 23:01:50,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=122061.33333333333, ans=0.125 2024-09-22 23:02:01,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=122108.0, ans=0.04949747468305833 2024-09-22 23:02:10,732 INFO [train.py:1198] (0/4) Epoch 7, batch 2800, loss[loss=0.2797, ctc_loss=0.1976, cr_loss=0.4106, over 17028.00 frames. ], tot_loss[loss=0.2738, ctc_loss=0.1948, cr_loss=0.3951, over 3368419.85 frames. ], batch size: 51, lr: 1.68e-02, grad_scale: 32.0 2024-09-22 23:02:35,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122201.33333333333, ans=0.1 2024-09-22 23:03:17,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-22 23:03:18,222 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.349e+02 1.457e+02 1.581e+02 2.828e+02, threshold=2.913e+02, percent-clipped=0.0 2024-09-22 23:03:18,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=122341.33333333333, ans=0.125 2024-09-22 23:03:33,472 INFO [train.py:1198] (0/4) Epoch 7, batch 2850, loss[loss=0.2372, ctc_loss=0.1643, cr_loss=0.3647, over 17095.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1941, cr_loss=0.3941, over 3369562.54 frames. 
2024-09-22 23:03:40,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=122388.0, ans=0.0
2024-09-22 23:04:23,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0
2024-09-22 23:04:34,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=122528.0, ans=0.0
2024-09-22 23:04:42,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=122574.66666666667, ans=0.2
2024-09-22 23:04:43,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0
2024-09-22 23:04:44,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=122574.66666666667, ans=0.0
2024-09-22 23:04:55,303 INFO [train.py:1198] (0/4) Epoch 7, batch 2900, loss[loss=0.3148, ctc_loss=0.2243, cr_loss=0.4526, over 17353.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1946, cr_loss=0.3955, over 3370170.41 frames. ], batch size: 48, lr: 1.67e-02, grad_scale: 16.0
2024-09-22 23:05:43,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0
2024-09-22 23:05:55,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=22.5
2024-09-22 23:06:05,785 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.190e+02 1.397e+02 1.534e+02 1.877e+02 3.701e+02, threshold=3.067e+02, percent-clipped=2.0
2024-09-22 23:06:06,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=122808.0, ans=0.1
2024-09-22 23:06:12,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=122808.0, ans=0.2
2024-09-22 23:06:16,925 INFO [train.py:1198] (0/4) Epoch 7, batch 2950, loss[loss=0.2635, ctc_loss=0.1868, cr_loss=0.3837, over 17295.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1951, cr_loss=0.3965, over 3374158.49 frames. ], batch size: 49, lr: 1.67e-02, grad_scale: 16.0
2024-09-22 23:06:43,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=122901.33333333333, ans=0.025
2024-09-22 23:06:48,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=122901.33333333333, ans=0.025
2024-09-22 23:07:10,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=122994.66666666667, ans=0.125
2024-09-22 23:07:24,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=123041.33333333333, ans=0.125
2024-09-22 23:07:38,711 INFO [train.py:1198] (0/4) Epoch 7, batch 3000, loss[loss=0.2282, ctc_loss=0.1581, cr_loss=0.3509, over 16683.00 frames. ], tot_loss[loss=0.2735, ctc_loss=0.1945, cr_loss=0.395, over 3367514.10 frames. ], batch size: 37, lr: 1.67e-02, grad_scale: 16.0
2024-09-22 23:07:38,712 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-22 23:07:54,142 INFO [train.py:1230] (0/4) Epoch 7, validation: loss=0.05688, ctc_loss=0.05688, cr_loss=7.669e-15, over 944034.00 frames.
2024-09-22 23:07:54,143 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-22 23:08:02,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0
2024-09-22 23:08:40,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=123228.0, ans=0.0
2024-09-22 23:08:48,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=123228.0, ans=0.2
2024-09-22 23:08:51,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=123228.0, ans=0.0
2024-09-22 23:08:54,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=123228.0, ans=0.0
2024-09-22 23:08:59,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=123274.66666666667, ans=0.1
2024-09-22 23:09:01,891 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.381e+02 1.504e+02 1.827e+02 5.428e+02, threshold=3.008e+02, percent-clipped=9.0
2024-09-22 23:09:12,842 INFO [train.py:1198] (0/4) Epoch 7, batch 3050, loss[loss=0.2962, ctc_loss=0.2139, cr_loss=0.4113, over 16929.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1937, cr_loss=0.3946, over 3373384.08 frames. ], batch size: 58, lr: 1.67e-02, grad_scale: 16.0
2024-09-22 23:09:35,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=123368.0, ans=0.0
2024-09-22 23:10:16,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=123508.0, ans=0.0
2024-09-22 23:10:33,468 INFO [train.py:1198] (0/4) Epoch 7, batch 3100, loss[loss=0.2339, ctc_loss=0.1619, cr_loss=0.3601, over 17102.00 frames. ], tot_loss[loss=0.2729, ctc_loss=0.1938, cr_loss=0.3955, over 3376742.41 frames. ], batch size: 40, lr: 1.67e-02, grad_scale: 16.0
2024-09-22 23:10:58,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=123601.33333333333, ans=0.125
2024-09-22 23:11:42,861 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.455e+02 1.593e+02 1.857e+02 3.038e+02, threshold=3.186e+02, percent-clipped=1.0
2024-09-22 23:11:53,644 INFO [train.py:1198] (0/4) Epoch 7, batch 3150, loss[loss=0.2435, ctc_loss=0.1692, cr_loss=0.3716, over 17032.00 frames. ], tot_loss[loss=0.2723, ctc_loss=0.1934, cr_loss=0.3947, over 3370049.36 frames. ], batch size: 39, lr: 1.67e-02, grad_scale: 16.0
2024-09-22 23:11:54,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=22.5
2024-09-22 23:12:18,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0
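On the validation passes logged by [train.py:1221] and [train.py:1230]: the cr_loss values there are on the order of 1e-15, which is consistent with the consistency-regularization term comparing two augmented views of the input that coincide when augmentation is disabled at validation time, leaving only the CTC term; the printed loss is a frame-weighted average over the 944034-frame dev set. A schematic of such a pass, with hypothetical helper names rather than the real loop in train.py:

import torch

def compute_validation_loss(model, dev_loader, device):
    # Schematic only; hypothetical helper, not the code behind [train.py:1221].
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            feats = batch["inputs"].to(device)       # (N, T, 80) fbank features
            supervisions = batch["supervisions"]
            # assumed interface: returns summed CTC loss and the frame count
            loss, num_frames = model(feats, supervisions)
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    # frame-weighted average, e.g. "validation: loss=0.05688 ... over 944034.00 frames"
    return tot_loss / tot_frames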
2024-09-22 23:12:22,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5
2024-09-22 23:12:28,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=123881.33333333333, ans=15.0
2024-09-22 23:13:06,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=12.0
2024-09-22 23:13:12,038 INFO [train.py:1198] (0/4) Epoch 7, batch 3200, loss[loss=0.2651, ctc_loss=0.189, cr_loss=0.3806, over 17229.00 frames. ], tot_loss[loss=0.2739, ctc_loss=0.1947, cr_loss=0.3961, over 3365485.75 frames. ], batch size: 47, lr: 1.66e-02, grad_scale: 32.0
2024-09-22 23:14:19,024 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.425e+02 1.561e+02 1.765e+02 3.696e+02, threshold=3.122e+02, percent-clipped=1.0
2024-09-22 23:14:22,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=124208.0, ans=0.125
2024-09-22 23:14:29,940 INFO [train.py:1198] (0/4) Epoch 7, batch 3250, loss[loss=0.2751, ctc_loss=0.1948, cr_loss=0.4014, over 17304.00 frames. ], tot_loss[loss=0.2747, ctc_loss=0.1953, cr_loss=0.3968, over 3358985.67 frames. ], batch size: 51, lr: 1.66e-02, grad_scale: 32.0
2024-09-22 23:14:43,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=124254.66666666667, ans=0.125
2024-09-22 23:14:48,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.29 vs. limit=15.0
2024-09-22 23:15:23,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=124394.66666666667, ans=0.025
2024-09-22 23:15:34,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=124441.33333333333, ans=0.025
2024-09-22 23:15:49,579 INFO [train.py:1198] (0/4) Epoch 7, batch 3300, loss[loss=0.3006, ctc_loss=0.218, cr_loss=0.4127, over 16938.00 frames. ], tot_loss[loss=0.2751, ctc_loss=0.1956, cr_loss=0.3977, over 3370114.63 frames. ], batch size: 58, lr: 1.66e-02, grad_scale: 32.0
2024-09-22 23:15:51,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=124488.0, ans=0.2
2024-09-22 23:15:52,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=124488.0, ans=0.0
2024-09-22 23:16:18,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=124534.66666666667, ans=0.1
2024-09-22 23:16:22,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=124581.33333333333, ans=0.0
2024-09-22 23:16:42,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=124628.0, ans=0.125
2024-09-22 23:16:44,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=124628.0, ans=0.2
2024-09-22 23:16:50,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=124628.0, ans=0.025
2024-09-22 23:16:55,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=124674.66666666667, ans=0.1
2024-09-22 23:16:58,144 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.199e+02 1.418e+02 1.603e+02 1.871e+02 3.678e+02, threshold=3.206e+02, percent-clipped=3.0
2024-09-22 23:17:04,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=124674.66666666667, ans=0.025
2024-09-22 23:17:07,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=124721.33333333333, ans=0.0
2024-09-22 23:17:09,116 INFO [train.py:1198] (0/4) Epoch 7, batch 3350, loss[loss=0.2739, ctc_loss=0.1994, cr_loss=0.3728, over 16555.00 frames. ], tot_loss[loss=0.2745, ctc_loss=0.1952, cr_loss=0.3961, over 3362054.25 frames. ], batch size: 66, lr: 1.66e-02, grad_scale: 32.0
2024-09-22 23:17:28,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124768.0, ans=0.1
2024-09-22 23:17:38,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=124814.66666666667, ans=0.0
2024-09-22 23:18:05,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=124861.33333333333, ans=0.025
2024-09-22 23:18:21,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0
2024-09-22 23:18:24,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=124908.0, ans=0.0
2024-09-22 23:18:26,925 INFO [train.py:1198] (0/4) Epoch 7, batch 3400, loss[loss=0.2419, ctc_loss=0.1711, cr_loss=0.3538, over 17256.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1952, cr_loss=0.3961, over 3363178.09 frames. ], batch size: 42, lr: 1.66e-02, grad_scale: 32.0
2024-09-22 23:18:27,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0
2024-09-22 23:18:43,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=125001.33333333333, ans=0.125
2024-09-22 23:18:59,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0
2024-09-22 23:19:02,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=125048.0, ans=0.1
2024-09-22 23:19:11,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=125048.0, ans=0.025
2024-09-22 23:19:17,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0
2024-09-22 23:19:19,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=125094.66666666667, ans=0.125
2024-09-22 23:19:34,690 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.374e+02 1.483e+02 1.750e+02 2.564e+02, threshold=2.966e+02, percent-clipped=0.0
2024-09-22 23:19:45,724 INFO [train.py:1198] (0/4) Epoch 7, batch 3450, loss[loss=0.2731, ctc_loss=0.1928, cr_loss=0.4015, over 17292.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1953, cr_loss=0.3955, over 3352137.30 frames. ], batch size: 49, lr: 1.66e-02, grad_scale: 32.0
2024-09-22 23:20:13,171 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-22 23:20:22,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=125281.33333333333, ans=0.0
2024-09-22 23:20:23,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125281.33333333333, ans=0.1
2024-09-22 23:20:33,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=125328.0, ans=0.2
2024-09-22 23:20:56,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=125374.66666666667, ans=0.1
2024-09-22 23:20:57,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=22.5
2024-09-22 23:21:05,781 INFO [train.py:1198] (0/4) Epoch 7, batch 3500, loss[loss=0.2763, ctc_loss=0.1966, cr_loss=0.3986, over 17095.00 frames. ], tot_loss[loss=0.2744, ctc_loss=0.1953, cr_loss=0.3954, over 3347176.93 frames. ], batch size: 43, lr: 1.66e-02, grad_scale: 32.0
2024-09-22 23:21:13,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=125421.33333333333, ans=0.0
2024-09-22 23:21:14,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.56 vs. limit=10.0
2024-09-22 23:22:14,393 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.203e+02 1.377e+02 1.536e+02 1.902e+02 3.624e+02, threshold=3.071e+02, percent-clipped=2.0
2024-09-22 23:22:25,094 INFO [train.py:1198] (0/4) Epoch 7, batch 3550, loss[loss=0.2694, ctc_loss=0.189, cr_loss=0.4019, over 17218.00 frames. ], tot_loss[loss=0.2738, ctc_loss=0.1948, cr_loss=0.395, over 3349584.43 frames. ], batch size: 50, lr: 1.65e-02, grad_scale: 32.0
2024-09-22 23:22:39,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=125701.33333333333, ans=0.2
2024-09-22 23:22:42,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=125701.33333333333, ans=0.025
2024-09-22 23:22:48,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=125701.33333333333, ans=0.125
2024-09-22 23:22:59,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=125748.0, ans=0.125
2024-09-22 23:22:59,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=125748.0, ans=0.125
2024-09-22 23:23:05,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=125748.0, ans=0.125
2024-09-22 23:23:29,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0
2024-09-22 23:23:39,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=125841.33333333333, ans=0.09899494936611666
2024-09-22 23:23:41,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=125888.0, ans=0.0
2024-09-22 23:23:42,344 INFO [train.py:1198] (0/4) Epoch 7, batch 3600, loss[loss=0.2965, ctc_loss=0.2118, cr_loss=0.4236, over 17310.00 frames. ], tot_loss[loss=0.2727, ctc_loss=0.1938, cr_loss=0.3945, over 3358985.91 frames. ], batch size: 49, lr: 1.65e-02, grad_scale: 32.0
2024-09-22 23:23:48,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=125888.0, ans=0.125
2024-09-22 23:23:57,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=125888.0, ans=0.2
2024-09-22 23:24:14,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125981.33333333333, ans=0.1
2024-09-22 23:24:24,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=125981.33333333333, ans=0.0
2024-09-22 23:24:34,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=126028.0, ans=0.125
2024-09-22 23:24:34,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.81 vs. limit=10.0
2024-09-22 23:24:37,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126028.0, ans=0.1
2024-09-22 23:24:46,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=126074.66666666667, ans=0.125
2024-09-22 23:24:49,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=126074.66666666667, ans=0.125
2024-09-22 23:24:50,666 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.222e+02 1.445e+02 1.618e+02 1.998e+02 3.129e+02, threshold=3.236e+02, percent-clipped=1.0
2024-09-22 23:24:51,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0
2024-09-22 23:24:58,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=126074.66666666667, ans=0.2
2024-09-22 23:25:01,500 INFO [train.py:1198] (0/4) Epoch 7, batch 3650, loss[loss=0.2986, ctc_loss=0.2149, cr_loss=0.4182, over 15884.00 frames. ], tot_loss[loss=0.2728, ctc_loss=0.1939, cr_loss=0.3944, over 3359166.47 frames. ], batch size: 74, lr: 1.65e-02, grad_scale: 32.0
2024-09-22 23:25:08,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0
2024-09-22 23:25:27,485 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-22 23:25:27,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0
2024-09-22 23:25:30,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=22.5
2024-09-22 23:25:40,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0
2024-09-22 23:25:40,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0
2024-09-22 23:25:50,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=126261.33333333333, ans=0.0
2024-09-22 23:26:13,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=126308.0, ans=0.125
2024-09-22 23:26:22,288 INFO [train.py:1198] (0/4) Epoch 7, batch 3700, loss[loss=0.2388, ctc_loss=0.1623, cr_loss=0.3826, over 16662.00 frames. ], tot_loss[loss=0.272, ctc_loss=0.1933, cr_loss=0.3935, over 3354269.03 frames. ], batch size: 37, lr: 1.65e-02, grad_scale: 32.0
2024-09-22 23:26:25,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=126354.66666666667, ans=0.125
2024-09-22 23:26:31,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0
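On the [scaling.py:214] ScheduledFloat lines: they record module hyperparameters (dropout, skip and balancer rates) whose current value, "ans", is a deterministic function of batch_count. A sketch of the idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real class in icefall's scaling.py has more machinery:

class ScheduledFloat:
    # Illustrative only: piecewise-linear in batch_count, like the values
    # logged as "ScheduledFloat: name=..., batch_count=..., ans=..."
    def __init__(self, *points):
        self.points = sorted(points)        # (batch_count, value) breakpoints

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)   # linear interpolation
        return pts[-1][1]

# e.g. a skip rate annealed from 0.1 to 0.0 over the first 20000 batches
# (hypothetical breakpoints, not taken from the recipe):
attention_skip_rate = ScheduledFloat((0.0, 0.1), (20000.0, 0.0))
print(attention_skip_rate(121734.67))       # -> 0.0, as in the ans=0.0 lines above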
2024-09-22 23:27:05,248 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-22 23:27:09,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=126494.66666666667, ans=0.2
2024-09-22 23:27:12,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0
2024-09-22 23:27:13,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=126494.66666666667, ans=0.125
2024-09-22 23:27:21,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=126494.66666666667, ans=0.125
2024-09-22 23:27:30,063 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.388e+02 1.512e+02 1.895e+02 2.751e+02, threshold=3.024e+02, percent-clipped=0.0
2024-09-22 23:27:41,023 INFO [train.py:1198] (0/4) Epoch 7, batch 3750, loss[loss=0.2542, ctc_loss=0.1818, cr_loss=0.3621, over 17047.00 frames. ], tot_loss[loss=0.2716, ctc_loss=0.193, cr_loss=0.3932, over 3358133.00 frames. ], batch size: 39, lr: 1.65e-02, grad_scale: 32.0
2024-09-22 23:27:54,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126634.66666666667, ans=0.1
2024-09-22 23:28:01,374 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 23:28:13,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=126681.33333333333, ans=0.125
2024-09-22 23:28:18,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=126681.33333333333, ans=0.125
2024-09-22 23:28:24,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126681.33333333333, ans=0.1
2024-09-22 23:28:40,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=126728.0, ans=0.0
2024-09-22 23:28:45,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=126774.66666666667, ans=0.125
2024-09-22 23:28:59,019 INFO [train.py:1198] (0/4) Epoch 7, batch 3800, loss[loss=0.3473, ctc_loss=0.263, cr_loss=0.4215, over 11226.00 frames. ], tot_loss[loss=0.2726, ctc_loss=0.1939, cr_loss=0.3935, over 3341079.87 frames. ], batch size: 123, lr: 1.65e-02, grad_scale: 32.0
2024-09-22 23:28:59,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=126821.33333333333, ans=0.125
2024-09-22 23:29:17,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126868.0, ans=0.1
2024-09-22 23:29:27,150 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 23:30:02,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=127008.0, ans=0.0
2024-09-22 23:30:07,021 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.225e+02 1.460e+02 1.621e+02 1.904e+02 2.959e+02, threshold=3.241e+02, percent-clipped=0.0
2024-09-22 23:30:17,788 INFO [train.py:1198] (0/4) Epoch 7, batch 3850, loss[loss=0.3216, ctc_loss=0.242, cr_loss=0.3982, over 12097.00 frames. ], tot_loss[loss=0.2762, ctc_loss=0.1972, cr_loss=0.3952, over 3295011.97 frames. ], batch size: 124, lr: 1.65e-02, grad_scale: 32.0
2024-09-22 23:30:22,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=127054.66666666667, ans=0.0
2024-09-22 23:30:25,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=127054.66666666667, ans=0.125
2024-09-22 23:30:40,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=127101.33333333333, ans=0.125
2024-09-22 23:30:45,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.40 vs. limit=10.0
2024-09-22 23:30:48,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=127148.0, ans=0.125
2024-09-22 23:31:24,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=127241.33333333333, ans=0.2
2024-09-22 23:31:28,153 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-7.pt
2024-09-22 23:32:20,223 INFO [train.py:1198] (0/4) Epoch 8, batch 0, loss[loss=0.3017, ctc_loss=0.2214, cr_loss=0.4011, over 17013.00 frames. ], tot_loss[loss=0.3017, ctc_loss=0.2214, cr_loss=0.4011, over 17013.00 frames. ], batch size: 51, lr: 1.55e-02, grad_scale: 32.0
2024-09-22 23:32:20,224 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-22 23:32:27,253 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.8067, 3.3339, 3.2045, 3.8305, 3.0119, 3.2048, 3.7269, 3.9222], device='cuda:0')
2024-09-22 23:32:35,554 INFO [train.py:1230] (0/4) Epoch 8, validation: loss=0.05692, ctc_loss=0.05692, cr_loss=7.316e-15, over 944034.00 frames.
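On the tot_loss[...] figures: at batch 0 of epoch 8 the running tot_loss equals that batch's loss, and mid-epoch it is reported "over" roughly 3.3M frames, consistent with a frame-weighted running average whose statistics decay each step (a decay of 1 - 1/200 with ~17k-frame batches gives an effective window of about 3.4M frames, close to what is logged). A guess at that bookkeeping, not taken from train.py:

class RunningLoss:
    # A guess at the "tot_loss[... over N frames]" bookkeeping: statistics start
    # empty each epoch (so tot_loss == batch loss at batch 0) and decay each step.
    def __init__(self, decay=1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames   # reported together with self.frames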
2024-09-22 23:32:35,555 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-22 23:33:28,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=127409.33333333333, ans=0.125
2024-09-22 23:33:31,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=127409.33333333333, ans=0.125
2024-09-22 23:33:52,400 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.394e+02 1.752e+02 2.098e+02 6.301e+02, threshold=3.504e+02, percent-clipped=3.0
2024-09-22 23:33:55,647 INFO [train.py:1198] (0/4) Epoch 8, batch 50, loss[loss=0.2746, ctc_loss=0.1868, cr_loss=0.4389, over 17020.00 frames. ], tot_loss[loss=0.2759, ctc_loss=0.1959, cr_loss=0.4001, over 752600.78 frames. ], batch size: 52, lr: 1.55e-02, grad_scale: 32.0
2024-09-22 23:34:03,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.31 vs. limit=15.0
2024-09-22 23:34:20,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0
2024-09-22 23:34:51,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=127642.66666666667, ans=0.1
2024-09-22 23:34:52,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5
2024-09-22 23:35:19,392 INFO [train.py:1198] (0/4) Epoch 8, batch 100, loss[loss=0.3743, ctc_loss=0.2835, cr_loss=0.4541, over 11585.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1946, cr_loss=0.3981, over 1322955.82 frames. ], batch size: 123, lr: 1.55e-02, grad_scale: 32.0
2024-09-22 23:35:48,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=127782.66666666667, ans=0.125
2024-09-22 23:35:55,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=127829.33333333333, ans=0.125
2024-09-22 23:35:57,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=127829.33333333333, ans=0.125
2024-09-22 23:36:11,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.94 vs. limit=15.0
2024-09-22 23:36:37,480 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.179e+02 1.356e+02 1.486e+02 1.737e+02 3.200e+02, threshold=2.973e+02, percent-clipped=0.0
2024-09-22 23:36:40,698 INFO [train.py:1198] (0/4) Epoch 8, batch 150, loss[loss=0.2592, ctc_loss=0.181, cr_loss=0.391, over 17214.00 frames. ], tot_loss[loss=0.2737, ctc_loss=0.1942, cr_loss=0.3977, over 1776440.46 frames. ], batch size: 50, lr: 1.55e-02, grad_scale: 32.0
2024-09-22 23:36:47,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=127969.33333333333, ans=0.2
2024-09-22 23:37:04,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=128016.0, ans=0.0
2024-09-22 23:37:32,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=128109.33333333333, ans=0.07
2024-09-22 23:37:44,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=128156.0, ans=0.07
2024-09-22 23:37:54,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=128156.0, ans=0.125
2024-09-22 23:37:59,605 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-22 23:38:02,307 INFO [train.py:1198] (0/4) Epoch 8, batch 200, loss[loss=0.341, ctc_loss=0.2475, cr_loss=0.4674, over 15432.00 frames. ], tot_loss[loss=0.2742, ctc_loss=0.1947, cr_loss=0.3977, over 2110257.36 frames. ], batch size: 89, lr: 1.54e-02, grad_scale: 32.0
2024-09-22 23:38:45,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=128296.0, ans=0.1
2024-09-22 23:38:50,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0
2024-09-22 23:39:12,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=128389.33333333333, ans=0.125
2024-09-22 23:39:16,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0
2024-09-22 23:39:20,523 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 1.362e+02 1.519e+02 1.695e+02 2.654e+02, threshold=3.037e+02, percent-clipped=0.0
2024-09-22 23:39:23,710 INFO [train.py:1198] (0/4) Epoch 8, batch 250, loss[loss=0.2854, ctc_loss=0.2059, cr_loss=0.3979, over 17225.00 frames. ], tot_loss[loss=0.2733, ctc_loss=0.194, cr_loss=0.3966, over 2390234.23 frames. ], batch size: 50, lr: 1.54e-02, grad_scale: 32.0
2024-09-22 23:39:23,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=128436.0, ans=0.125
2024-09-22 23:39:32,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=128436.0, ans=0.025
2024-09-22 23:39:52,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=128482.66666666667, ans=0.125
2024-09-22 23:39:55,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0
2024-09-22 23:40:12,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=128529.33333333333, ans=12.0
2024-09-22 23:40:23,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=12.0
2024-09-22 23:40:32,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=128622.66666666667, ans=0.125
2024-09-22 23:40:46,250 INFO [train.py:1198] (0/4) Epoch 8, batch 300, loss[loss=0.2806, ctc_loss=0.1968, cr_loss=0.419, over 17133.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.1923, cr_loss=0.3947, over 2609670.15 frames. ], batch size: 48, lr: 1.54e-02, grad_scale: 32.0
2024-09-22 23:41:08,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=128716.0, ans=0.2
2024-09-22 23:41:15,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=12.0
2024-09-22 23:41:18,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.24 vs. limit=6.0
2024-09-22 23:41:47,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=128809.33333333333, ans=0.0
2024-09-22 23:41:56,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=128856.0, ans=0.2
2024-09-22 23:42:07,483 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.437e+02 1.612e+02 1.809e+02 3.626e+02, threshold=3.224e+02, percent-clipped=3.0
2024-09-22 23:42:09,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=128902.66666666667, ans=0.125
2024-09-22 23:42:10,766 INFO [train.py:1198] (0/4) Epoch 8, batch 350, loss[loss=0.2369, ctc_loss=0.167, cr_loss=0.3494, over 16952.00 frames. ], tot_loss[loss=0.2718, ctc_loss=0.1928, cr_loss=0.3949, over 2773787.17 frames. ], batch size: 42, lr: 1.54e-02, grad_scale: 32.0
2024-09-22 23:42:17,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0
2024-09-22 23:42:22,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128902.66666666667, ans=0.1
2024-09-22 23:42:45,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.62 vs. limit=22.5
2024-09-22 23:42:46,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.26 vs. limit=12.0
2024-09-22 23:43:08,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=129042.66666666667, ans=0.125
2024-09-22 23:43:08,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5
2024-09-22 23:43:13,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=129089.33333333333, ans=0.07
2024-09-22 23:43:14,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=129089.33333333333, ans=0.0
2024-09-22 23:43:16,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=129089.33333333333, ans=0.125
2024-09-22 23:43:30,393 INFO [train.py:1198] (0/4) Epoch 8, batch 400, loss[loss=0.3217, ctc_loss=0.2314, cr_loss=0.4514, over 15069.00 frames. ], tot_loss[loss=0.2713, ctc_loss=0.1924, cr_loss=0.3943, over 2901838.61 frames. ], batch size: 90, lr: 1.54e-02, grad_scale: 32.0
2024-09-22 23:43:40,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0
2024-09-22 23:43:52,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=129182.66666666667, ans=0.0
2024-09-22 23:43:56,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=129182.66666666667, ans=0.125
2024-09-22 23:44:29,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=129276.0, ans=0.09899494936611666
2024-09-22 23:44:30,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=129276.0, ans=0.125
2024-09-22 23:44:47,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0
2024-09-22 23:44:49,603 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 1.364e+02 1.519e+02 1.603e+02 2.870e+02, threshold=3.038e+02, percent-clipped=0.0
2024-09-22 23:44:52,859 INFO [train.py:1198] (0/4) Epoch 8, batch 450, loss[loss=0.26, ctc_loss=0.1831, cr_loss=0.3848, over 17034.00 frames. ], tot_loss[loss=0.2706, ctc_loss=0.1918, cr_loss=0.394, over 2999615.71 frames. ], batch size: 51, lr: 1.54e-02, grad_scale: 32.0
2024-09-22 23:45:13,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=129416.0, ans=0.2
2024-09-22 23:45:43,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129509.33333333333, ans=0.1
2024-09-22 23:45:56,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0
2024-09-22 23:46:18,033 INFO [train.py:1198] (0/4) Epoch 8, batch 500, loss[loss=0.276, ctc_loss=0.1999, cr_loss=0.3803, over 17326.00 frames. ], tot_loss[loss=0.2717, ctc_loss=0.1926, cr_loss=0.3951, over 3076102.87 frames. ], batch size: 51, lr: 1.54e-02, grad_scale: 32.0
2024-09-22 23:47:36,030 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.377e+02 1.524e+02 1.726e+02 2.894e+02, threshold=3.047e+02, percent-clipped=0.0
2024-09-22 23:47:39,314 INFO [train.py:1198] (0/4) Epoch 8, batch 550, loss[loss=0.2836, ctc_loss=0.2024, cr_loss=0.4063, over 16988.00 frames. ], tot_loss[loss=0.27, ctc_loss=0.1913, cr_loss=0.3934, over 3135584.95 frames. ], batch size: 56, lr: 1.54e-02, grad_scale: 32.0
2024-09-22 23:47:39,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=129836.0, ans=0.0
2024-09-22 23:47:42,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=129836.0, ans=0.2
2024-09-22 23:47:55,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=129882.66666666667, ans=0.125
2024-09-22 23:48:13,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=129929.33333333333, ans=0.0
2024-09-22 23:48:21,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=129929.33333333333, ans=0.125
2024-09-22 23:48:25,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=129976.0, ans=0.07
2024-09-22 23:48:35,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0
2024-09-22 23:48:37,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0
2024-09-22 23:48:58,804 INFO [train.py:1198] (0/4) Epoch 8, batch 600, loss[loss=0.2618, ctc_loss=0.183, cr_loss=0.3937, over 17168.00 frames. ], tot_loss[loss=0.2682, ctc_loss=0.1899, cr_loss=0.3913, over 3187881.35 frames. ], batch size: 45, lr: 1.53e-02, grad_scale: 32.0
2024-09-22 23:49:02,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=130069.33333333333, ans=0.025
2024-09-22 23:49:19,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=130116.0, ans=0.125
2024-09-22 23:49:42,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=15.0
2024-09-22 23:50:04,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=130209.33333333333, ans=0.2
2024-09-22 23:50:18,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=130256.0, ans=0.125
2024-09-22 23:50:20,278 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.396e+02 1.531e+02 1.850e+02 5.586e+02, threshold=3.061e+02, percent-clipped=2.0
2024-09-22 23:50:23,505 INFO [train.py:1198] (0/4) Epoch 8, batch 650, loss[loss=0.2409, ctc_loss=0.1651, cr_loss=0.379, over 16700.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1897, cr_loss=0.3918, over 3232636.03 frames. ], batch size: 37, lr: 1.53e-02, grad_scale: 32.0
2024-09-22 23:50:25,443 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-22 23:50:41,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=130349.33333333333, ans=0.025
2024-09-22 23:50:52,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0
2024-09-22 23:51:00,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=130396.0, ans=0.125
2024-09-22 23:51:00,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=22.5
2024-09-22 23:51:42,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=130489.33333333333, ans=0.025
2024-09-22 23:51:47,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=130536.0, ans=0.0
2024-09-22 23:51:48,929 INFO [train.py:1198] (0/4) Epoch 8, batch 700, loss[loss=0.3084, ctc_loss=0.2216, cr_loss=0.4336, over 17144.00 frames. ], tot_loss[loss=0.2687, ctc_loss=0.1902, cr_loss=0.3927, over 3261207.07 frames. ], batch size: 48, lr: 1.53e-02, grad_scale: 32.0
2024-09-22 23:51:50,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=130536.0, ans=0.2
2024-09-22 23:52:04,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=130582.66666666667, ans=0.0
2024-09-22 23:52:26,113 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.77 vs. limit=10.0
2024-09-22 23:52:28,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=130629.33333333333, ans=0.125
2024-09-22 23:52:30,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=130629.33333333333, ans=0.125
2024-09-22 23:52:30,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=130629.33333333333, ans=22.5
2024-09-22 23:52:31,913 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-28000.pt
2024-09-22 23:52:35,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=130629.33333333333, ans=0.0
2024-09-22 23:52:51,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=130676.0, ans=0.125
2024-09-22 23:53:07,237 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.368e+02 1.561e+02 1.822e+02 3.141e+02, threshold=3.123e+02, percent-clipped=1.0
2024-09-22 23:53:07,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=130722.66666666667, ans=0.125
2024-09-22 23:53:10,433 INFO [train.py:1198] (0/4) Epoch 8, batch 750, loss[loss=0.3822, ctc_loss=0.2895, cr_loss=0.4637, over 11372.00 frames. ], tot_loss[loss=0.2676, ctc_loss=0.1894, cr_loss=0.3911, over 3276577.91 frames. ], batch size: 123, lr: 1.53e-02, grad_scale: 32.0
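Besides the per-epoch save of epoch-7.pt earlier, [checkpoint.py:75] also writes batch-indexed files such as checkpoint-28000.pt at round batch counts, suggesting a save-every-N-batches rule. A sketch of that logic, with save_every_n and the payload layout as assumptions:

from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, exp_dir, batch_idx, save_every_n=4000):
    # Sketch only; save_every_n and the payload layout are assumptions.
    if batch_idx == 0 or batch_idx % save_every_n != 0:
        return
    path = Path(exp_dir) / f"checkpoint-{batch_idx}.pt"
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "batch_idx_train": batch_idx},
        path,
    )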
2024-09-22 23:53:34,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=130816.0, ans=0.0
2024-09-22 23:53:47,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=130862.66666666667, ans=0.125
2024-09-22 23:54:09,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=130909.33333333333, ans=0.1
2024-09-22 23:54:12,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=130909.33333333333, ans=0.0
2024-09-22 23:54:15,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=130956.0, ans=0.025
2024-09-22 23:54:33,314 INFO [train.py:1198] (0/4) Epoch 8, batch 800, loss[loss=0.2922, ctc_loss=0.2143, cr_loss=0.3893, over 17084.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.189, cr_loss=0.3904, over 3302101.46 frames. ], batch size: 46, lr: 1.53e-02, grad_scale: 32.0
2024-09-22 23:54:34,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0
2024-09-22 23:54:37,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.69 vs. limit=15.0
2024-09-22 23:54:47,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=131049.33333333333, ans=0.125
2024-09-22 23:54:59,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=131049.33333333333, ans=0.0
2024-09-22 23:55:07,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=131096.0, ans=0.0
2024-09-22 23:55:22,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.56 vs. limit=15.0
2024-09-22 23:55:27,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=131142.66666666666, ans=0.09899494936611666
2024-09-22 23:55:30,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=131142.66666666666, ans=0.0
2024-09-22 23:55:54,936 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.424e+02 1.641e+02 1.911e+02 2.980e+02, threshold=3.282e+02, percent-clipped=0.0
2024-09-22 23:55:58,177 INFO [train.py:1198] (0/4) Epoch 8, batch 850, loss[loss=0.2956, ctc_loss=0.2123, cr_loss=0.4164, over 16010.00 frames. ], tot_loss[loss=0.2699, ctc_loss=0.1913, cr_loss=0.3931, over 3300855.98 frames. ], batch size: 74, lr: 1.53e-02, grad_scale: 32.0
2024-09-22 23:56:14,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=131282.66666666666, ans=0.125
2024-09-22 23:56:30,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=131282.66666666666, ans=0.1
2024-09-22 23:56:44,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=131329.33333333334, ans=0.0
2024-09-22 23:57:21,043 INFO [train.py:1198] (0/4) Epoch 8, batch 900, loss[loss=0.2427, ctc_loss=0.1711, cr_loss=0.3583, over 17110.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1901, cr_loss=0.3918, over 3312480.50 frames. ], batch size: 40, lr: 1.53e-02, grad_scale: 32.0
2024-09-22 23:57:30,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=131469.33333333334, ans=0.09899494936611666
2024-09-22 23:57:38,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=131516.0, ans=0.125
2024-09-22 23:57:56,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=131562.66666666666, ans=0.125
2024-09-22 23:57:59,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=131562.66666666666, ans=0.09899494936611666
2024-09-22 23:58:27,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=131656.0, ans=0.125
2024-09-22 23:58:36,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=131656.0, ans=0.0
2024-09-22 23:58:37,889 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.417e+02 1.553e+02 1.784e+02 2.618e+02, threshold=3.106e+02, percent-clipped=0.0
2024-09-22 23:58:41,090 INFO [train.py:1198] (0/4) Epoch 8, batch 950, loss[loss=0.3269, ctc_loss=0.2372, cr_loss=0.4487, over 17021.00 frames. ], tot_loss[loss=0.2688, ctc_loss=0.1903, cr_loss=0.3925, over 3321772.83 frames. ], batch size: 52, lr: 1.53e-02, grad_scale: 32.0
2024-09-22 23:58:43,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2024-09-22 23:58:47,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=131702.66666666666, ans=0.125
2024-09-22 23:58:47,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=131702.66666666666, ans=0.125
2024-09-22 23:59:01,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=131749.33333333334, ans=0.125
2024-09-22 23:59:09,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=131749.33333333334, ans=0.125
2024-09-22 23:59:34,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=131842.66666666666, ans=0.0
2024-09-23 00:00:04,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=131936.0, ans=0.2
2024-09-23 00:00:05,374 INFO [train.py:1198] (0/4) Epoch 8, batch 1000, loss[loss=0.2652, ctc_loss=0.1851, cr_loss=0.4006, over 17283.00 frames. ], tot_loss[loss=0.2695, ctc_loss=0.1908, cr_loss=0.3935, over 3325721.31 frames. ], batch size: 49, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:00:10,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=131936.0, ans=0.0
2024-09-23 00:00:37,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=132029.33333333334, ans=0.0
2024-09-23 00:00:40,853 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 00:00:57,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132076.0, ans=0.1
2024-09-23 00:01:12,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.96 vs. limit=22.5
2024-09-23 00:01:14,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=132122.66666666666, ans=0.1
2024-09-23 00:01:26,820 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.410e+02 1.544e+02 1.747e+02 2.558e+02, threshold=3.087e+02, percent-clipped=0.0
2024-09-23 00:01:29,993 INFO [train.py:1198] (0/4) Epoch 8, batch 1050, loss[loss=0.2279, ctc_loss=0.1593, cr_loss=0.3427, over 17255.00 frames. ], tot_loss[loss=0.2692, ctc_loss=0.1907, cr_loss=0.3929, over 3326925.97 frames. ], batch size: 44, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:01:41,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=132169.33333333334, ans=0.025
2024-09-23 00:02:02,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=132262.66666666666, ans=0.125
2024-09-23 00:02:08,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132262.66666666666, ans=0.1
2024-09-23 00:02:27,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=132309.33333333334, ans=0.0
2024-09-23 00:02:44,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=132356.0, ans=0.07
2024-09-23 00:02:46,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=132356.0, ans=0.125
2024-09-23 00:02:49,159 INFO [train.py:1198] (0/4) Epoch 8, batch 1100, loss[loss=0.2577, ctc_loss=0.1822, cr_loss=0.3773, over 17312.00 frames. ], tot_loss[loss=0.269, ctc_loss=0.1903, cr_loss=0.3933, over 3338476.68 frames. ], batch size: 46, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:03:02,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132402.66666666666, ans=0.125
2024-09-23 00:03:37,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0
2024-09-23 00:04:05,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=132589.33333333334, ans=0.0
2024-09-23 00:04:07,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=132589.33333333334, ans=0.125
2024-09-23 00:04:08,242 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.381e+02 1.495e+02 1.697e+02 2.213e+02, threshold=2.991e+02, percent-clipped=0.0
2024-09-23 00:04:11,439 INFO [train.py:1198] (0/4) Epoch 8, batch 1150, loss[loss=0.3165, ctc_loss=0.2277, cr_loss=0.4444, over 15894.00 frames. ], tot_loss[loss=0.2688, ctc_loss=0.1902, cr_loss=0.3932, over 3336697.36 frames. ], batch size: 74, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:04:27,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=132682.66666666666, ans=0.035
2024-09-23 00:05:07,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132776.0, ans=0.1
2024-09-23 00:05:33,725 INFO [train.py:1198] (0/4) Epoch 8, batch 1200, loss[loss=0.2725, ctc_loss=0.1968, cr_loss=0.3785, over 16744.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.1896, cr_loss=0.3922, over 3341659.32 frames. ], batch size: 61, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:05:38,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=132869.33333333334, ans=0.125
2024-09-23 00:05:58,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=132916.0, ans=0.0
2024-09-23 00:05:58,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=132916.0, ans=0.125
2024-09-23 00:06:07,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=132962.66666666666, ans=0.125
2024-09-23 00:06:08,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=132962.66666666666, ans=22.5
2024-09-23 00:06:09,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=22.5
2024-09-23 00:06:19,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0
2024-09-23 00:06:22,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=132962.66666666666, ans=0.2
2024-09-23 00:06:31,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=12.0
2024-09-23 00:06:32,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=133009.33333333334, ans=0.125
2024-09-23 00:06:41,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=133056.0, ans=10.0
2024-09-23 00:06:57,397 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.352e+02 1.465e+02 1.619e+02 2.564e+02, threshold=2.930e+02, percent-clipped=0.0
2024-09-23 00:06:58,991 INFO [train.py:1198] (0/4) Epoch 8, batch 1250, loss[loss=0.2397, ctc_loss=0.165, cr_loss=0.3737, over 17161.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1886, cr_loss=0.3899, over 3338143.40 frames. ], batch size: 45, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:07:04,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=133102.66666666666, ans=0.0
2024-09-23 00:07:12,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0
2024-09-23 00:07:24,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=133149.33333333334, ans=0.1
2024-09-23 00:07:41,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.46 vs. limit=10.0
2024-09-23 00:07:52,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0
2024-09-23 00:07:53,534 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 00:08:18,297 INFO [train.py:1198] (0/4) Epoch 8, batch 1300, loss[loss=0.2543, ctc_loss=0.1795, cr_loss=0.374, over 17156.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1878, cr_loss=0.3899, over 3354762.91 frames. ], batch size: 45, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:08:20,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0
2024-09-23 00:08:59,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=133429.33333333334, ans=10.0
2024-09-23 00:09:33,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=133522.66666666666, ans=0.125
2024-09-23 00:09:38,133 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.339e+02 1.481e+02 1.642e+02 2.207e+02, threshold=2.961e+02, percent-clipped=0.0
2024-09-23 00:09:39,798 INFO [train.py:1198] (0/4) Epoch 8, batch 1350, loss[loss=0.2588, ctc_loss=0.1808, cr_loss=0.3897, over 17104.00 frames. ], tot_loss[loss=0.2661, ctc_loss=0.1881, cr_loss=0.3902, over 3349984.68 frames. ], batch size: 49, lr: 1.52e-02, grad_scale: 32.0
2024-09-23 00:09:55,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=133569.33333333334, ans=0.0
2024-09-23 00:10:06,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=133616.0, ans=0.0
2024-09-23 00:10:12,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133662.66666666666, ans=0.1
2024-09-23 00:10:16,793 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0
2024-09-23 00:10:22,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=133662.66666666666, ans=0.125
2024-09-23 00:10:27,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=133662.66666666666, ans=0.0
2024-09-23 00:10:30,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=133709.33333333334, ans=0.0
2024-09-23 00:10:39,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=12.0
2024-09-23 00:10:47,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=133756.0, ans=0.125
2024-09-23 00:10:57,061 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0
2024-09-23 00:11:06,822 INFO [train.py:1198] (0/4) Epoch 8, batch 1400, loss[loss=0.2676, ctc_loss=0.1894, cr_loss=0.3914, over 17224.00 frames. ], tot_loss[loss=0.2657, ctc_loss=0.1878, cr_loss=0.3896, over 3348772.77 frames. ], batch size: 50, lr: 1.51e-02, grad_scale: 32.0
], batch size: 50, lr: 1.51e-02, grad_scale: 32.0 2024-09-23 00:11:10,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=133802.66666666666, ans=0.0 2024-09-23 00:11:19,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.21 vs. limit=5.0 2024-09-23 00:12:08,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133989.33333333334, ans=0.1 2024-09-23 00:12:15,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=133989.33333333334, ans=0.125 2024-09-23 00:12:24,402 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.304e+02 1.414e+02 1.596e+02 2.535e+02, threshold=2.828e+02, percent-clipped=0.0 2024-09-23 00:12:25,994 INFO [train.py:1198] (0/4) Epoch 8, batch 1450, loss[loss=0.2554, ctc_loss=0.1823, cr_loss=0.3656, over 17084.00 frames. ], tot_loss[loss=0.2662, ctc_loss=0.188, cr_loss=0.3907, over 3358739.36 frames. ], batch size: 49, lr: 1.51e-02, grad_scale: 32.0 2024-09-23 00:12:32,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=134036.0, ans=0.125 2024-09-23 00:13:32,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=134222.66666666666, ans=0.125 2024-09-23 00:13:48,035 INFO [train.py:1198] (0/4) Epoch 8, batch 1500, loss[loss=0.3081, ctc_loss=0.2247, cr_loss=0.4169, over 15044.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1885, cr_loss=0.3917, over 3354514.28 frames. ], batch size: 89, lr: 1.51e-02, grad_scale: 32.0 2024-09-23 00:14:17,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2024-09-23 00:14:26,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=134362.66666666666, ans=0.0 2024-09-23 00:14:58,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.08 vs. limit=15.0 2024-09-23 00:15:09,201 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.316e+02 1.415e+02 1.558e+02 2.036e+02, threshold=2.830e+02, percent-clipped=0.0 2024-09-23 00:15:10,822 INFO [train.py:1198] (0/4) Epoch 8, batch 1550, loss[loss=0.2847, ctc_loss=0.2017, cr_loss=0.4151, over 17036.00 frames. ], tot_loss[loss=0.2666, ctc_loss=0.1882, cr_loss=0.392, over 3357259.23 frames. ], batch size: 52, lr: 1.51e-02, grad_scale: 32.0 2024-09-23 00:15:12,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=134502.66666666666, ans=0.0 2024-09-23 00:15:15,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.16 vs. 
limit=15.0 2024-09-23 00:16:15,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=134642.66666666666, ans=0.125 2024-09-23 00:16:20,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=134689.33333333334, ans=0.2 2024-09-23 00:16:27,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.64 vs. limit=10.0 2024-09-23 00:16:28,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2024-09-23 00:16:35,829 INFO [train.py:1198] (0/4) Epoch 8, batch 1600, loss[loss=0.2606, ctc_loss=0.1821, cr_loss=0.3925, over 17162.00 frames. ], tot_loss[loss=0.2681, ctc_loss=0.1895, cr_loss=0.3934, over 3348475.60 frames. ], batch size: 45, lr: 1.51e-02, grad_scale: 32.0 2024-09-23 00:16:53,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=134782.66666666666, ans=0.0 2024-09-23 00:17:20,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=134829.33333333334, ans=0.2 2024-09-23 00:17:36,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=134876.0, ans=0.125 2024-09-23 00:17:36,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=134876.0, ans=0.125 2024-09-23 00:17:41,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=134922.66666666666, ans=0.2 2024-09-23 00:17:51,368 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 00:17:54,116 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.357e+02 1.515e+02 1.820e+02 2.997e+02, threshold=3.030e+02, percent-clipped=2.0 2024-09-23 00:17:55,798 INFO [train.py:1198] (0/4) Epoch 8, batch 1650, loss[loss=0.2993, ctc_loss=0.2138, cr_loss=0.4274, over 17031.00 frames. ], tot_loss[loss=0.2664, ctc_loss=0.1879, cr_loss=0.3923, over 3360918.80 frames. ], batch size: 52, lr: 1.51e-02, grad_scale: 32.0 2024-09-23 00:18:10,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=135016.0, ans=0.0 2024-09-23 00:18:31,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=22.5 2024-09-23 00:18:35,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0 2024-09-23 00:18:38,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135062.66666666666, ans=0.1 2024-09-23 00:19:00,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=135156.0, ans=0.025 2024-09-23 00:19:17,539 INFO [train.py:1198] (0/4) Epoch 8, batch 1700, loss[loss=0.3166, ctc_loss=0.2259, cr_loss=0.4532, over 16512.00 frames. ], tot_loss[loss=0.2669, ctc_loss=0.1885, cr_loss=0.3923, over 3360122.69 frames. 
], batch size: 66, lr: 1.51e-02, grad_scale: 32.0 2024-09-23 00:19:33,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=135249.33333333334, ans=0.125 2024-09-23 00:20:20,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=135342.66666666666, ans=0.125 2024-09-23 00:20:38,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2024-09-23 00:20:40,292 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.382e+02 1.518e+02 1.704e+02 3.525e+02, threshold=3.037e+02, percent-clipped=1.0 2024-09-23 00:20:41,978 INFO [train.py:1198] (0/4) Epoch 8, batch 1750, loss[loss=0.2623, ctc_loss=0.1795, cr_loss=0.414, over 17067.00 frames. ], tot_loss[loss=0.266, ctc_loss=0.1877, cr_loss=0.3915, over 3364206.92 frames. ], batch size: 46, lr: 1.51e-02, grad_scale: 32.0 2024-09-23 00:21:02,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=135482.66666666666, ans=0.5 2024-09-23 00:21:03,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=135482.66666666666, ans=0.125 2024-09-23 00:21:22,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=135529.33333333334, ans=0.5 2024-09-23 00:21:32,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-23 00:21:48,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=135622.66666666666, ans=0.125 2024-09-23 00:21:59,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=135622.66666666666, ans=0.2 2024-09-23 00:22:03,866 INFO [train.py:1198] (0/4) Epoch 8, batch 1800, loss[loss=0.2723, ctc_loss=0.1879, cr_loss=0.422, over 17010.00 frames. ], tot_loss[loss=0.2653, ctc_loss=0.1873, cr_loss=0.3901, over 3362446.65 frames. ], batch size: 44, lr: 1.50e-02, grad_scale: 32.0 2024-09-23 00:22:21,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135716.0, ans=0.1 2024-09-23 00:22:34,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=135762.66666666666, ans=0.09899494936611666 2024-09-23 00:22:39,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=135762.66666666666, ans=0.0 2024-09-23 00:22:50,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. 
limit=15.0 2024-09-23 00:23:09,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=135856.0, ans=0.125 2024-09-23 00:23:15,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=135856.0, ans=0.125 2024-09-23 00:23:21,451 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.352e+02 1.514e+02 1.696e+02 3.020e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 00:23:23,112 INFO [train.py:1198] (0/4) Epoch 8, batch 1850, loss[loss=0.2174, ctc_loss=0.1467, cr_loss=0.3536, over 17304.00 frames. ], tot_loss[loss=0.2655, ctc_loss=0.1875, cr_loss=0.3901, over 3357259.34 frames. ], batch size: 46, lr: 1.50e-02, grad_scale: 32.0 2024-09-23 00:23:34,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135902.66666666666, ans=0.1 2024-09-23 00:23:40,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2024-09-23 00:24:12,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.57 vs. limit=10.0 2024-09-23 00:24:47,782 INFO [train.py:1198] (0/4) Epoch 8, batch 1900, loss[loss=0.3069, ctc_loss=0.2222, cr_loss=0.4237, over 17029.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1883, cr_loss=0.3902, over 3342239.08 frames. ], batch size: 52, lr: 1.50e-02, grad_scale: 32.0 2024-09-23 00:24:57,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136136.0, ans=0.1 2024-09-23 00:25:19,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=136229.33333333334, ans=0.125 2024-09-23 00:26:11,059 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.199e+02 1.376e+02 1.564e+02 1.775e+02 2.559e+02, threshold=3.128e+02, percent-clipped=0.0 2024-09-23 00:26:12,647 INFO [train.py:1198] (0/4) Epoch 8, batch 1950, loss[loss=0.238, ctc_loss=0.1626, cr_loss=0.377, over 17093.00 frames. ], tot_loss[loss=0.2662, ctc_loss=0.1883, cr_loss=0.3896, over 3331826.48 frames. ], batch size: 43, lr: 1.50e-02, grad_scale: 32.0 2024-09-23 00:26:51,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=136462.66666666666, ans=0.2 2024-09-23 00:26:59,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2024-09-23 00:27:02,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=136509.33333333334, ans=0.025 2024-09-23 00:27:05,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=136509.33333333334, ans=0.125 2024-09-23 00:27:32,279 INFO [train.py:1198] (0/4) Epoch 8, batch 2000, loss[loss=0.2988, ctc_loss=0.2137, cr_loss=0.4255, over 16940.00 frames. ], tot_loss[loss=0.2658, ctc_loss=0.188, cr_loss=0.3889, over 3343994.50 frames. 
], batch size: 58, lr: 1.50e-02, grad_scale: 32.0 2024-09-23 00:28:24,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2024-09-23 00:28:32,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=136742.66666666666, ans=0.0 2024-09-23 00:28:53,826 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.381e+02 1.478e+02 1.758e+02 2.844e+02, threshold=2.957e+02, percent-clipped=0.0 2024-09-23 00:28:54,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=136836.0, ans=0.0 2024-09-23 00:28:55,408 INFO [train.py:1198] (0/4) Epoch 8, batch 2050, loss[loss=0.3573, ctc_loss=0.2754, cr_loss=0.4093, over 11293.00 frames. ], tot_loss[loss=0.2668, ctc_loss=0.1889, cr_loss=0.3897, over 3328376.57 frames. ], batch size: 123, lr: 1.50e-02, grad_scale: 32.0 2024-09-23 00:30:16,955 INFO [train.py:1198] (0/4) Epoch 8, batch 2100, loss[loss=0.2454, ctc_loss=0.1722, cr_loss=0.3659, over 17176.00 frames. ], tot_loss[loss=0.2671, ctc_loss=0.1889, cr_loss=0.3906, over 3341465.45 frames. ], batch size: 45, lr: 1.50e-02, grad_scale: 32.0 2024-09-23 00:30:54,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=137162.66666666666, ans=0.0 2024-09-23 00:31:03,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=137162.66666666666, ans=0.05 2024-09-23 00:31:08,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137209.33333333334, ans=0.1 2024-09-23 00:31:35,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=137256.0, ans=0.0 2024-09-23 00:31:40,396 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.375e+02 1.520e+02 1.695e+02 2.810e+02, threshold=3.041e+02, percent-clipped=0.0 2024-09-23 00:31:41,917 INFO [train.py:1198] (0/4) Epoch 8, batch 2150, loss[loss=0.2591, ctc_loss=0.1833, cr_loss=0.3791, over 17024.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1882, cr_loss=0.3909, over 3351343.69 frames. ], batch size: 44, lr: 1.50e-02, grad_scale: 32.0 2024-09-23 00:31:59,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=137349.33333333334, ans=0.125 2024-09-23 00:32:30,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=137442.66666666666, ans=0.07 2024-09-23 00:32:38,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=137442.66666666666, ans=0.0 2024-09-23 00:32:49,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=137489.33333333334, ans=0.025 2024-09-23 00:33:01,806 INFO [train.py:1198] (0/4) Epoch 8, batch 2200, loss[loss=0.2842, ctc_loss=0.1996, cr_loss=0.4231, over 17027.00 frames. ], tot_loss[loss=0.2654, ctc_loss=0.1873, cr_loss=0.3905, over 3353027.45 frames. 
], batch size: 52, lr: 1.49e-02, grad_scale: 32.0 2024-09-23 00:33:10,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2024-09-23 00:33:11,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=137536.0, ans=0.0 2024-09-23 00:33:16,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=137582.66666666666, ans=0.1 2024-09-23 00:33:17,967 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 00:34:01,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=137676.0, ans=0.0 2024-09-23 00:34:09,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=137722.66666666666, ans=0.125 2024-09-23 00:34:21,856 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.429e+02 1.621e+02 1.891e+02 2.462e+02, threshold=3.242e+02, percent-clipped=0.0 2024-09-23 00:34:23,499 INFO [train.py:1198] (0/4) Epoch 8, batch 2250, loss[loss=0.2606, ctc_loss=0.1818, cr_loss=0.3937, over 17153.00 frames. ], tot_loss[loss=0.2659, ctc_loss=0.1876, cr_loss=0.3914, over 3351268.82 frames. ], batch size: 45, lr: 1.49e-02, grad_scale: 32.0 2024-09-23 00:34:37,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=137769.33333333334, ans=0.0 2024-09-23 00:35:03,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=137862.66666666666, ans=0.2 2024-09-23 00:35:08,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2024-09-23 00:35:50,818 INFO [train.py:1198] (0/4) Epoch 8, batch 2300, loss[loss=0.2484, ctc_loss=0.1737, cr_loss=0.3734, over 17023.00 frames. ], tot_loss[loss=0.2651, ctc_loss=0.187, cr_loss=0.3908, over 3359801.95 frames. 
], batch size: 44, lr: 1.49e-02, grad_scale: 32.0 2024-09-23 00:35:55,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=138002.66666666666, ans=0.125 2024-09-23 00:36:24,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=138096.0, ans=0.125 2024-09-23 00:36:45,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=138142.66666666666, ans=0.09899494936611666 2024-09-23 00:36:45,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=138142.66666666666, ans=0.0 2024-09-23 00:36:53,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=138189.33333333334, ans=0.125 2024-09-23 00:36:53,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=138189.33333333334, ans=0.1 2024-09-23 00:37:01,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=138189.33333333334, ans=0.125 2024-09-23 00:37:09,110 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.383e+02 1.498e+02 1.682e+02 2.493e+02, threshold=2.995e+02, percent-clipped=0.0 2024-09-23 00:37:09,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.25 vs. limit=10.0 2024-09-23 00:37:10,796 INFO [train.py:1198] (0/4) Epoch 8, batch 2350, loss[loss=0.2755, ctc_loss=0.1935, cr_loss=0.4096, over 16426.00 frames. ], tot_loss[loss=0.265, ctc_loss=0.1869, cr_loss=0.3905, over 3360843.73 frames. ], batch size: 66, lr: 1.49e-02, grad_scale: 32.0 2024-09-23 00:37:17,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=138236.0, ans=0.0 2024-09-23 00:37:40,158 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2024-09-23 00:38:26,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138422.66666666666, ans=0.1 2024-09-23 00:38:33,072 INFO [train.py:1198] (0/4) Epoch 8, batch 2400, loss[loss=0.3025, ctc_loss=0.2138, cr_loss=0.4435, over 16389.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.188, cr_loss=0.3915, over 3357014.00 frames. ], batch size: 66, lr: 1.49e-02, grad_scale: 32.0 2024-09-23 00:38:41,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=138469.33333333334, ans=0.125 2024-09-23 00:39:11,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=138562.66666666666, ans=0.125 2024-09-23 00:39:53,485 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.369e+02 1.492e+02 1.774e+02 2.629e+02, threshold=2.985e+02, percent-clipped=0.0 2024-09-23 00:39:55,153 INFO [train.py:1198] (0/4) Epoch 8, batch 2450, loss[loss=0.2602, ctc_loss=0.182, cr_loss=0.3911, over 17291.00 frames. ], tot_loss[loss=0.2663, ctc_loss=0.1881, cr_loss=0.3912, over 3349571.44 frames. 
], batch size: 51, lr: 1.49e-02, grad_scale: 32.0 2024-09-23 00:39:58,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=138702.66666666666, ans=0.025 2024-09-23 00:40:19,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0 2024-09-23 00:40:25,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=138749.33333333334, ans=0.125 2024-09-23 00:40:37,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=138796.0, ans=0.0 2024-09-23 00:40:53,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=138842.66666666666, ans=0.125 2024-09-23 00:41:20,431 INFO [train.py:1198] (0/4) Epoch 8, batch 2500, loss[loss=0.256, ctc_loss=0.1781, cr_loss=0.3898, over 17305.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1868, cr_loss=0.3899, over 3354997.36 frames. ], batch size: 51, lr: 1.49e-02, grad_scale: 32.0 2024-09-23 00:41:21,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.08 vs. limit=10.0 2024-09-23 00:42:18,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2024-09-23 00:42:22,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=139122.66666666666, ans=0.035 2024-09-23 00:42:37,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=139122.66666666666, ans=0.0 2024-09-23 00:42:40,013 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.309e+02 1.454e+02 1.673e+02 2.570e+02, threshold=2.909e+02, percent-clipped=0.0 2024-09-23 00:42:40,038 INFO [train.py:1198] (0/4) Epoch 8, batch 2550, loss[loss=0.2502, ctc_loss=0.1738, cr_loss=0.3819, over 17141.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1858, cr_loss=0.3888, over 3356752.34 frames. ], batch size: 48, lr: 1.49e-02, grad_scale: 16.0 2024-09-23 00:43:05,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=139216.0, ans=0.2 2024-09-23 00:43:30,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=139309.33333333334, ans=0.125 2024-09-23 00:43:43,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=139309.33333333334, ans=0.025 2024-09-23 00:43:50,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5 2024-09-23 00:43:51,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=139356.0, ans=0.1 2024-09-23 00:44:02,588 INFO [train.py:1198] (0/4) Epoch 8, batch 2600, loss[loss=0.2976, ctc_loss=0.2081, cr_loss=0.4472, over 16530.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1861, cr_loss=0.389, over 3352985.62 frames. 
], batch size: 66, lr: 1.48e-02, grad_scale: 16.0 2024-09-23 00:44:03,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=12.0 2024-09-23 00:44:09,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139402.66666666666, ans=0.1 2024-09-23 00:44:12,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=139402.66666666666, ans=0.125 2024-09-23 00:44:28,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=139449.33333333334, ans=0.0 2024-09-23 00:44:56,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=139542.66666666666, ans=0.0 2024-09-23 00:45:04,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139542.66666666666, ans=0.1 2024-09-23 00:45:13,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=139589.33333333334, ans=0.125 2024-09-23 00:45:14,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=12.0 2024-09-23 00:45:20,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=12.0 2024-09-23 00:45:27,488 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.349e+02 1.476e+02 1.822e+02 2.637e+02, threshold=2.952e+02, percent-clipped=0.0 2024-09-23 00:45:27,513 INFO [train.py:1198] (0/4) Epoch 8, batch 2650, loss[loss=0.2337, ctc_loss=0.1666, cr_loss=0.3357, over 16966.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1856, cr_loss=0.3888, over 3357140.84 frames. ], batch size: 42, lr: 1.48e-02, grad_scale: 16.0 2024-09-23 00:45:52,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=139682.66666666666, ans=0.95 2024-09-23 00:46:21,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=139776.0, ans=0.125 2024-09-23 00:46:49,544 INFO [train.py:1198] (0/4) Epoch 8, batch 2700, loss[loss=0.2785, ctc_loss=0.1931, cr_loss=0.4273, over 16988.00 frames. ], tot_loss[loss=0.2644, ctc_loss=0.1865, cr_loss=0.3896, over 3345284.04 frames. ], batch size: 53, lr: 1.48e-02, grad_scale: 16.0 2024-09-23 00:48:03,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=140056.0, ans=0.125 2024-09-23 00:48:11,967 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.374e+02 1.576e+02 1.752e+02 3.056e+02, threshold=3.153e+02, percent-clipped=1.0 2024-09-23 00:48:11,992 INFO [train.py:1198] (0/4) Epoch 8, batch 2750, loss[loss=0.2658, ctc_loss=0.1815, cr_loss=0.4216, over 17008.00 frames. ], tot_loss[loss=0.2645, ctc_loss=0.1866, cr_loss=0.3895, over 3349998.23 frames. ], batch size: 44, lr: 1.48e-02, grad_scale: 16.0 2024-09-23 00:48:38,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.08 vs. 
limit=15.0 2024-09-23 00:48:50,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2024-09-23 00:48:55,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=140196.0, ans=0.125 2024-09-23 00:49:09,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140242.66666666666, ans=0.125 2024-09-23 00:49:24,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=140289.33333333334, ans=6.0 2024-09-23 00:49:29,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=140289.33333333334, ans=0.0 2024-09-23 00:49:34,156 INFO [train.py:1198] (0/4) Epoch 8, batch 2800, loss[loss=0.2738, ctc_loss=0.188, cr_loss=0.4291, over 17076.00 frames. ], tot_loss[loss=0.2649, ctc_loss=0.1868, cr_loss=0.3905, over 3353349.86 frames. ], batch size: 46, lr: 1.48e-02, grad_scale: 32.0 2024-09-23 00:49:48,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=140382.66666666666, ans=0.125 2024-09-23 00:50:07,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2024-09-23 00:50:10,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=140429.33333333334, ans=0.125 2024-09-23 00:50:10,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=140429.33333333334, ans=0.125 2024-09-23 00:50:17,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. limit=10.0 2024-09-23 00:50:44,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140522.66666666666, ans=0.1 2024-09-23 00:50:58,773 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.406e+02 1.667e+02 1.997e+02 3.785e+02, threshold=3.334e+02, percent-clipped=1.0 2024-09-23 00:50:58,797 INFO [train.py:1198] (0/4) Epoch 8, batch 2850, loss[loss=0.297, ctc_loss=0.2121, cr_loss=0.4242, over 17072.00 frames. ], tot_loss[loss=0.2641, ctc_loss=0.1861, cr_loss=0.3899, over 3363143.97 frames. ], batch size: 52, lr: 1.48e-02, grad_scale: 32.0 2024-09-23 00:51:20,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=140616.0, ans=0.0 2024-09-23 00:52:04,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=140756.0, ans=0.0 2024-09-23 00:52:18,842 INFO [train.py:1198] (0/4) Epoch 8, batch 2900, loss[loss=0.2878, ctc_loss=0.1998, cr_loss=0.4403, over 16880.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1856, cr_loss=0.3892, over 3360989.67 frames. 
], batch size: 58, lr: 1.48e-02, grad_scale: 32.0 2024-09-23 00:52:41,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=140849.33333333334, ans=0.125 2024-09-23 00:53:30,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=140989.33333333334, ans=0.125 2024-09-23 00:53:35,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=140989.33333333334, ans=0.125 2024-09-23 00:53:41,170 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.334e+02 1.488e+02 1.595e+02 2.196e+02, threshold=2.977e+02, percent-clipped=0.0 2024-09-23 00:53:41,194 INFO [train.py:1198] (0/4) Epoch 8, batch 2950, loss[loss=0.2884, ctc_loss=0.2056, cr_loss=0.4143, over 16544.00 frames. ], tot_loss[loss=0.2642, ctc_loss=0.186, cr_loss=0.3913, over 3369432.63 frames. ], batch size: 66, lr: 1.48e-02, grad_scale: 32.0 2024-09-23 00:53:41,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=141036.0, ans=0.125 2024-09-23 00:53:46,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=141036.0, ans=0.5 2024-09-23 00:53:58,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141082.66666666666, ans=0.1 2024-09-23 00:54:07,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. limit=15.0 2024-09-23 00:54:08,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=141082.66666666666, ans=0.125 2024-09-23 00:54:14,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2024-09-23 00:54:31,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=141176.0, ans=0.05 2024-09-23 00:54:34,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141176.0, ans=0.1 2024-09-23 00:54:42,811 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 00:54:59,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=141222.66666666666, ans=0.04949747468305833 2024-09-23 00:55:02,758 INFO [train.py:1198] (0/4) Epoch 8, batch 3000, loss[loss=0.2496, ctc_loss=0.1726, cr_loss=0.3853, over 17310.00 frames. ], tot_loss[loss=0.2646, ctc_loss=0.1865, cr_loss=0.3906, over 3360815.35 frames. ], batch size: 46, lr: 1.48e-02, grad_scale: 32.0 2024-09-23 00:55:02,758 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 00:55:18,811 INFO [train.py:1230] (0/4) Epoch 8, validation: loss=0.05304, ctc_loss=0.05304, cr_loss=7.247e-15, over 944034.00 frames. 
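
The per-batch and validation entries above report loss, ctc_loss and cr_loss together with a frame count, and the totals are consistent with loss = ctc_loss + 0.2 * cr_loss (batch 3000: 0.1726 + 0.2 * 0.3853 ≈ 0.2496; the validation entry, where cr_loss is ~1e-15, collapses to the CTC term), while tot_loss[...] is the frame-weighted running average of those per-batch values over the epoch so far. Below is a minimal sketch of that bookkeeping, assuming the 0.2 scale inferred from the logged numbers; the class and method names are illustrative, not the recipe's actual code.

# Sketch (inferred from the logged values, not the recipe's code): combine CTC
# and CR losses with an assumed fixed 0.2 scale and keep a frame-weighted
# running average per epoch.
class LossTracker:
    def __init__(self, cr_loss_scale: float = 0.2):
        self.cr_loss_scale = cr_loss_scale
        self.sums = {"loss": 0.0, "ctc_loss": 0.0, "cr_loss": 0.0}
        self.frames = 0.0

    def update(self, ctc_loss: float, cr_loss: float, num_frames: float) -> float:
        # Per-batch total, e.g. 0.1726 + 0.2 * 0.3853 ~= 0.2496 (batch 3000).
        loss = ctc_loss + self.cr_loss_scale * cr_loss
        for name, value in (("loss", loss), ("ctc_loss", ctc_loss), ("cr_loss", cr_loss)):
            self.sums[name] += value * num_frames  # weight by frames, not by batches
        self.frames += num_frames
        return loss

    def averages(self) -> dict:
        # What the log prints as tot_loss[... over <frames> frames].
        return {name: total / self.frames for name, total in self.sums.items()}

tracker = LossTracker()
tracker.update(ctc_loss=0.1726, cr_loss=0.3853, num_frames=17310.0)
print(tracker.averages())  # after a single batch this equals the per-batch values
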
2024-09-23 00:55:18,812 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 00:55:20,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=141269.33333333334, ans=0.125 2024-09-23 00:55:39,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=22.5 2024-09-23 00:55:40,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=8.0 2024-09-23 00:55:48,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=141362.66666666666, ans=0.125 2024-09-23 00:55:49,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=141362.66666666666, ans=0.125 2024-09-23 00:55:49,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-23 00:56:00,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=141362.66666666666, ans=0.125 2024-09-23 00:56:22,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=141456.0, ans=0.025 2024-09-23 00:56:23,323 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-09-23 00:56:39,932 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.352e+02 1.499e+02 1.786e+02 3.736e+02, threshold=2.997e+02, percent-clipped=4.0 2024-09-23 00:56:39,957 INFO [train.py:1198] (0/4) Epoch 8, batch 3050, loss[loss=0.3005, ctc_loss=0.2145, cr_loss=0.4301, over 16063.00 frames. ], tot_loss[loss=0.2647, ctc_loss=0.1865, cr_loss=0.3908, over 3356383.53 frames. ], batch size: 74, lr: 1.47e-02, grad_scale: 32.0 2024-09-23 00:57:16,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=141596.0, ans=0.125 2024-09-23 00:57:19,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141596.0, ans=0.1 2024-09-23 00:57:23,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2024-09-23 00:57:29,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=141642.66666666666, ans=0.125 2024-09-23 00:57:37,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=141642.66666666666, ans=0.0 2024-09-23 00:57:45,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=141689.33333333334, ans=0.5 2024-09-23 00:57:58,666 INFO [train.py:1198] (0/4) Epoch 8, batch 3100, loss[loss=0.2761, ctc_loss=0.1959, cr_loss=0.401, over 17187.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1866, cr_loss=0.3908, over 3360914.38 frames. 
], batch size: 55, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 00:58:19,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.99 vs. limit=10.0 2024-09-23 00:58:26,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=141782.66666666666, ans=0.0 2024-09-23 00:58:47,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141876.0, ans=0.1 2024-09-23 00:59:07,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=141922.66666666666, ans=0.125 2024-09-23 00:59:12,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=141922.66666666666, ans=0.0 2024-09-23 00:59:15,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=141969.33333333334, ans=0.2 2024-09-23 00:59:16,305 INFO [train.py:1198] (0/4) Epoch 8, batch 3150, loss[loss=0.213, ctc_loss=0.146, cr_loss=0.3352, over 17271.00 frames. ], tot_loss[loss=0.2629, ctc_loss=0.1851, cr_loss=0.3888, over 3367241.52 frames. ], batch size: 42, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 00:59:17,877 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.325e+02 1.474e+02 1.672e+02 3.223e+02, threshold=2.948e+02, percent-clipped=1.0 2024-09-23 00:59:19,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=141969.33333333334, ans=0.125 2024-09-23 00:59:44,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=142016.0, ans=0.0 2024-09-23 01:00:17,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=142156.0, ans=0.125 2024-09-23 01:00:17,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=142156.0, ans=0.025 2024-09-23 01:00:34,387 INFO [train.py:1198] (0/4) Epoch 8, batch 3200, loss[loss=0.219, ctc_loss=0.1518, cr_loss=0.3362, over 17172.00 frames. ], tot_loss[loss=0.2615, ctc_loss=0.1841, cr_loss=0.3869, over 3370134.05 frames. ], batch size: 41, lr: 1.47e-02, grad_scale: 32.0 2024-09-23 01:01:02,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142249.33333333334, ans=0.1 2024-09-23 01:01:21,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=142342.66666666666, ans=0.125 2024-09-23 01:01:40,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=142389.33333333334, ans=0.0 2024-09-23 01:01:46,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=142389.33333333334, ans=0.125 2024-09-23 01:01:54,766 INFO [train.py:1198] (0/4) Epoch 8, batch 3250, loss[loss=0.2551, ctc_loss=0.1819, cr_loss=0.3661, over 17022.00 frames. ], tot_loss[loss=0.2605, ctc_loss=0.1834, cr_loss=0.3854, over 3369474.96 frames. 
], batch size: 44, lr: 1.47e-02, grad_scale: 32.0 2024-09-23 01:01:56,370 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.501e+02 1.666e+02 1.949e+02 2.835e+02, threshold=3.332e+02, percent-clipped=0.0 2024-09-23 01:01:56,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=142436.0, ans=0.125 2024-09-23 01:02:06,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.29 vs. limit=6.0 2024-09-23 01:02:09,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=142482.66666666666, ans=0.09899494936611666 2024-09-23 01:02:21,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=142482.66666666666, ans=0.0 2024-09-23 01:02:23,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=8.0 2024-09-23 01:02:24,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=12.0 2024-09-23 01:03:12,272 INFO [train.py:1198] (0/4) Epoch 8, batch 3300, loss[loss=0.3188, ctc_loss=0.2302, cr_loss=0.4432, over 16428.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1842, cr_loss=0.3871, over 3371080.31 frames. ], batch size: 66, lr: 1.47e-02, grad_scale: 32.0 2024-09-23 01:03:28,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=15.0 2024-09-23 01:04:30,467 INFO [train.py:1198] (0/4) Epoch 8, batch 3350, loss[loss=0.2401, ctc_loss=0.1689, cr_loss=0.3558, over 16940.00 frames. ], tot_loss[loss=0.2619, ctc_loss=0.1844, cr_loss=0.3878, over 3371678.43 frames. ], batch size: 42, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 01:04:31,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=15.0 2024-09-23 01:04:33,563 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.197e+02 1.393e+02 1.569e+02 1.774e+02 3.394e+02, threshold=3.137e+02, percent-clipped=1.0 2024-09-23 01:04:40,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=142902.66666666666, ans=0.0 2024-09-23 01:04:42,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=142902.66666666666, ans=0.125 2024-09-23 01:04:45,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=142949.33333333334, ans=0.0 2024-09-23 01:04:55,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. 
limit=6.0 2024-09-23 01:05:02,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=142996.0, ans=0.025 2024-09-23 01:05:30,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=143042.66666666666, ans=0.125 2024-09-23 01:05:46,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=143089.33333333334, ans=0.125 2024-09-23 01:05:49,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=143089.33333333334, ans=0.025 2024-09-23 01:05:51,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=143136.0, ans=0.0 2024-09-23 01:05:52,734 INFO [train.py:1198] (0/4) Epoch 8, batch 3400, loss[loss=0.2735, ctc_loss=0.1925, cr_loss=0.4048, over 17149.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.184, cr_loss=0.3879, over 3380429.38 frames. ], batch size: 48, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 01:06:00,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=143136.0, ans=0.1 2024-09-23 01:06:03,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=143136.0, ans=0.125 2024-09-23 01:06:12,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=143182.66666666666, ans=0.125 2024-09-23 01:06:16,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=143182.66666666666, ans=0.125 2024-09-23 01:06:17,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=143182.66666666666, ans=0.1 2024-09-23 01:06:28,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=143229.33333333334, ans=0.2 2024-09-23 01:07:01,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143322.66666666666, ans=0.1 2024-09-23 01:07:12,539 INFO [train.py:1198] (0/4) Epoch 8, batch 3450, loss[loss=0.2572, ctc_loss=0.1826, cr_loss=0.3731, over 17229.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1849, cr_loss=0.3886, over 3384779.45 frames. ], batch size: 47, lr: 1.47e-02, grad_scale: 16.0 2024-09-23 01:07:15,659 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.386e+02 1.521e+02 1.778e+02 2.541e+02, threshold=3.041e+02, percent-clipped=0.0 2024-09-23 01:07:33,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=143416.0, ans=0.125 2024-09-23 01:07:36,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=143416.0, ans=0.125 2024-09-23 01:08:27,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=143556.0, ans=0.125 2024-09-23 01:08:30,382 INFO [train.py:1198] (0/4) Epoch 8, batch 3500, loss[loss=0.2733, ctc_loss=0.1917, cr_loss=0.408, over 17013.00 frames. 
], tot_loss[loss=0.2638, ctc_loss=0.1857, cr_loss=0.3902, over 3382024.71 frames. ], batch size: 51, lr: 1.46e-02, grad_scale: 16.0 2024-09-23 01:08:33,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=143602.66666666666, ans=0.125 2024-09-23 01:09:02,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=143696.0, ans=0.2 2024-09-23 01:09:27,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=143742.66666666666, ans=15.0 2024-09-23 01:09:41,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=143789.33333333334, ans=0.125 2024-09-23 01:09:47,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=143836.0, ans=0.95 2024-09-23 01:09:48,668 INFO [train.py:1198] (0/4) Epoch 8, batch 3550, loss[loss=0.2858, ctc_loss=0.2021, cr_loss=0.4185, over 16915.00 frames. ], tot_loss[loss=0.2639, ctc_loss=0.1857, cr_loss=0.3906, over 3380607.55 frames. ], batch size: 58, lr: 1.46e-02, grad_scale: 16.0 2024-09-23 01:09:51,761 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.323e+02 1.423e+02 1.598e+02 2.580e+02, threshold=2.846e+02, percent-clipped=0.0 2024-09-23 01:10:46,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=143976.0, ans=22.5 2024-09-23 01:11:03,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0 2024-09-23 01:11:06,358 INFO [train.py:1198] (0/4) Epoch 8, batch 3600, loss[loss=0.2479, ctc_loss=0.1704, cr_loss=0.3873, over 17314.00 frames. ], tot_loss[loss=0.2631, ctc_loss=0.1852, cr_loss=0.3894, over 3372270.43 frames. ], batch size: 49, lr: 1.46e-02, grad_scale: 32.0 2024-09-23 01:11:38,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=144162.66666666666, ans=0.035 2024-09-23 01:11:51,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=144162.66666666666, ans=0.025 2024-09-23 01:12:14,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=144256.0, ans=0.05 2024-09-23 01:12:27,263 INFO [train.py:1198] (0/4) Epoch 8, batch 3650, loss[loss=0.2289, ctc_loss=0.1613, cr_loss=0.3381, over 17032.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1847, cr_loss=0.3883, over 3374495.28 frames. 
], batch size: 39, lr: 1.46e-02, grad_scale: 32.0 2024-09-23 01:12:30,477 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.372e+02 1.519e+02 1.739e+02 2.361e+02, threshold=3.037e+02, percent-clipped=0.0 2024-09-23 01:12:33,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=144302.66666666666, ans=0.125 2024-09-23 01:12:53,965 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 01:13:10,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=144396.0, ans=0.0 2024-09-23 01:13:34,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=144489.33333333334, ans=0.125 2024-09-23 01:13:45,449 INFO [train.py:1198] (0/4) Epoch 8, batch 3700, loss[loss=0.2894, ctc_loss=0.205, cr_loss=0.4219, over 16985.00 frames. ], tot_loss[loss=0.2619, ctc_loss=0.1843, cr_loss=0.3877, over 3363414.03 frames. ], batch size: 53, lr: 1.46e-02, grad_scale: 32.0 2024-09-23 01:14:05,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-09-23 01:14:10,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=144582.66666666666, ans=0.125 2024-09-23 01:14:10,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=144582.66666666666, ans=0.125 2024-09-23 01:15:03,784 INFO [train.py:1198] (0/4) Epoch 8, batch 3750, loss[loss=0.238, ctc_loss=0.1637, cr_loss=0.3715, over 17294.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1849, cr_loss=0.3878, over 3351298.16 frames. ], batch size: 46, lr: 1.46e-02, grad_scale: 32.0 2024-09-23 01:15:07,863 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.371e+02 1.582e+02 1.824e+02 4.757e+02, threshold=3.165e+02, percent-clipped=1.0 2024-09-23 01:15:08,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=144769.33333333334, ans=0.125 2024-09-23 01:15:22,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=144816.0, ans=0.125 2024-09-23 01:15:25,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2024-09-23 01:15:31,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=144816.0, ans=0.125 2024-09-23 01:15:45,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-09-23 01:15:52,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5 2024-09-23 01:16:06,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=144956.0, ans=0.0 2024-09-23 01:16:23,491 INFO [train.py:1198] (0/4) Epoch 8, batch 3800, loss[loss=0.2429, ctc_loss=0.171, cr_loss=0.3595, over 16972.00 frames. 
], tot_loss[loss=0.2649, ctc_loss=0.1869, cr_loss=0.39, over 3323120.19 frames. ], batch size: 42, lr: 1.46e-02, grad_scale: 32.0 2024-09-23 01:16:34,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145002.66666666666, ans=0.1 2024-09-23 01:16:36,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=145002.66666666666, ans=0.125 2024-09-23 01:17:13,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=145142.66666666666, ans=0.125 2024-09-23 01:17:19,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=145142.66666666666, ans=0.2 2024-09-23 01:17:27,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2024-09-23 01:17:41,086 INFO [train.py:1198] (0/4) Epoch 8, batch 3850, loss[loss=0.2437, ctc_loss=0.1735, cr_loss=0.3515, over 16969.00 frames. ], tot_loss[loss=0.2685, ctc_loss=0.1901, cr_loss=0.3925, over 3272380.73 frames. ], batch size: 42, lr: 1.46e-02, grad_scale: 16.0 2024-09-23 01:17:44,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=145236.0, ans=10.0 2024-09-23 01:17:45,606 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.439e+02 1.582e+02 1.881e+02 4.076e+02, threshold=3.165e+02, percent-clipped=1.0 2024-09-23 01:17:45,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=145236.0, ans=0.125 2024-09-23 01:17:55,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=145282.66666666666, ans=0.125 2024-09-23 01:18:51,273 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-8.pt 2024-09-23 01:19:42,987 INFO [train.py:1198] (0/4) Epoch 9, batch 0, loss[loss=0.2903, ctc_loss=0.2074, cr_loss=0.4147, over 17230.00 frames. ], tot_loss[loss=0.2903, ctc_loss=0.2074, cr_loss=0.4147, over 17230.00 frames. ], batch size: 55, lr: 1.38e-02, grad_scale: 32.0 2024-09-23 01:19:42,988 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 01:19:58,863 INFO [train.py:1230] (0/4) Epoch 9, validation: loss=0.05451, ctc_loss=0.05451, cr_loss=7.076e-15, over 944034.00 frames. 
2024-09-23 01:19:58,863 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 01:20:23,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145497.33333333334, ans=0.1 2024-09-23 01:20:40,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=145544.0, ans=0.0 2024-09-23 01:20:42,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=145544.0, ans=0.07 2024-09-23 01:20:56,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=145590.66666666666, ans=0.0 2024-09-23 01:21:08,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=145637.33333333334, ans=0.2 2024-09-23 01:21:23,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=145637.33333333334, ans=0.05 2024-09-23 01:21:26,167 INFO [train.py:1198] (0/4) Epoch 9, batch 50, loss[loss=0.2714, ctc_loss=0.188, cr_loss=0.4165, over 17056.00 frames. ], tot_loss[loss=0.268, ctc_loss=0.189, cr_loss=0.3951, over 763708.33 frames. ], batch size: 52, lr: 1.38e-02, grad_scale: 32.0 2024-09-23 01:21:26,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=145684.0, ans=0.1 2024-09-23 01:21:37,361 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.229e+02 1.512e+02 1.709e+02 2.026e+02 3.260e+02, threshold=3.417e+02, percent-clipped=2.0 2024-09-23 01:21:50,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145730.66666666666, ans=0.1 2024-09-23 01:22:06,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=145777.33333333334, ans=0.125 2024-09-23 01:22:11,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=145777.33333333334, ans=0.0 2024-09-23 01:22:28,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=145870.66666666666, ans=0.025 2024-09-23 01:22:47,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=145917.33333333334, ans=0.0 2024-09-23 01:22:48,405 INFO [train.py:1198] (0/4) Epoch 9, batch 100, loss[loss=0.2196, ctc_loss=0.1515, cr_loss=0.3407, over 17179.00 frames. ], tot_loss[loss=0.2648, ctc_loss=0.1861, cr_loss=0.3932, over 1338475.89 frames. ], batch size: 41, lr: 1.38e-02, grad_scale: 32.0 2024-09-23 01:23:02,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0 2024-09-23 01:23:12,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.05 vs. 
limit=15.0 2024-09-23 01:23:15,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=145964.0, ans=0.125 2024-09-23 01:23:19,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=146010.66666666666, ans=0.95 2024-09-23 01:23:57,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146104.0, ans=0.1 2024-09-23 01:24:08,183 INFO [train.py:1198] (0/4) Epoch 9, batch 150, loss[loss=0.2413, ctc_loss=0.1681, cr_loss=0.3659, over 17269.00 frames. ], tot_loss[loss=0.2633, ctc_loss=0.1847, cr_loss=0.393, over 1787861.48 frames. ], batch size: 44, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:24:19,530 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.178e+02 1.303e+02 1.427e+02 1.679e+02 2.380e+02, threshold=2.853e+02, percent-clipped=0.0 2024-09-23 01:24:29,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=146197.33333333334, ans=0.2 2024-09-23 01:24:53,192 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 01:25:03,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=146290.66666666666, ans=0.2 2024-09-23 01:25:08,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=146290.66666666666, ans=0.025 2024-09-23 01:25:22,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=146337.33333333334, ans=0.0 2024-09-23 01:25:33,202 INFO [train.py:1198] (0/4) Epoch 9, batch 200, loss[loss=0.275, ctc_loss=0.1917, cr_loss=0.4163, over 16969.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1836, cr_loss=0.391, over 2138240.52 frames. ], batch size: 53, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:25:50,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=146430.66666666666, ans=0.2 2024-09-23 01:26:32,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0 2024-09-23 01:26:38,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=146570.66666666666, ans=0.125 2024-09-23 01:26:53,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=146570.66666666666, ans=0.125 2024-09-23 01:26:53,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=1.97 vs. limit=15.0 2024-09-23 01:26:55,832 INFO [train.py:1198] (0/4) Epoch 9, batch 250, loss[loss=0.2418, ctc_loss=0.1675, cr_loss=0.3714, over 17059.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1822, cr_loss=0.3882, over 2412314.16 frames. ], batch size: 46, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:26:58,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.70 vs. 
limit=22.5 2024-09-23 01:27:05,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=146617.33333333334, ans=0.125 2024-09-23 01:27:06,849 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.339e+02 1.502e+02 1.713e+02 2.875e+02, threshold=3.003e+02, percent-clipped=1.0 2024-09-23 01:27:22,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=146664.0, ans=0.125 2024-09-23 01:27:25,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=146710.66666666666, ans=0.125 2024-09-23 01:27:42,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=146710.66666666666, ans=0.125 2024-09-23 01:27:50,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=146757.33333333334, ans=0.125 2024-09-23 01:27:52,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=146757.33333333334, ans=0.025 2024-09-23 01:27:52,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=146757.33333333334, ans=0.1 2024-09-23 01:28:06,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=146804.0, ans=0.125 2024-09-23 01:28:17,489 INFO [train.py:1198] (0/4) Epoch 9, batch 300, loss[loss=0.2907, ctc_loss=0.2027, cr_loss=0.4402, over 17069.00 frames. ], tot_loss[loss=0.2611, ctc_loss=0.1831, cr_loss=0.3899, over 2627230.36 frames. ], batch size: 46, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:28:19,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=146850.66666666666, ans=0.2 2024-09-23 01:28:20,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=22.5 2024-09-23 01:28:35,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=146897.33333333334, ans=0.1 2024-09-23 01:28:39,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=15.0 2024-09-23 01:29:12,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=146990.66666666666, ans=0.0 2024-09-23 01:29:37,417 INFO [train.py:1198] (0/4) Epoch 9, batch 350, loss[loss=0.278, ctc_loss=0.197, cr_loss=0.405, over 17234.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1822, cr_loss=0.3885, over 2796548.86 frames. 
], batch size: 50, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:29:40,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=147084.0, ans=0.2 2024-09-23 01:29:48,845 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.338e+02 1.473e+02 1.743e+02 2.948e+02, threshold=2.946e+02, percent-clipped=0.0 2024-09-23 01:30:24,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=147177.33333333334, ans=0.2 2024-09-23 01:30:37,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=147224.0, ans=0.125 2024-09-23 01:30:39,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=147224.0, ans=0.025 2024-09-23 01:30:43,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=147224.0, ans=0.025 2024-09-23 01:31:02,655 INFO [train.py:1198] (0/4) Epoch 9, batch 400, loss[loss=0.244, ctc_loss=0.1715, cr_loss=0.3628, over 17257.00 frames. ], tot_loss[loss=0.2616, ctc_loss=0.1837, cr_loss=0.3896, over 2920901.41 frames. ], batch size: 44, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:31:02,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=147317.33333333334, ans=0.125 2024-09-23 01:31:09,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=147317.33333333334, ans=0.025 2024-09-23 01:31:23,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=147364.0, ans=0.125 2024-09-23 01:31:34,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=147364.0, ans=0.125 2024-09-23 01:31:37,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=147410.66666666666, ans=0.125 2024-09-23 01:31:39,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=147410.66666666666, ans=0.0 2024-09-23 01:31:42,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-23 01:31:45,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=147410.66666666666, ans=0.2 2024-09-23 01:32:01,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=22.5 2024-09-23 01:32:24,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.11 vs. limit=6.0 2024-09-23 01:32:25,079 INFO [train.py:1198] (0/4) Epoch 9, batch 450, loss[loss=0.248, ctc_loss=0.1735, cr_loss=0.3722, over 17350.00 frames. ], tot_loss[loss=0.2622, ctc_loss=0.1842, cr_loss=0.3899, over 3011563.78 frames. 
], batch size: 48, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:32:37,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147550.66666666666, ans=0.1 2024-09-23 01:32:38,952 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.318e+02 1.484e+02 1.790e+02 2.979e+02, threshold=2.969e+02, percent-clipped=1.0 2024-09-23 01:32:48,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=147597.33333333334, ans=0.0 2024-09-23 01:32:52,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2024-09-23 01:33:08,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.63 vs. limit=15.0 2024-09-23 01:33:33,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=147737.33333333334, ans=0.0 2024-09-23 01:33:38,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=147737.33333333334, ans=15.0 2024-09-23 01:33:41,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=147737.33333333334, ans=0.125 2024-09-23 01:33:47,560 INFO [train.py:1198] (0/4) Epoch 9, batch 500, loss[loss=0.2732, ctc_loss=0.1919, cr_loss=0.4066, over 17037.00 frames. ], tot_loss[loss=0.2621, ctc_loss=0.1842, cr_loss=0.39, over 3085865.72 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:33:51,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=147784.0, ans=0.125 2024-09-23 01:34:03,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=147830.66666666666, ans=0.0 2024-09-23 01:34:18,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.07 vs. limit=10.0 2024-09-23 01:34:45,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=147924.0, ans=0.0 2024-09-23 01:34:45,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5 2024-09-23 01:34:46,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=147924.0, ans=0.125 2024-09-23 01:34:48,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=147924.0, ans=0.125 2024-09-23 01:35:09,828 INFO [train.py:1198] (0/4) Epoch 9, batch 550, loss[loss=0.2294, ctc_loss=0.1554, cr_loss=0.37, over 17121.00 frames. ], tot_loss[loss=0.2617, ctc_loss=0.1838, cr_loss=0.3897, over 3146573.71 frames. 
], batch size: 40, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:35:23,367 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.335e+02 1.443e+02 1.626e+02 2.688e+02, threshold=2.885e+02, percent-clipped=0.0 2024-09-23 01:36:34,063 INFO [train.py:1198] (0/4) Epoch 9, batch 600, loss[loss=0.2213, ctc_loss=0.1486, cr_loss=0.3635, over 16962.00 frames. ], tot_loss[loss=0.2625, ctc_loss=0.1844, cr_loss=0.3907, over 3181273.60 frames. ], batch size: 42, lr: 1.37e-02, grad_scale: 32.0 2024-09-23 01:36:50,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=148297.33333333334, ans=0.0 2024-09-23 01:37:23,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=148390.66666666666, ans=0.0 2024-09-23 01:37:27,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=148390.66666666666, ans=0.125 2024-09-23 01:37:52,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=148437.33333333334, ans=0.0 2024-09-23 01:37:56,745 INFO [train.py:1198] (0/4) Epoch 9, batch 650, loss[loss=0.2879, ctc_loss=0.2041, cr_loss=0.4193, over 17004.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1843, cr_loss=0.3904, over 3212169.38 frames. ], batch size: 53, lr: 1.36e-02, grad_scale: 16.0 2024-09-23 01:38:09,415 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.361e+02 1.468e+02 1.616e+02 2.267e+02, threshold=2.935e+02, percent-clipped=0.0 2024-09-23 01:38:30,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2024-09-23 01:38:35,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=148577.33333333334, ans=0.0 2024-09-23 01:38:42,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=148624.0, ans=0.125 2024-09-23 01:38:49,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=148624.0, ans=0.125 2024-09-23 01:39:15,866 INFO [train.py:1198] (0/4) Epoch 9, batch 700, loss[loss=0.2933, ctc_loss=0.2078, cr_loss=0.4278, over 17011.00 frames. ], tot_loss[loss=0.2623, ctc_loss=0.1843, cr_loss=0.39, over 3245146.53 frames. ], batch size: 52, lr: 1.36e-02, grad_scale: 16.0 2024-09-23 01:39:37,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2024-09-23 01:39:41,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=148764.0, ans=0.125 2024-09-23 01:40:01,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=148810.66666666666, ans=0.0 2024-09-23 01:40:14,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=148857.33333333334, ans=0.125 2024-09-23 01:40:40,042 INFO [train.py:1198] (0/4) Epoch 9, batch 750, loss[loss=0.2706, ctc_loss=0.1905, cr_loss=0.4007, over 17065.00 frames. 
], tot_loss[loss=0.2626, ctc_loss=0.1847, cr_loss=0.3897, over 3255813.08 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 16.0 2024-09-23 01:40:52,737 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.365e+02 1.514e+02 1.779e+02 2.634e+02, threshold=3.027e+02, percent-clipped=0.0 2024-09-23 01:41:24,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149044.0, ans=0.1 2024-09-23 01:41:55,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=12.0 2024-09-23 01:42:02,476 INFO [train.py:1198] (0/4) Epoch 9, batch 800, loss[loss=0.259, ctc_loss=0.1781, cr_loss=0.4044, over 17146.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.184, cr_loss=0.3894, over 3277272.83 frames. ], batch size: 48, lr: 1.36e-02, grad_scale: 32.0 2024-09-23 01:42:20,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149230.66666666666, ans=0.1 2024-09-23 01:42:42,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=149277.33333333334, ans=0.125 2024-09-23 01:42:55,124 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-32000.pt 2024-09-23 01:43:26,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2024-09-23 01:43:27,284 INFO [train.py:1198] (0/4) Epoch 9, batch 850, loss[loss=0.2663, ctc_loss=0.1851, cr_loss=0.4062, over 16023.00 frames. ], tot_loss[loss=0.2628, ctc_loss=0.1847, cr_loss=0.3906, over 3289002.48 frames. ], batch size: 74, lr: 1.36e-02, grad_scale: 32.0 2024-09-23 01:43:27,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.08 vs. limit=10.0 2024-09-23 01:43:32,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=149417.33333333334, ans=0.125 2024-09-23 01:43:39,941 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.394e+02 1.570e+02 1.849e+02 3.011e+02, threshold=3.140e+02, percent-clipped=0.0 2024-09-23 01:43:53,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2024-09-23 01:44:13,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2024-09-23 01:44:18,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=149557.33333333334, ans=0.025 2024-09-23 01:44:40,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=22.5 2024-09-23 01:44:49,132 INFO [train.py:1198] (0/4) Epoch 9, batch 900, loss[loss=0.2069, ctc_loss=0.1413, cr_loss=0.3282, over 16984.00 frames. ], tot_loss[loss=0.2624, ctc_loss=0.1844, cr_loss=0.39, over 3302424.34 frames. 
], batch size: 42, lr: 1.36e-02, grad_scale: 32.0 2024-09-23 01:45:05,129 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 01:45:06,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=149697.33333333334, ans=0.125 2024-09-23 01:45:13,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=149697.33333333334, ans=0.07 2024-09-23 01:45:41,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149790.66666666666, ans=0.1 2024-09-23 01:45:41,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149790.66666666666, ans=0.0 2024-09-23 01:45:42,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=22.5 2024-09-23 01:46:05,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=149837.33333333334, ans=0.2 2024-09-23 01:46:14,334 INFO [train.py:1198] (0/4) Epoch 9, batch 950, loss[loss=0.2659, ctc_loss=0.1879, cr_loss=0.3899, over 17101.00 frames. ], tot_loss[loss=0.263, ctc_loss=0.1849, cr_loss=0.3906, over 3312392.70 frames. ], batch size: 49, lr: 1.36e-02, grad_scale: 32.0 2024-09-23 01:46:24,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=149884.0, ans=12.0 2024-09-23 01:46:27,021 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.338e+02 1.417e+02 1.570e+02 2.385e+02, threshold=2.834e+02, percent-clipped=0.0 2024-09-23 01:46:49,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=149977.33333333334, ans=0.125 2024-09-23 01:47:15,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=150024.0, ans=0.125 2024-09-23 01:47:37,173 INFO [train.py:1198] (0/4) Epoch 9, batch 1000, loss[loss=0.2519, ctc_loss=0.1764, cr_loss=0.3775, over 17319.00 frames. ], tot_loss[loss=0.2635, ctc_loss=0.1852, cr_loss=0.3913, over 3321736.07 frames. ], batch size: 51, lr: 1.36e-02, grad_scale: 32.0 2024-09-23 01:48:09,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=150210.66666666666, ans=0.0 2024-09-23 01:48:17,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=150210.66666666666, ans=0.125 2024-09-23 01:48:17,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=150210.66666666666, ans=0.0 2024-09-23 01:48:44,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150304.0, ans=0.1 2024-09-23 01:48:57,105 INFO [train.py:1198] (0/4) Epoch 9, batch 1050, loss[loss=0.2144, ctc_loss=0.1476, cr_loss=0.3343, over 17021.00 frames. ], tot_loss[loss=0.2626, ctc_loss=0.1844, cr_loss=0.3911, over 3329051.94 frames. 
], batch size: 39, lr: 1.36e-02, grad_scale: 32.0 2024-09-23 01:49:02,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=150350.66666666666, ans=0.2 2024-09-23 01:49:07,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0 2024-09-23 01:49:09,787 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.329e+02 1.425e+02 1.705e+02 2.859e+02, threshold=2.851e+02, percent-clipped=1.0 2024-09-23 01:49:53,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-09-23 01:49:56,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=150490.66666666666, ans=0.125 2024-09-23 01:50:22,530 INFO [train.py:1198] (0/4) Epoch 9, batch 1100, loss[loss=0.2929, ctc_loss=0.2087, cr_loss=0.421, over 15103.00 frames. ], tot_loss[loss=0.261, ctc_loss=0.1832, cr_loss=0.3891, over 3327618.91 frames. ], batch size: 89, lr: 1.36e-02, grad_scale: 32.0 2024-09-23 01:50:26,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=150584.0, ans=0.2 2024-09-23 01:50:32,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150584.0, ans=0.1 2024-09-23 01:50:48,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=150630.66666666666, ans=0.125 2024-09-23 01:51:11,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=150724.0, ans=0.125 2024-09-23 01:51:15,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=150724.0, ans=0.0 2024-09-23 01:51:45,050 INFO [train.py:1198] (0/4) Epoch 9, batch 1150, loss[loss=0.2627, ctc_loss=0.183, cr_loss=0.3981, over 17269.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1833, cr_loss=0.3896, over 3328169.02 frames. ], batch size: 44, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 01:51:53,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.67 vs. limit=10.0 2024-09-23 01:51:54,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=150817.33333333334, ans=0.125 2024-09-23 01:51:57,742 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.320e+02 1.454e+02 1.675e+02 2.569e+02, threshold=2.907e+02, percent-clipped=0.0 2024-09-23 01:51:59,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150864.0, ans=0.1 2024-09-23 01:51:59,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=150864.0, ans=0.025 2024-09-23 01:52:08,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.70 vs. 
limit=22.5 2024-09-23 01:52:09,268 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 01:52:17,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=150910.66666666666, ans=0.125 2024-09-23 01:52:23,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=150910.66666666666, ans=0.0 2024-09-23 01:52:23,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.86 vs. limit=15.0 2024-09-23 01:52:41,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=150957.33333333334, ans=0.02 2024-09-23 01:52:43,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150957.33333333334, ans=0.1 2024-09-23 01:52:49,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=151004.0, ans=0.125 2024-09-23 01:52:52,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2024-09-23 01:53:04,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=151004.0, ans=0.025 2024-09-23 01:53:07,105 INFO [train.py:1198] (0/4) Epoch 9, batch 1200, loss[loss=0.2172, ctc_loss=0.147, cr_loss=0.3509, over 17300.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1819, cr_loss=0.3879, over 3334001.61 frames. ], batch size: 42, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 01:53:29,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=151097.33333333334, ans=0.1 2024-09-23 01:54:10,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=151237.33333333334, ans=0.0 2024-09-23 01:54:27,530 INFO [train.py:1198] (0/4) Epoch 9, batch 1250, loss[loss=0.2851, ctc_loss=0.2, cr_loss=0.4255, over 16988.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1802, cr_loss=0.3857, over 3345656.67 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 01:54:42,634 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.349e+02 1.478e+02 1.607e+02 2.970e+02, threshold=2.955e+02, percent-clipped=1.0 2024-09-23 01:54:49,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=151330.66666666666, ans=0.5 2024-09-23 01:55:01,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=151330.66666666666, ans=0.0 2024-09-23 01:55:12,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=151377.33333333334, ans=0.125 2024-09-23 01:55:12,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-09-23 01:55:25,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.13 vs. 
limit=12.0 2024-09-23 01:55:40,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=151470.66666666666, ans=0.0 2024-09-23 01:55:51,560 INFO [train.py:1198] (0/4) Epoch 9, batch 1300, loss[loss=0.2451, ctc_loss=0.1694, cr_loss=0.3785, over 17210.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1812, cr_loss=0.3872, over 3347242.51 frames. ], batch size: 47, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 01:55:58,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2024-09-23 01:56:04,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151517.33333333334, ans=0.1 2024-09-23 01:56:10,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=151564.0, ans=0.025 2024-09-23 01:56:21,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=151564.0, ans=0.0 2024-09-23 01:56:44,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=151657.33333333334, ans=0.07 2024-09-23 01:56:45,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=151657.33333333334, ans=0.025 2024-09-23 01:56:50,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=151657.33333333334, ans=0.0 2024-09-23 01:56:52,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=151657.33333333334, ans=0.125 2024-09-23 01:57:12,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=151750.66666666666, ans=0.125 2024-09-23 01:57:13,856 INFO [train.py:1198] (0/4) Epoch 9, batch 1350, loss[loss=0.2268, ctc_loss=0.1572, cr_loss=0.3477, over 17261.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1817, cr_loss=0.3879, over 3352554.27 frames. ], batch size: 42, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 01:57:29,084 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.345e+02 1.485e+02 1.651e+02 2.569e+02, threshold=2.970e+02, percent-clipped=0.0 2024-09-23 01:58:09,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=12.0 2024-09-23 01:58:16,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=15.0 2024-09-23 01:58:35,902 INFO [train.py:1198] (0/4) Epoch 9, batch 1400, loss[loss=0.2576, ctc_loss=0.1757, cr_loss=0.4093, over 17130.00 frames. ], tot_loss[loss=0.26, ctc_loss=0.1823, cr_loss=0.3885, over 3351851.58 frames. 
], batch size: 48, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 01:59:11,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=152077.33333333334, ans=0.1 2024-09-23 01:59:19,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=152077.33333333334, ans=0.125 2024-09-23 01:59:24,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=152124.0, ans=0.125 2024-09-23 02:00:01,052 INFO [train.py:1198] (0/4) Epoch 9, batch 1450, loss[loss=0.311, ctc_loss=0.2234, cr_loss=0.4378, over 15126.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1818, cr_loss=0.3883, over 3363285.63 frames. ], batch size: 89, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 02:00:01,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=152217.33333333334, ans=0.125 2024-09-23 02:00:04,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=152217.33333333334, ans=0.0 2024-09-23 02:00:06,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152217.33333333334, ans=0.1 2024-09-23 02:00:12,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=152217.33333333334, ans=0.125 2024-09-23 02:00:13,888 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.414e+02 1.556e+02 1.752e+02 2.841e+02, threshold=3.113e+02, percent-clipped=0.0 2024-09-23 02:00:14,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=152217.33333333334, ans=0.0 2024-09-23 02:00:16,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.78 vs. limit=15.0 2024-09-23 02:00:22,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=152264.0, ans=0.2 2024-09-23 02:00:50,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=152357.33333333334, ans=0.125 2024-09-23 02:00:50,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=152357.33333333334, ans=0.125 2024-09-23 02:01:14,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=152404.0, ans=0.0 2024-09-23 02:01:23,509 INFO [train.py:1198] (0/4) Epoch 9, batch 1500, loss[loss=0.2434, ctc_loss=0.1686, cr_loss=0.374, over 17223.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1817, cr_loss=0.388, over 3359068.48 frames. ], batch size: 47, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 02:01:28,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=152450.66666666666, ans=0.125 2024-09-23 02:01:32,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. 
limit=15.0 2024-09-23 02:02:01,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=152544.0, ans=0.95 2024-09-23 02:02:19,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2024-09-23 02:02:26,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=152590.66666666666, ans=0.025 2024-09-23 02:02:26,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=152590.66666666666, ans=0.2 2024-09-23 02:02:30,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.54 vs. limit=12.0 2024-09-23 02:02:45,184 INFO [train.py:1198] (0/4) Epoch 9, batch 1550, loss[loss=0.2222, ctc_loss=0.1536, cr_loss=0.3427, over 17304.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1817, cr_loss=0.388, over 3355495.23 frames. ], batch size: 46, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 02:02:45,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=152684.0, ans=0.125 2024-09-23 02:02:47,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=12.0 2024-09-23 02:02:57,972 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.340e+02 1.461e+02 1.652e+02 2.342e+02, threshold=2.922e+02, percent-clipped=0.0 2024-09-23 02:03:01,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=152730.66666666666, ans=0.2 2024-09-23 02:03:04,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=152730.66666666666, ans=0.0 2024-09-23 02:03:10,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=152730.66666666666, ans=0.125 2024-09-23 02:03:22,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.01 vs. limit=15.0 2024-09-23 02:03:57,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=152870.66666666666, ans=0.125 2024-09-23 02:04:04,957 INFO [train.py:1198] (0/4) Epoch 9, batch 1600, loss[loss=0.2267, ctc_loss=0.1556, cr_loss=0.3554, over 17159.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1806, cr_loss=0.386, over 3363560.14 frames. ], batch size: 45, lr: 1.35e-02, grad_scale: 32.0 2024-09-23 02:04:41,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=153010.66666666666, ans=0.0 2024-09-23 02:04:48,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.45 vs. 
limit=10.0 2024-09-23 02:05:21,086 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:05:22,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=153104.0, ans=0.2 2024-09-23 02:05:30,268 INFO [train.py:1198] (0/4) Epoch 9, batch 1650, loss[loss=0.2546, ctc_loss=0.1785, cr_loss=0.3803, over 17305.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1808, cr_loss=0.3865, over 3371807.42 frames. ], batch size: 46, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:05:42,928 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.315e+02 1.418e+02 1.618e+02 2.447e+02, threshold=2.836e+02, percent-clipped=0.0 2024-09-23 02:06:09,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=153244.0, ans=0.125 2024-09-23 02:06:11,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=153244.0, ans=0.125 2024-09-23 02:06:31,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=12.0 2024-09-23 02:06:52,606 INFO [train.py:1198] (0/4) Epoch 9, batch 1700, loss[loss=0.2296, ctc_loss=0.1613, cr_loss=0.3414, over 17145.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1814, cr_loss=0.388, over 3374550.96 frames. ], batch size: 45, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:06:53,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2024-09-23 02:07:08,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=153430.66666666666, ans=0.0 2024-09-23 02:07:17,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=153430.66666666666, ans=0.125 2024-09-23 02:07:27,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=153477.33333333334, ans=0.1 2024-09-23 02:07:44,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=153524.0, ans=0.015 2024-09-23 02:08:13,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=153617.33333333334, ans=0.0 2024-09-23 02:08:14,978 INFO [train.py:1198] (0/4) Epoch 9, batch 1750, loss[loss=0.2827, ctc_loss=0.1988, cr_loss=0.4196, over 15122.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1803, cr_loss=0.3871, over 3378573.95 frames. ], batch size: 89, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:08:27,631 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.306e+02 1.435e+02 1.601e+02 2.532e+02, threshold=2.871e+02, percent-clipped=0.0 2024-09-23 02:08:47,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.30 vs. 
limit=15.0 2024-09-23 02:08:55,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=153710.66666666666, ans=0.1 2024-09-23 02:09:07,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=22.5 2024-09-23 02:09:13,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2024-09-23 02:09:37,155 INFO [train.py:1198] (0/4) Epoch 9, batch 1800, loss[loss=0.2626, ctc_loss=0.1854, cr_loss=0.386, over 17165.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1812, cr_loss=0.3879, over 3368165.23 frames. ], batch size: 45, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:09:42,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=153850.66666666666, ans=0.02 2024-09-23 02:10:13,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=153944.0, ans=0.2 2024-09-23 02:10:15,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=153944.0, ans=0.125 2024-09-23 02:10:20,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=153944.0, ans=0.125 2024-09-23 02:10:31,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=153990.66666666666, ans=0.2 2024-09-23 02:10:44,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=154037.33333333334, ans=0.0 2024-09-23 02:11:02,434 INFO [train.py:1198] (0/4) Epoch 9, batch 1850, loss[loss=0.2654, ctc_loss=0.1859, cr_loss=0.3974, over 14871.00 frames. ], tot_loss[loss=0.2595, ctc_loss=0.1817, cr_loss=0.3891, over 3372720.41 frames. ], batch size: 89, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:11:15,289 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.311e+02 1.455e+02 1.599e+02 2.363e+02, threshold=2.909e+02, percent-clipped=0.0 2024-09-23 02:11:21,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-23 02:11:33,179 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:11:35,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5 2024-09-23 02:11:37,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=154177.33333333334, ans=0.0 2024-09-23 02:11:37,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=154177.33333333334, ans=0.125 2024-09-23 02:12:24,533 INFO [train.py:1198] (0/4) Epoch 9, batch 1900, loss[loss=0.2423, ctc_loss=0.1737, cr_loss=0.3433, over 17018.00 frames. ], tot_loss[loss=0.2599, ctc_loss=0.1819, cr_loss=0.3896, over 3373753.11 frames. 
], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:13:04,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=154410.66666666666, ans=0.125 2024-09-23 02:13:04,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=154410.66666666666, ans=0.0 2024-09-23 02:13:26,637 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:13:43,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2024-09-23 02:13:43,698 INFO [train.py:1198] (0/4) Epoch 9, batch 1950, loss[loss=0.2688, ctc_loss=0.1871, cr_loss=0.4089, over 17240.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1812, cr_loss=0.3881, over 3377703.76 frames. ], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:13:53,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=154550.66666666666, ans=0.0 2024-09-23 02:13:56,415 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.387e+02 1.540e+02 1.769e+02 3.177e+02, threshold=3.081e+02, percent-clipped=1.0 2024-09-23 02:14:21,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=154644.0, ans=0.2 2024-09-23 02:14:24,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=154644.0, ans=0.125 2024-09-23 02:14:47,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=154690.66666666666, ans=0.125 2024-09-23 02:14:48,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-09-23 02:14:49,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2024-09-23 02:14:58,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=154737.33333333334, ans=0.125 2024-09-23 02:15:00,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0 2024-09-23 02:15:09,133 INFO [train.py:1198] (0/4) Epoch 9, batch 2000, loss[loss=0.254, ctc_loss=0.1777, cr_loss=0.3815, over 16872.00 frames. ], tot_loss[loss=0.2589, ctc_loss=0.1812, cr_loss=0.3883, over 3373765.68 frames. 
], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:15:09,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=154784.0, ans=0.2 2024-09-23 02:15:09,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=154784.0, ans=0.125 2024-09-23 02:15:36,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=154830.66666666666, ans=0.0 2024-09-23 02:15:48,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=154877.33333333334, ans=0.125 2024-09-23 02:15:58,097 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:15:59,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=154924.0, ans=0.1 2024-09-23 02:16:01,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=154924.0, ans=0.1 2024-09-23 02:16:31,231 INFO [train.py:1198] (0/4) Epoch 9, batch 2050, loss[loss=0.2295, ctc_loss=0.1595, cr_loss=0.3504, over 17238.00 frames. ], tot_loss[loss=0.2593, ctc_loss=0.1816, cr_loss=0.3887, over 3368359.02 frames. ], batch size: 42, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:16:43,967 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.371e+02 1.510e+02 1.685e+02 3.292e+02, threshold=3.020e+02, percent-clipped=1.0 2024-09-23 02:16:50,806 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:17:14,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=155110.66666666666, ans=0.025 2024-09-23 02:17:18,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=155110.66666666666, ans=0.125 2024-09-23 02:17:33,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=155157.33333333334, ans=0.5 2024-09-23 02:17:47,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=155204.0, ans=0.025 2024-09-23 02:17:53,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2024-09-23 02:17:53,735 INFO [train.py:1198] (0/4) Epoch 9, batch 2100, loss[loss=0.2323, ctc_loss=0.1649, cr_loss=0.337, over 17098.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1814, cr_loss=0.3886, over 3358183.52 frames. ], batch size: 49, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:18:20,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=155297.33333333334, ans=0.0 2024-09-23 02:18:57,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.02 vs. 
limit=22.5 2024-09-23 02:19:02,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=155437.33333333334, ans=0.025 2024-09-23 02:19:06,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=155437.33333333334, ans=0.125 2024-09-23 02:19:07,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=155437.33333333334, ans=0.125 2024-09-23 02:19:08,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=155437.33333333334, ans=0.125 2024-09-23 02:19:12,973 INFO [train.py:1198] (0/4) Epoch 9, batch 2150, loss[loss=0.2616, ctc_loss=0.1767, cr_loss=0.4245, over 17315.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1818, cr_loss=0.389, over 3348737.92 frames. ], batch size: 51, lr: 1.34e-02, grad_scale: 32.0 2024-09-23 02:19:28,283 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.379e+02 1.514e+02 1.800e+02 2.768e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 02:19:28,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=155484.0, ans=0.125 2024-09-23 02:19:38,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=155530.66666666666, ans=0.125 2024-09-23 02:20:00,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-09-23 02:20:03,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=155577.33333333334, ans=0.0 2024-09-23 02:20:11,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=155624.0, ans=0.95 2024-09-23 02:20:38,245 INFO [train.py:1198] (0/4) Epoch 9, batch 2200, loss[loss=0.2592, ctc_loss=0.1823, cr_loss=0.3847, over 16463.00 frames. ], tot_loss[loss=0.2592, ctc_loss=0.1815, cr_loss=0.3886, over 3353752.05 frames. ], batch size: 66, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:20:49,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=155717.33333333334, ans=0.0 2024-09-23 02:21:16,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=155810.66666666666, ans=0.0 2024-09-23 02:21:31,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. limit=10.0 2024-09-23 02:21:39,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=155857.33333333334, ans=0.125 2024-09-23 02:21:41,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=155857.33333333334, ans=0.0 2024-09-23 02:21:44,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.30 vs. 
limit=22.5 2024-09-23 02:21:50,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.11 vs. limit=10.0 2024-09-23 02:22:03,763 INFO [train.py:1198] (0/4) Epoch 9, batch 2250, loss[loss=0.2343, ctc_loss=0.1586, cr_loss=0.3784, over 17108.00 frames. ], tot_loss[loss=0.2596, ctc_loss=0.1817, cr_loss=0.3898, over 3362032.41 frames. ], batch size: 40, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:22:13,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155950.66666666666, ans=0.1 2024-09-23 02:22:16,472 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.340e+02 1.524e+02 1.728e+02 3.121e+02, threshold=3.047e+02, percent-clipped=1.0 2024-09-23 02:22:16,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155950.66666666666, ans=0.1 2024-09-23 02:22:23,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=155997.33333333334, ans=0.0 2024-09-23 02:22:47,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=156044.0, ans=0.2 2024-09-23 02:23:06,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-09-23 02:23:10,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=156137.33333333334, ans=0.09899494936611666 2024-09-23 02:23:16,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.07 vs. limit=15.0 2024-09-23 02:23:18,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=156137.33333333334, ans=0.125 2024-09-23 02:23:23,344 INFO [train.py:1198] (0/4) Epoch 9, batch 2300, loss[loss=0.2445, ctc_loss=0.1739, cr_loss=0.3526, over 17332.00 frames. ], tot_loss[loss=0.2594, ctc_loss=0.1815, cr_loss=0.3894, over 3355113.97 frames. ], batch size: 51, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:23:57,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=156277.33333333334, ans=0.125 2024-09-23 02:24:05,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=156277.33333333334, ans=0.025 2024-09-23 02:24:28,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=156370.66666666666, ans=0.0 2024-09-23 02:24:45,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.74 vs. limit=10.0 2024-09-23 02:24:48,597 INFO [train.py:1198] (0/4) Epoch 9, batch 2350, loss[loss=0.2424, ctc_loss=0.167, cr_loss=0.3769, over 17256.00 frames. ], tot_loss[loss=0.2576, ctc_loss=0.1801, cr_loss=0.3873, over 3356991.98 frames. ], batch size: 44, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:24:52,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2024-09-23 02:25:01,204 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.341e+02 1.440e+02 1.582e+02 2.935e+02, threshold=2.879e+02, percent-clipped=0.0 2024-09-23 02:25:15,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=156464.0, ans=0.04949747468305833 2024-09-23 02:25:30,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=156510.66666666666, ans=0.1 2024-09-23 02:25:51,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=156557.33333333334, ans=0.2 2024-09-23 02:26:02,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=156604.0, ans=0.0 2024-09-23 02:26:10,108 INFO [train.py:1198] (0/4) Epoch 9, batch 2400, loss[loss=0.247, ctc_loss=0.1683, cr_loss=0.3936, over 17158.00 frames. ], tot_loss[loss=0.2582, ctc_loss=0.1805, cr_loss=0.3884, over 3360610.92 frames. ], batch size: 45, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:26:34,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=156697.33333333334, ans=0.2 2024-09-23 02:26:57,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=156744.0, ans=0.2 2024-09-23 02:26:59,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=156790.66666666666, ans=0.125 2024-09-23 02:27:08,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=156790.66666666666, ans=0.0 2024-09-23 02:27:12,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156790.66666666666, ans=0.1 2024-09-23 02:27:32,781 INFO [train.py:1198] (0/4) Epoch 9, batch 2450, loss[loss=0.2813, ctc_loss=0.2009, cr_loss=0.4021, over 16764.00 frames. ], tot_loss[loss=0.2583, ctc_loss=0.1807, cr_loss=0.388, over 3360536.63 frames. ], batch size: 61, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:27:37,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=156884.0, ans=0.025 2024-09-23 02:27:44,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=156884.0, ans=0.125 2024-09-23 02:27:45,480 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.502e+02 1.626e+02 1.858e+02 2.761e+02, threshold=3.252e+02, percent-clipped=0.0 2024-09-23 02:28:52,859 INFO [train.py:1198] (0/4) Epoch 9, batch 2500, loss[loss=0.201, ctc_loss=0.1373, cr_loss=0.3185, over 16312.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1796, cr_loss=0.3863, over 3368262.43 frames. 
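The recurring optim.py warnings report the minimum, quartiles, and maximum of recent gradient norms; with Clipping_scale=2.0 the logged threshold tracks twice the median (here 2 x 1.440e+02 ~ the logged 2.879e+02), and percent-clipped is the share of recent batches whose norm exceeded it. A sketch of median-based clipping under those assumptions; the class and method names are illustrative, not the optimizer's internals:

    from collections import deque
    import statistics

    class MedianGradClipper:
        """Clip to clipping_scale * median of recent gradient norms."""

        def __init__(self, clipping_scale: float = 2.0, history: int = 128):
            self.scale = clipping_scale
            self.norms = deque(maxlen=history)
            self.num_clipped = 0
            self.num_steps = 0

        def step(self, grad_norm: float) -> float:
            """Return the factor the gradient should be multiplied by."""
            self.norms.append(grad_norm)
            self.num_steps += 1
            threshold = self.scale * statistics.median(self.norms)
            if grad_norm > threshold:
                self.num_clipped += 1  # feeds the "percent-clipped" figure
                return threshold / grad_norm
            return 1.0
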
], batch size: 36, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:28:59,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=157117.33333333334, ans=0.125 2024-09-23 02:29:21,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=157164.0, ans=0.125 2024-09-23 02:29:31,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-09-23 02:29:48,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=157257.33333333334, ans=0.125 2024-09-23 02:30:06,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=157304.0, ans=0.0 2024-09-23 02:30:18,045 INFO [train.py:1198] (0/4) Epoch 9, batch 2550, loss[loss=0.2358, ctc_loss=0.1667, cr_loss=0.3454, over 17305.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1788, cr_loss=0.3847, over 3371605.60 frames. ], batch size: 49, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:30:30,730 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.377e+02 1.519e+02 1.754e+02 2.605e+02, threshold=3.038e+02, percent-clipped=0.0 2024-09-23 02:30:57,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=157444.0, ans=0.2 2024-09-23 02:30:57,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=157444.0, ans=0.2 2024-09-23 02:31:26,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.93 vs. limit=10.0 2024-09-23 02:31:28,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=157537.33333333334, ans=0.125 2024-09-23 02:31:33,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=157537.33333333334, ans=0.125 2024-09-23 02:31:38,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=157537.33333333334, ans=0.125 2024-09-23 02:31:40,940 INFO [train.py:1198] (0/4) Epoch 9, batch 2600, loss[loss=0.2407, ctc_loss=0.1693, cr_loss=0.3573, over 17086.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1784, cr_loss=0.3848, over 3377449.41 frames. 
], batch size: 43, lr: 1.33e-02, grad_scale: 32.0 2024-09-23 02:31:43,004 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:32:11,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=157630.66666666666, ans=0.125 2024-09-23 02:32:11,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=157630.66666666666, ans=0.2 2024-09-23 02:32:18,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=157677.33333333334, ans=22.5 2024-09-23 02:32:26,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=157677.33333333334, ans=15.0 2024-09-23 02:32:41,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=22.5 2024-09-23 02:32:59,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=157770.66666666666, ans=0.2 2024-09-23 02:33:04,041 INFO [train.py:1198] (0/4) Epoch 9, batch 2650, loss[loss=0.265, ctc_loss=0.1839, cr_loss=0.4054, over 17028.00 frames. ], tot_loss[loss=0.2557, ctc_loss=0.1789, cr_loss=0.3842, over 3365541.16 frames. ], batch size: 52, lr: 1.33e-02, grad_scale: 64.0 2024-09-23 02:33:16,814 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.311e+02 1.423e+02 1.645e+02 2.492e+02, threshold=2.847e+02, percent-clipped=0.0 2024-09-23 02:33:25,075 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:33:27,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-23 02:33:42,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0 2024-09-23 02:34:14,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158004.0, ans=0.1 2024-09-23 02:34:14,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=158004.0, ans=0.025 2024-09-23 02:34:17,425 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=2.962e-03 2024-09-23 02:34:26,716 INFO [train.py:1198] (0/4) Epoch 9, batch 2700, loss[loss=0.21, ctc_loss=0.147, cr_loss=0.3147, over 16760.00 frames. ], tot_loss[loss=0.2563, ctc_loss=0.1793, cr_loss=0.385, over 3352627.22 frames. 
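The grad_scale field moves in powers of two (32.0 -> 64.0 above, back down to 32.0 and then 16.0 in the following batches), which is the signature of dynamic fp16 loss scaling: the scale is halved when scaled gradients overflow and doubled again after a run of clean steps. A sketch using torch's stock scaler; init_scale and growth_interval below are illustrative, and the recipe's own scaler may manage the value differently:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0,        # illustrative starting point
        growth_factor=2.0,     # double after a run of overflow-free steps
        backoff_factor=0.5,    # halve on inf/nan in the scaled gradients
        growth_interval=2000,  # torch's default; the log doesn't reveal it
    )
    # Typical use:
    #   with torch.cuda.amp.autocast(dtype=torch.float16):
    #       loss = compute_loss(batch)      # hypothetical helper
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()                     # adjusts grad_scale as above
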
], batch size: 37, lr: 1.32e-02, grad_scale: 64.0 2024-09-23 02:34:37,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=158050.66666666666, ans=0.125 2024-09-23 02:34:46,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=158097.33333333334, ans=0.025 2024-09-23 02:34:46,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=158097.33333333334, ans=0.0 2024-09-23 02:35:08,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.17 vs. limit=15.0 2024-09-23 02:35:21,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-09-23 02:35:30,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=158190.66666666666, ans=0.125 2024-09-23 02:35:51,928 INFO [train.py:1198] (0/4) Epoch 9, batch 2750, loss[loss=0.2387, ctc_loss=0.1661, cr_loss=0.3631, over 17210.00 frames. ], tot_loss[loss=0.2558, ctc_loss=0.179, cr_loss=0.3839, over 3351437.18 frames. ], batch size: 47, lr: 1.32e-02, grad_scale: 32.0 2024-09-23 02:36:06,252 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.343e+02 1.485e+02 1.822e+02 3.173e+02, threshold=2.970e+02, percent-clipped=1.0 2024-09-23 02:36:33,868 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:36:37,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=158377.33333333334, ans=10.0 2024-09-23 02:37:14,490 INFO [train.py:1198] (0/4) Epoch 9, batch 2800, loss[loss=0.237, ctc_loss=0.1641, cr_loss=0.3644, over 17014.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1791, cr_loss=0.3841, over 3345350.61 frames. ], batch size: 44, lr: 1.32e-02, grad_scale: 32.0 2024-09-23 02:37:21,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=158517.33333333334, ans=0.025 2024-09-23 02:37:32,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158564.0, ans=0.1 2024-09-23 02:37:52,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2024-09-23 02:38:05,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.07 vs. limit=10.0 2024-09-23 02:38:06,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=158657.33333333334, ans=0.0 2024-09-23 02:38:09,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=158657.33333333334, ans=0.07 2024-09-23 02:38:23,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=158704.0, ans=0.07 2024-09-23 02:38:34,673 INFO [train.py:1198] (0/4) Epoch 9, batch 2850, loss[loss=0.2338, ctc_loss=0.1637, cr_loss=0.3505, over 17206.00 frames. 
], tot_loss[loss=0.2571, ctc_loss=0.18, cr_loss=0.3852, over 3344314.34 frames. ], batch size: 47, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:38:50,463 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.314e+02 1.411e+02 1.546e+02 2.607e+02, threshold=2.821e+02, percent-clipped=0.0 2024-09-23 02:38:57,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=158797.33333333334, ans=0.015 2024-09-23 02:39:12,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=158844.0, ans=0.125 2024-09-23 02:39:14,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=158844.0, ans=0.125 2024-09-23 02:39:33,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=158890.66666666666, ans=0.0 2024-09-23 02:39:34,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=158890.66666666666, ans=0.125 2024-09-23 02:39:56,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158937.33333333334, ans=0.1 2024-09-23 02:39:59,461 INFO [train.py:1198] (0/4) Epoch 9, batch 2900, loss[loss=0.2518, ctc_loss=0.1739, cr_loss=0.3898, over 17216.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.18, cr_loss=0.3857, over 3351588.06 frames. ], batch size: 50, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:40:48,607 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:41:17,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=159170.66666666666, ans=0.0 2024-09-23 02:41:21,662 INFO [train.py:1198] (0/4) Epoch 9, batch 2950, loss[loss=0.2474, ctc_loss=0.1733, cr_loss=0.3706, over 17296.00 frames. ], tot_loss[loss=0.2574, ctc_loss=0.1801, cr_loss=0.3861, over 3356091.74 frames. ], batch size: 51, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:41:40,277 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.351e+02 1.516e+02 1.686e+02 2.482e+02, threshold=3.032e+02, percent-clipped=0.0 2024-09-23 02:41:53,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=159264.0, ans=0.125 2024-09-23 02:41:59,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=159310.66666666666, ans=0.09899494936611666 2024-09-23 02:42:43,438 INFO [train.py:1198] (0/4) Epoch 9, batch 3000, loss[loss=0.2766, ctc_loss=0.1898, cr_loss=0.4339, over 17027.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1804, cr_loss=0.3866, over 3358677.73 frames. ], batch size: 52, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:42:43,439 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 02:42:59,050 INFO [train.py:1230] (0/4) Epoch 9, validation: loss=0.05024, ctc_loss=0.05024, cr_loss=7.059e-15, over 944034.00 frames. 
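The validation entry just above is worth a remark: loss and ctc_loss coincide, and cr_loss sits at floating-point noise level (7.059e-15). That is what one expects if the consistency-regularization term compares two branches that see identical, unaugmented inputs at evaluation time: the divergence of a distribution with itself is exactly zero, so only rounding error remains, and validation quality is measured by the CTC term alone.
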
2024-09-23 02:42:59,051 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 02:43:46,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=159590.66666666666, ans=0.125 2024-09-23 02:43:51,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=159590.66666666666, ans=0.0 2024-09-23 02:43:59,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.35 vs. limit=22.5 2024-09-23 02:44:11,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=159637.33333333334, ans=0.0 2024-09-23 02:44:17,777 INFO [train.py:1198] (0/4) Epoch 9, batch 3050, loss[loss=0.2952, ctc_loss=0.2148, cr_loss=0.4022, over 16915.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1804, cr_loss=0.3868, over 3360785.40 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:44:33,651 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.314e+02 1.416e+02 1.662e+02 2.316e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 02:45:08,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=159824.0, ans=0.04949747468305833 2024-09-23 02:45:29,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=159870.66666666666, ans=0.1 2024-09-23 02:45:31,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=159870.66666666666, ans=0.125 2024-09-23 02:45:35,917 INFO [train.py:1198] (0/4) Epoch 9, batch 3100, loss[loss=0.2236, ctc_loss=0.1523, cr_loss=0.3566, over 17165.00 frames. ], tot_loss[loss=0.2585, ctc_loss=0.1811, cr_loss=0.3873, over 3353763.97 frames. ], batch size: 45, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:45:39,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=159917.33333333334, ans=0.0 2024-09-23 02:45:54,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=159964.0, ans=0.125 2024-09-23 02:46:02,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=159964.0, ans=0.0 2024-09-23 02:46:35,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=160057.33333333334, ans=0.125 2024-09-23 02:46:59,397 INFO [train.py:1198] (0/4) Epoch 9, batch 3150, loss[loss=0.2342, ctc_loss=0.161, cr_loss=0.3658, over 17156.00 frames. ], tot_loss[loss=0.2578, ctc_loss=0.1805, cr_loss=0.3868, over 3353162.44 frames. 
], batch size: 48, lr: 1.32e-02, grad_scale: 16.0 2024-09-23 02:47:04,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=160150.66666666666, ans=0.125 2024-09-23 02:47:15,044 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.398e+02 1.504e+02 1.697e+02 2.388e+02, threshold=3.008e+02, percent-clipped=0.0 2024-09-23 02:47:15,398 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:47:29,167 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:47:54,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=160290.66666666666, ans=0.0 2024-09-23 02:48:10,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=160337.33333333334, ans=0.025 2024-09-23 02:48:16,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=160384.0, ans=0.0 2024-09-23 02:48:17,653 INFO [train.py:1198] (0/4) Epoch 9, batch 3200, loss[loss=0.2743, ctc_loss=0.1946, cr_loss=0.3983, over 16894.00 frames. ], tot_loss[loss=0.2567, ctc_loss=0.1796, cr_loss=0.3857, over 3359568.61 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 32.0 2024-09-23 02:48:36,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=160430.66666666666, ans=0.125 2024-09-23 02:48:51,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-09-23 02:48:53,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=160477.33333333334, ans=0.2 2024-09-23 02:48:58,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160477.33333333334, ans=0.1 2024-09-23 02:49:09,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=160524.0, ans=0.2 2024-09-23 02:49:29,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=160570.66666666666, ans=0.125 2024-09-23 02:49:29,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2024-09-23 02:49:38,717 INFO [train.py:1198] (0/4) Epoch 9, batch 3250, loss[loss=0.2558, ctc_loss=0.1771, cr_loss=0.3934, over 17307.00 frames. ], tot_loss[loss=0.2555, ctc_loss=0.1787, cr_loss=0.384, over 3365728.29 frames. 
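The high-volume ScheduledFloat entries (scaling.py:214) sample hyperparameters such as dropout probabilities, skip rates, and balancer limits whose values are scheduled against batch_count; by this point in training most have settled at their final values (ans=0.0, 0.025, 0.1, 0.125, ...). A minimal sketch of such a schedule as piecewise-linear interpolation; the breakpoints below are illustrative, not the recipe's actual schedule:

    class ScheduledFloat:
        """A float-valued hyperparameter scheduled against batch_count."""

        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            return pts[-1][1]

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(154924.0))  # -> 0.1, as in the dropout_p entries above
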
], batch size: 51, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:49:42,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=160617.33333333334, ans=0.0 2024-09-23 02:49:54,285 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.347e+02 1.464e+02 1.659e+02 3.194e+02, threshold=2.929e+02, percent-clipped=1.0 2024-09-23 02:49:59,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=160664.0, ans=0.95 2024-09-23 02:50:02,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=160664.0, ans=0.2 2024-09-23 02:50:05,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=160664.0, ans=0.0 2024-09-23 02:50:19,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=160710.66666666666, ans=0.125 2024-09-23 02:50:28,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=160757.33333333334, ans=0.125 2024-09-23 02:50:33,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=160757.33333333334, ans=0.0 2024-09-23 02:50:48,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=160804.0, ans=0.125 2024-09-23 02:50:53,872 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 02:50:56,581 INFO [train.py:1198] (0/4) Epoch 9, batch 3300, loss[loss=0.2725, ctc_loss=0.1903, cr_loss=0.4111, over 16917.00 frames. ], tot_loss[loss=0.2562, ctc_loss=0.1792, cr_loss=0.3851, over 3371522.76 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:51:09,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=160850.66666666666, ans=0.05 2024-09-23 02:51:15,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=160897.33333333334, ans=0.025 2024-09-23 02:51:31,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=160944.0, ans=0.0 2024-09-23 02:51:35,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=160944.0, ans=15.0 2024-09-23 02:51:42,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-09-23 02:51:43,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=160990.66666666666, ans=0.0 2024-09-23 02:51:51,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=160990.66666666666, ans=0.125 2024-09-23 02:52:07,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0 2024-09-23 02:52:15,711 INFO [train.py:1198] (0/4) Epoch 9, batch 3350, loss[loss=0.261, ctc_loss=0.1807, cr_loss=0.4016, over 17016.00 frames. 
], tot_loss[loss=0.2554, ctc_loss=0.1787, cr_loss=0.3836, over 3360733.53 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:52:26,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=161084.0, ans=0.0 2024-09-23 02:52:29,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=161130.66666666666, ans=0.1 2024-09-23 02:52:31,265 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.345e+02 1.546e+02 1.825e+02 3.271e+02, threshold=3.093e+02, percent-clipped=1.0 2024-09-23 02:52:39,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=161130.66666666666, ans=0.1 2024-09-23 02:53:14,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0 2024-09-23 02:53:15,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=22.5 2024-09-23 02:53:18,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=161270.66666666666, ans=0.025 2024-09-23 02:53:33,968 INFO [train.py:1198] (0/4) Epoch 9, batch 3400, loss[loss=0.269, ctc_loss=0.1891, cr_loss=0.3992, over 16729.00 frames. ], tot_loss[loss=0.2554, ctc_loss=0.1788, cr_loss=0.3833, over 3356162.88 frames. ], batch size: 61, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:53:38,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161317.33333333334, ans=0.1 2024-09-23 02:53:43,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2024-09-23 02:54:35,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2024-09-23 02:54:51,711 INFO [train.py:1198] (0/4) Epoch 9, batch 3450, loss[loss=0.2256, ctc_loss=0.1588, cr_loss=0.3341, over 17260.00 frames. ], tot_loss[loss=0.2561, ctc_loss=0.1792, cr_loss=0.3846, over 3355173.78 frames. ], batch size: 42, lr: 1.31e-02, grad_scale: 16.0 2024-09-23 02:55:08,998 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.381e+02 1.554e+02 1.871e+02 2.951e+02, threshold=3.107e+02, percent-clipped=0.0 2024-09-23 02:55:51,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=161690.66666666666, ans=0.125 2024-09-23 02:55:52,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=161737.33333333334, ans=0.125 2024-09-23 02:56:12,427 INFO [train.py:1198] (0/4) Epoch 9, batch 3500, loss[loss=0.2136, ctc_loss=0.1459, cr_loss=0.3384, over 16973.00 frames. ], tot_loss[loss=0.2571, ctc_loss=0.18, cr_loss=0.3851, over 3351745.33 frames. 
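The Whitening entries (scaling.py:1024) track how close each module's activations are to having a white (identity-proportional) channel covariance: the logged metric is a whiteness score that equals 1.0 for perfectly white features and grows as the covariance concentrates, and each module has a limit above which a corrective gradient kicks in. One plausible form of such a metric, offered as an assumption rather than the library's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """Whiteness of x's channel covariance: 1.0 iff cov = c * I."""
        x = x - x.mean(dim=0)                 # x: (num_frames, num_channels)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        # d * ||cov||_F^2 / trace(cov)^2 >= 1, with equality iff cov = c * I
        return (d * (cov * cov).sum() / cov.trace() ** 2).item()

    x = torch.randn(1000, 384)
    print(whitening_metric(x))  # ~1.4: 1.0 plus finite-sample noise (~d/N)
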
], batch size: 42, lr: 1.31e-02, grad_scale: 16.0 2024-09-23 02:56:25,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=161784.0, ans=0.0 2024-09-23 02:56:51,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=161877.33333333334, ans=0.0 2024-09-23 02:57:31,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162017.33333333334, ans=0.1 2024-09-23 02:57:32,437 INFO [train.py:1198] (0/4) Epoch 9, batch 3550, loss[loss=0.2387, ctc_loss=0.1663, cr_loss=0.362, over 17293.00 frames. ], tot_loss[loss=0.258, ctc_loss=0.1807, cr_loss=0.3865, over 3346199.56 frames. ], batch size: 46, lr: 1.31e-02, grad_scale: 16.0 2024-09-23 02:57:45,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=162017.33333333334, ans=0.125 2024-09-23 02:57:49,783 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.397e+02 1.544e+02 1.862e+02 4.630e+02, threshold=3.088e+02, percent-clipped=2.0 2024-09-23 02:57:53,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.68 vs. limit=10.0 2024-09-23 02:57:54,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=162064.0, ans=0.2 2024-09-23 02:57:56,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=162064.0, ans=0.0 2024-09-23 02:58:19,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=162157.33333333334, ans=0.125 2024-09-23 02:58:22,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=162157.33333333334, ans=0.0 2024-09-23 02:58:35,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=162204.0, ans=0.2 2024-09-23 02:58:41,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=162204.0, ans=0.2 2024-09-23 02:58:43,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=162204.0, ans=0.125 2024-09-23 02:58:47,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=162204.0, ans=0.0 2024-09-23 02:58:52,213 INFO [train.py:1198] (0/4) Epoch 9, batch 3600, loss[loss=0.3299, ctc_loss=0.2382, cr_loss=0.4586, over 15939.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1806, cr_loss=0.3858, over 3339070.11 frames. 
], batch size: 74, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 02:59:09,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=162297.33333333334, ans=0.125 2024-09-23 02:59:36,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=162344.0, ans=0.125 2024-09-23 02:59:58,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=162437.33333333334, ans=0.125 2024-09-23 03:00:08,997 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:00:10,091 INFO [train.py:1198] (0/4) Epoch 9, batch 3650, loss[loss=0.2755, ctc_loss=0.1946, cr_loss=0.4046, over 16870.00 frames. ], tot_loss[loss=0.2577, ctc_loss=0.1805, cr_loss=0.3863, over 3343177.52 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 03:00:25,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2024-09-23 03:00:27,449 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.388e+02 1.522e+02 1.765e+02 2.573e+02, threshold=3.044e+02, percent-clipped=0.0 2024-09-23 03:00:55,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2024-09-23 03:01:07,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=162624.0, ans=0.125 2024-09-23 03:01:10,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=162624.0, ans=0.125 2024-09-23 03:01:11,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.23 vs. limit=22.5 2024-09-23 03:01:12,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=162624.0, ans=0.0 2024-09-23 03:01:20,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=162670.66666666666, ans=0.0 2024-09-23 03:01:23,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=162670.66666666666, ans=0.125 2024-09-23 03:01:24,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=162670.66666666666, ans=0.0 2024-09-23 03:01:30,907 INFO [train.py:1198] (0/4) Epoch 9, batch 3700, loss[loss=0.2791, ctc_loss=0.1954, cr_loss=0.4181, over 17088.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.1814, cr_loss=0.3881, over 3342247.27 frames. 
], batch size: 49, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 03:01:38,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=162717.33333333334, ans=0.125 2024-09-23 03:01:54,435 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:01:54,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=162764.0, ans=0.0 2024-09-23 03:02:42,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=22.5 2024-09-23 03:02:48,724 INFO [train.py:1198] (0/4) Epoch 9, batch 3750, loss[loss=0.2493, ctc_loss=0.1712, cr_loss=0.3909, over 17026.00 frames. ], tot_loss[loss=0.2587, ctc_loss=0.1812, cr_loss=0.3875, over 3325591.43 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2024-09-23 03:03:05,821 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.286e+02 1.440e+02 1.620e+02 2.372e+02, threshold=2.880e+02, percent-clipped=0.0 2024-09-23 03:03:48,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163090.66666666666, ans=0.1 2024-09-23 03:03:59,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=163137.33333333334, ans=0.125 2024-09-23 03:04:07,106 INFO [train.py:1198] (0/4) Epoch 9, batch 3800, loss[loss=0.2971, ctc_loss=0.2142, cr_loss=0.4146, over 15068.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1798, cr_loss=0.385, over 3326765.37 frames. ], batch size: 89, lr: 1.30e-02, grad_scale: 32.0 2024-09-23 03:04:26,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=163230.66666666666, ans=0.2 2024-09-23 03:04:32,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2024-09-23 03:04:40,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=163277.33333333334, ans=0.125 2024-09-23 03:05:07,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2024-09-23 03:05:20,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=163370.66666666666, ans=15.0 2024-09-23 03:05:24,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=163417.33333333334, ans=0.2 2024-09-23 03:05:25,697 INFO [train.py:1198] (0/4) Epoch 9, batch 3850, loss[loss=0.2085, ctc_loss=0.1406, cr_loss=0.3395, over 17339.00 frames. ], tot_loss[loss=0.2572, ctc_loss=0.1802, cr_loss=0.3846, over 3299050.01 frames. ], batch size: 43, lr: 1.30e-02, grad_scale: 32.0 2024-09-23 03:05:27,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=22.5 2024-09-23 03:05:36,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=163417.33333333334, ans=0.07 2024-09-23 03:05:42,524 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.391e+02 1.511e+02 1.701e+02 2.274e+02, threshold=3.022e+02, percent-clipped=0.0 2024-09-23 03:05:46,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2024-09-23 03:05:55,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=163510.66666666666, ans=0.125 2024-09-23 03:05:55,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=163510.66666666666, ans=0.0 2024-09-23 03:05:57,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=163510.66666666666, ans=0.5 2024-09-23 03:06:01,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=163510.66666666666, ans=0.125 2024-09-23 03:06:17,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=163557.33333333334, ans=0.125 2024-09-23 03:06:35,492 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-9.pt 2024-09-23 03:07:26,831 INFO [train.py:1198] (0/4) Epoch 10, batch 0, loss[loss=0.2665, ctc_loss=0.1823, cr_loss=0.421, over 17309.00 frames. ], tot_loss[loss=0.2665, ctc_loss=0.1823, cr_loss=0.421, over 17309.00 frames. ], batch size: 49, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:07:26,833 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 03:07:41,776 INFO [train.py:1230] (0/4) Epoch 10, validation: loss=0.05143, ctc_loss=0.05143, cr_loss=7.705e-15, over 944034.00 frames. 2024-09-23 03:07:41,777 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 03:07:44,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=15.0 2024-09-23 03:08:04,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=163678.66666666666, ans=0.0 2024-09-23 03:08:34,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=163772.0, ans=0.035 2024-09-23 03:08:34,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=163772.0, ans=0.125 2024-09-23 03:09:05,466 INFO [train.py:1198] (0/4) Epoch 10, batch 50, loss[loss=0.2507, ctc_loss=0.1713, cr_loss=0.3971, over 17060.00 frames. ], tot_loss[loss=0.2598, ctc_loss=0.1821, cr_loss=0.3884, over 745468.05 frames. 
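At the epoch boundary above, the checkpoint is written to epoch-9.pt and the learning rate steps from 1.30e-02 to 1.24e-02 on top of its slow within-epoch decay. The jump is consistent with an Eden-style schedule whose epoch-dependent factor is ((epoch^2 + E^2)/E^2)^(-1/4): with E around 3.5, which reproduces the logged ratio, the factor from epoch 9 to 10 shrinks by about 0.955, and 1.30e-02 x 0.955 ~ 1.24e-02 as logged. A sketch under that assumption, treating the batch-dependent factor as constant across the boundary:

    def eden_epoch_factor(epoch: float, lr_epochs: float = 3.5) -> float:
        """Epoch-dependent LR factor of an Eden-style schedule (assumed)."""
        return ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25

    ratio = eden_epoch_factor(10.0) / eden_epoch_factor(9.0)
    print(round(0.0130 * ratio, 4))  # -> 0.0124, matching the logged lr
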
], batch size: 46, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:09:10,607 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:09:18,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=163865.33333333334, ans=0.125 2024-09-23 03:09:29,250 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.322e+02 1.460e+02 1.757e+02 2.503e+02, threshold=2.921e+02, percent-clipped=0.0 2024-09-23 03:09:37,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=163958.66666666666, ans=0.025 2024-09-23 03:09:50,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=163958.66666666666, ans=0.2 2024-09-23 03:09:52,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2024-09-23 03:10:01,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164005.33333333334, ans=0.1 2024-09-23 03:10:07,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=164052.0, ans=0.07 2024-09-23 03:10:24,756 INFO [train.py:1198] (0/4) Epoch 10, batch 100, loss[loss=0.2534, ctc_loss=0.1785, cr_loss=0.3742, over 16757.00 frames. ], tot_loss[loss=0.2608, ctc_loss=0.183, cr_loss=0.3892, over 1310868.14 frames. ], batch size: 61, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:10:39,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=12.0 2024-09-23 03:10:42,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=164145.33333333334, ans=0.07 2024-09-23 03:10:45,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=164145.33333333334, ans=0.125 2024-09-23 03:11:11,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=164192.0, ans=0.125 2024-09-23 03:11:16,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=164238.66666666666, ans=0.125 2024-09-23 03:11:16,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=164238.66666666666, ans=0.0 2024-09-23 03:11:46,744 INFO [train.py:1198] (0/4) Epoch 10, batch 150, loss[loss=0.2153, ctc_loss=0.1496, cr_loss=0.3281, over 16942.00 frames. ], tot_loss[loss=0.2581, ctc_loss=0.1806, cr_loss=0.3875, over 1764955.95 frames. ], batch size: 42, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:11:49,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.73 vs. limit=10.0 2024-09-23 03:12:04,793 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.27 vs. 
limit=22.5 2024-09-23 03:12:12,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=164378.66666666666, ans=0.1 2024-09-23 03:12:12,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=164378.66666666666, ans=0.0 2024-09-23 03:12:13,436 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.322e+02 1.418e+02 1.649e+02 2.765e+02, threshold=2.835e+02, percent-clipped=0.0 2024-09-23 03:12:14,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2024-09-23 03:12:23,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164425.33333333334, ans=0.1 2024-09-23 03:12:25,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=164425.33333333334, ans=0.025 2024-09-23 03:12:45,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=164472.0, ans=0.125 2024-09-23 03:12:47,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=164472.0, ans=0.0 2024-09-23 03:13:09,148 INFO [train.py:1198] (0/4) Epoch 10, batch 200, loss[loss=0.2448, ctc_loss=0.1716, cr_loss=0.3663, over 17055.00 frames. ], tot_loss[loss=0.259, ctc_loss=0.181, cr_loss=0.3896, over 2121297.81 frames. ], batch size: 56, lr: 1.24e-02, grad_scale: 32.0 2024-09-23 03:13:19,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=164565.33333333334, ans=0.2 2024-09-23 03:13:23,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=164612.0, ans=0.0 2024-09-23 03:13:27,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164612.0, ans=0.1 2024-09-23 03:13:45,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=164658.66666666666, ans=0.2 2024-09-23 03:13:56,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=164658.66666666666, ans=0.125 2024-09-23 03:13:56,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=164658.66666666666, ans=0.0 2024-09-23 03:13:58,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2024-09-23 03:14:01,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=164705.33333333334, ans=0.125 2024-09-23 03:14:25,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=164752.0, ans=0.125 2024-09-23 03:14:33,547 INFO [train.py:1198] (0/4) Epoch 10, batch 250, loss[loss=0.2483, ctc_loss=0.1706, cr_loss=0.3886, over 17210.00 frames. ], tot_loss[loss=0.2591, ctc_loss=0.1811, cr_loss=0.3902, over 2385737.43 frames. 
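The WithLoss entries (scaling.py:1120) monitor auxiliary penalties attached to the attention weights: loss-sum=0.000e+00 means the penalty never fired on that batch, while the occasional non-zero value (e.g. 2.962e-03 earlier in this epoch) means some weights strayed into the penalized region. A sketch of the underlying trick of attaching a loss in backward without altering the forward value; the penalty form here is an assumption:

    import torch

    class WithLoss(torch.autograd.Function):
        """Pass x through unchanged, but make `loss` contribute gradients."""

        @staticmethod
        def forward(ctx, x, loss):
            ctx.loss_shape = loss.shape
            return x

        @staticmethod
        def backward(ctx, grad_output):
            # unit gradient for the auxiliary loss, pass-through for x
            return grad_output, torch.ones(ctx.loss_shape,
                                           device=grad_output.device)

    def penalize_abs_above(x: torch.Tensor, limit: float) -> torch.Tensor:
        penalty = (x.abs() - limit).clamp(min=0.0).sum()
        # a real module would log penalty.item() as "loss-sum"
        return WithLoss.apply(x, penalty)
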
], batch size: 47, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:14:57,019 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.299e+02 1.403e+02 1.575e+02 2.434e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-23 03:14:57,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=164845.33333333334, ans=0.125 2024-09-23 03:15:05,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2024-09-23 03:15:09,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=164892.0, ans=0.5 2024-09-23 03:15:17,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=164892.0, ans=0.125 2024-09-23 03:15:32,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=164938.66666666666, ans=0.0 2024-09-23 03:15:33,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=164938.66666666666, ans=0.125 2024-09-23 03:15:55,551 INFO [train.py:1198] (0/4) Epoch 10, batch 300, loss[loss=0.2587, ctc_loss=0.1816, cr_loss=0.3858, over 17217.00 frames. ], tot_loss[loss=0.2588, ctc_loss=0.1808, cr_loss=0.3903, over 2600524.29 frames. ], batch size: 55, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:16:33,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=165125.33333333334, ans=0.2 2024-09-23 03:16:44,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5 2024-09-23 03:16:51,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2024-09-23 03:17:03,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=165218.66666666666, ans=0.125 2024-09-23 03:17:17,895 INFO [train.py:1198] (0/4) Epoch 10, batch 350, loss[loss=0.2615, ctc_loss=0.1832, cr_loss=0.3913, over 17305.00 frames. ], tot_loss[loss=0.2568, ctc_loss=0.1791, cr_loss=0.3885, over 2776151.58 frames. ], batch size: 51, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:17:21,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-09-23 03:17:35,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=165312.0, ans=0.125 2024-09-23 03:17:42,146 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.372e+02 1.514e+02 1.677e+02 2.269e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 03:17:42,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165312.0, ans=0.1 2024-09-23 03:18:24,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. 
limit=15.0 2024-09-23 03:18:43,356 INFO [train.py:1198] (0/4) Epoch 10, batch 400, loss[loss=0.2731, ctc_loss=0.1924, cr_loss=0.4037, over 17153.00 frames. ], tot_loss[loss=0.2564, ctc_loss=0.1788, cr_loss=0.3878, over 2906796.70 frames. ], batch size: 45, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:19:09,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2024-09-23 03:19:26,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=165592.0, ans=0.025 2024-09-23 03:19:55,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=165685.33333333334, ans=0.125 2024-09-23 03:20:03,261 INFO [train.py:1198] (0/4) Epoch 10, batch 450, loss[loss=0.2555, ctc_loss=0.1727, cr_loss=0.414, over 17015.00 frames. ], tot_loss[loss=0.257, ctc_loss=0.1794, cr_loss=0.3881, over 2990618.63 frames. ], batch size: 44, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:20:19,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=165778.66666666666, ans=0.125 2024-09-23 03:20:26,967 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.328e+02 1.495e+02 1.704e+02 3.618e+02, threshold=2.990e+02, percent-clipped=1.0 2024-09-23 03:20:46,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=165825.33333333334, ans=0.0 2024-09-23 03:21:25,283 INFO [train.py:1198] (0/4) Epoch 10, batch 500, loss[loss=0.2817, ctc_loss=0.1985, cr_loss=0.4163, over 17364.00 frames. ], tot_loss[loss=0.2559, ctc_loss=0.1785, cr_loss=0.3867, over 3075518.26 frames. ], batch size: 48, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:21:28,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=165965.33333333334, ans=0.0 2024-09-23 03:21:47,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2024-09-23 03:22:21,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=166105.33333333334, ans=0.125 2024-09-23 03:22:32,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=166152.0, ans=0.0 2024-09-23 03:22:47,888 INFO [train.py:1198] (0/4) Epoch 10, batch 550, loss[loss=0.2917, ctc_loss=0.206, cr_loss=0.4284, over 14920.00 frames. ], tot_loss[loss=0.2548, ctc_loss=0.1777, cr_loss=0.3854, over 3140068.47 frames. ], batch size: 89, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:22:49,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.78 vs. 
limit=15.0 2024-09-23 03:23:01,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=166198.66666666666, ans=0.0 2024-09-23 03:23:11,788 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.308e+02 1.376e+02 1.532e+02 2.311e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 03:23:41,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=166338.66666666666, ans=0.125 2024-09-23 03:23:44,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=166338.66666666666, ans=0.0 2024-09-23 03:24:05,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0 2024-09-23 03:24:13,175 INFO [train.py:1198] (0/4) Epoch 10, batch 600, loss[loss=0.2385, ctc_loss=0.1633, cr_loss=0.3757, over 16962.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1766, cr_loss=0.3847, over 3190223.68 frames. ], batch size: 42, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:24:25,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-09-23 03:24:34,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=166478.66666666666, ans=0.0 2024-09-23 03:24:38,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=166478.66666666666, ans=0.2 2024-09-23 03:24:40,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=166478.66666666666, ans=0.0 2024-09-23 03:25:32,745 INFO [train.py:1198] (0/4) Epoch 10, batch 650, loss[loss=0.2776, ctc_loss=0.1972, cr_loss=0.4019, over 16669.00 frames. ], tot_loss[loss=0.2538, ctc_loss=0.1769, cr_loss=0.3848, over 3231408.36 frames. ], batch size: 61, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:25:51,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=166712.0, ans=0.125 2024-09-23 03:25:59,421 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.346e+02 1.493e+02 1.797e+02 2.927e+02, threshold=2.987e+02, percent-clipped=1.0 2024-09-23 03:26:18,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5 2024-09-23 03:26:24,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=166805.33333333334, ans=0.0 2024-09-23 03:26:27,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.81 vs. 
limit=12.0 2024-09-23 03:26:36,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=166805.33333333334, ans=10.0 2024-09-23 03:26:38,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=166852.0, ans=0.07 2024-09-23 03:26:49,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=166852.0, ans=0.0 2024-09-23 03:26:49,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=166852.0, ans=0.0 2024-09-23 03:26:54,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=166898.66666666666, ans=0.0 2024-09-23 03:26:55,529 INFO [train.py:1198] (0/4) Epoch 10, batch 700, loss[loss=0.2706, ctc_loss=0.1925, cr_loss=0.3906, over 16986.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1766, cr_loss=0.3845, over 3264828.43 frames. ], batch size: 53, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:27:11,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=166898.66666666666, ans=0.125 2024-09-23 03:27:13,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2024-09-23 03:27:37,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=166992.0, ans=22.5 2024-09-23 03:27:43,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=166992.0, ans=0.0 2024-09-23 03:27:43,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2024-09-23 03:28:01,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167085.33333333334, ans=0.1 2024-09-23 03:28:09,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=167085.33333333334, ans=0.125 2024-09-23 03:28:14,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2024-09-23 03:28:20,934 INFO [train.py:1198] (0/4) Epoch 10, batch 750, loss[loss=0.217, ctc_loss=0.1468, cr_loss=0.3511, over 17113.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1747, cr_loss=0.3814, over 3288199.89 frames. ], batch size: 40, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:28:41,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=167178.66666666666, ans=0.125 2024-09-23 03:28:42,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.68 vs. 
limit=15.0 2024-09-23 03:28:47,712 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.321e+02 1.498e+02 1.811e+02 2.765e+02, threshold=2.996e+02, percent-clipped=0.0 2024-09-23 03:28:48,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=167178.66666666666, ans=0.125 2024-09-23 03:29:24,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=167272.0, ans=0.125 2024-09-23 03:29:43,238 INFO [train.py:1198] (0/4) Epoch 10, batch 800, loss[loss=0.2472, ctc_loss=0.1703, cr_loss=0.3844, over 17333.00 frames. ], tot_loss[loss=0.2498, ctc_loss=0.1739, cr_loss=0.3797, over 3310301.81 frames. ], batch size: 48, lr: 1.23e-02, grad_scale: 32.0 2024-09-23 03:30:29,540 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:30:30,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-09-23 03:30:31,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=167505.33333333334, ans=0.0 2024-09-23 03:30:51,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167552.0, ans=0.1 2024-09-23 03:30:52,910 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:30:53,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167552.0, ans=0.1 2024-09-23 03:31:05,260 INFO [train.py:1198] (0/4) Epoch 10, batch 850, loss[loss=0.2949, ctc_loss=0.2084, cr_loss=0.4325, over 16878.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1753, cr_loss=0.3821, over 3319361.63 frames. ], batch size: 58, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:31:16,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=167598.66666666666, ans=0.0 2024-09-23 03:31:29,265 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.338e+02 1.474e+02 1.669e+02 2.399e+02, threshold=2.948e+02, percent-clipped=0.0 2024-09-23 03:31:29,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=167645.33333333334, ans=0.125 2024-09-23 03:31:40,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=167692.0, ans=0.1 2024-09-23 03:31:44,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.75 vs. 
limit=15.0 2024-09-23 03:31:47,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=167692.0, ans=0.125 2024-09-23 03:31:56,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=167738.66666666666, ans=0.125 2024-09-23 03:32:01,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167738.66666666666, ans=0.1 2024-09-23 03:32:03,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-09-23 03:32:07,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=167785.33333333334, ans=0.2 2024-09-23 03:32:08,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.07 vs. limit=15.0 2024-09-23 03:32:18,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=167785.33333333334, ans=0.125 2024-09-23 03:32:27,772 INFO [train.py:1198] (0/4) Epoch 10, batch 900, loss[loss=0.2166, ctc_loss=0.1491, cr_loss=0.3375, over 17169.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1747, cr_loss=0.382, over 3336466.83 frames. ], batch size: 41, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:33:03,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=167925.33333333334, ans=0.125 2024-09-23 03:33:26,427 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-36000.pt 2024-09-23 03:33:41,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=168018.66666666666, ans=0.125 2024-09-23 03:33:55,326 INFO [train.py:1198] (0/4) Epoch 10, batch 950, loss[loss=0.2867, ctc_loss=0.2089, cr_loss=0.3889, over 15955.00 frames. ], tot_loss[loss=0.2534, ctc_loss=0.1766, cr_loss=0.3838, over 3324504.91 frames. ], batch size: 74, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:34:09,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168112.0, ans=0.1 2024-09-23 03:34:16,241 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:34:18,982 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.276e+02 1.450e+02 1.704e+02 3.070e+02, threshold=2.900e+02, percent-clipped=1.0 2024-09-23 03:34:30,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=12.0 2024-09-23 03:34:31,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2024-09-23 03:34:35,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=168158.66666666666, ans=0.125 2024-09-23 03:34:43,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.71 vs. 
limit=15.0 2024-09-23 03:35:14,462 INFO [train.py:1198] (0/4) Epoch 10, batch 1000, loss[loss=0.2393, ctc_loss=0.1651, cr_loss=0.3712, over 16985.00 frames. ], tot_loss[loss=0.2527, ctc_loss=0.1761, cr_loss=0.383, over 3325666.20 frames. ], batch size: 42, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:35:24,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=168298.66666666666, ans=0.025 2024-09-23 03:35:45,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.04 vs. limit=22.5 2024-09-23 03:35:55,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=168392.0, ans=0.0 2024-09-23 03:36:35,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=168532.0, ans=0.0 2024-09-23 03:36:36,366 INFO [train.py:1198] (0/4) Epoch 10, batch 1050, loss[loss=0.305, ctc_loss=0.2175, cr_loss=0.4376, over 15032.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1767, cr_loss=0.384, over 3328465.45 frames. ], batch size: 89, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:36:38,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=168532.0, ans=0.0 2024-09-23 03:36:48,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2024-09-23 03:37:00,709 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.301e+02 1.448e+02 1.658e+02 2.854e+02, threshold=2.897e+02, percent-clipped=0.0 2024-09-23 03:37:04,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=168578.66666666666, ans=0.0 2024-09-23 03:37:16,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=168625.33333333334, ans=0.1 2024-09-23 03:37:49,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=168718.66666666666, ans=0.5 2024-09-23 03:37:53,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=168718.66666666666, ans=0.025 2024-09-23 03:37:58,858 INFO [train.py:1198] (0/4) Epoch 10, batch 1100, loss[loss=0.2423, ctc_loss=0.1648, cr_loss=0.3875, over 17266.00 frames. ], tot_loss[loss=0.2539, ctc_loss=0.1771, cr_loss=0.3841, over 3332868.86 frames. ], batch size: 44, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:38:03,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168765.33333333334, ans=0.1 2024-09-23 03:38:13,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. 
limit=15.0 2024-09-23 03:38:14,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=168765.33333333334, ans=0.125 2024-09-23 03:38:20,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168812.0, ans=0.1 2024-09-23 03:38:23,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=168812.0, ans=0.125 2024-09-23 03:38:54,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.00 vs. limit=10.0 2024-09-23 03:39:23,800 INFO [train.py:1198] (0/4) Epoch 10, batch 1150, loss[loss=0.2623, ctc_loss=0.1856, cr_loss=0.3837, over 16759.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1769, cr_loss=0.3834, over 3337143.06 frames. ], batch size: 61, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:39:25,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=168998.66666666666, ans=0.035 2024-09-23 03:39:33,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2024-09-23 03:39:47,599 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.373e+02 1.514e+02 1.720e+02 2.403e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 03:40:08,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=169092.0, ans=0.125 2024-09-23 03:40:15,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-23 03:40:21,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=169138.66666666666, ans=0.125 2024-09-23 03:40:33,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=169185.33333333334, ans=0.125 2024-09-23 03:40:38,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=169185.33333333334, ans=0.2 2024-09-23 03:40:45,948 INFO [train.py:1198] (0/4) Epoch 10, batch 1200, loss[loss=0.2805, ctc_loss=0.1955, cr_loss=0.4253, over 17103.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.1774, cr_loss=0.3848, over 3340427.65 frames. ], batch size: 49, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:40:54,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=169232.0, ans=15.0 2024-09-23 03:40:54,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-09-23 03:41:01,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. 
limit=15.0 2024-09-23 03:41:03,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=169278.66666666666, ans=0.2 2024-09-23 03:41:06,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=169278.66666666666, ans=0.025 2024-09-23 03:41:18,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=169325.33333333334, ans=0.125 2024-09-23 03:41:26,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2024-09-23 03:41:48,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=169418.66666666666, ans=10.0 2024-09-23 03:41:54,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169418.66666666666, ans=0.1 2024-09-23 03:42:03,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=169418.66666666666, ans=0.125 2024-09-23 03:42:05,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=169418.66666666666, ans=0.125 2024-09-23 03:42:07,869 INFO [train.py:1198] (0/4) Epoch 10, batch 1250, loss[loss=0.2685, ctc_loss=0.1875, cr_loss=0.4048, over 17344.00 frames. ], tot_loss[loss=0.2547, ctc_loss=0.1776, cr_loss=0.3856, over 3339929.09 frames. ], batch size: 52, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:42:27,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=169512.0, ans=0.125 2024-09-23 03:42:29,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=169512.0, ans=0.125 2024-09-23 03:42:31,957 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.271e+02 1.375e+02 1.546e+02 2.384e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 03:42:45,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0 2024-09-23 03:42:49,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=169558.66666666666, ans=0.1 2024-09-23 03:42:55,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169605.33333333334, ans=0.1 2024-09-23 03:43:12,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0 2024-09-23 03:43:26,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=169652.0, ans=0.0 2024-09-23 03:43:32,877 INFO [train.py:1198] (0/4) Epoch 10, batch 1300, loss[loss=0.2403, ctc_loss=0.1666, cr_loss=0.3684, over 17311.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1768, cr_loss=0.3835, over 3349248.70 frames. 
], batch size: 46, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:44:03,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=169792.0, ans=0.1 2024-09-23 03:44:20,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=169838.66666666666, ans=0.125 2024-09-23 03:44:20,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=169838.66666666666, ans=0.0 2024-09-23 03:44:33,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=169838.66666666666, ans=0.125 2024-09-23 03:44:33,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.08 vs. limit=10.0 2024-09-23 03:44:43,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=169885.33333333334, ans=0.1 2024-09-23 03:44:46,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.65 vs. limit=22.5 2024-09-23 03:44:52,274 INFO [train.py:1198] (0/4) Epoch 10, batch 1350, loss[loss=0.2672, ctc_loss=0.1859, cr_loss=0.4061, over 17012.00 frames. ], tot_loss[loss=0.2529, ctc_loss=0.176, cr_loss=0.3847, over 3358602.11 frames. ], batch size: 44, lr: 1.22e-02, grad_scale: 32.0 2024-09-23 03:45:00,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=169932.0, ans=0.0 2024-09-23 03:45:11,891 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:45:15,183 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:45:16,273 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.346e+02 1.479e+02 1.695e+02 2.554e+02, threshold=2.958e+02, percent-clipped=0.0 2024-09-23 03:45:27,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=170025.33333333334, ans=0.035 2024-09-23 03:45:49,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=170072.0, ans=0.2 2024-09-23 03:45:52,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170072.0, ans=0.1 2024-09-23 03:46:05,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=170118.66666666666, ans=0.125 2024-09-23 03:46:13,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=170165.33333333334, ans=0.0 2024-09-23 03:46:14,377 INFO [train.py:1198] (0/4) Epoch 10, batch 1400, loss[loss=0.2559, ctc_loss=0.1785, cr_loss=0.3872, over 16854.00 frames. ], tot_loss[loss=0.2542, ctc_loss=0.1769, cr_loss=0.3861, over 3353395.45 frames. 
], batch size: 58, lr: 1.22e-02, grad_scale: 16.0 2024-09-23 03:46:27,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=170165.33333333334, ans=0.125 2024-09-23 03:46:28,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=170212.0, ans=0.0 2024-09-23 03:46:41,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=170212.0, ans=10.0 2024-09-23 03:46:49,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=170258.66666666666, ans=0.125 2024-09-23 03:47:10,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=170305.33333333334, ans=0.125 2024-09-23 03:47:20,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=170352.0, ans=0.125 2024-09-23 03:47:36,190 INFO [train.py:1198] (0/4) Epoch 10, batch 1450, loss[loss=0.2781, ctc_loss=0.1945, cr_loss=0.4178, over 17052.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.176, cr_loss=0.3847, over 3360903.25 frames. ], batch size: 56, lr: 1.22e-02, grad_scale: 16.0 2024-09-23 03:47:36,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=170398.66666666666, ans=0.0 2024-09-23 03:47:38,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=170398.66666666666, ans=0.0 2024-09-23 03:47:41,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=170398.66666666666, ans=0.025 2024-09-23 03:48:04,178 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.358e+02 1.487e+02 1.729e+02 2.482e+02, threshold=2.974e+02, percent-clipped=0.0 2024-09-23 03:48:04,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170445.33333333334, ans=0.1 2024-09-23 03:48:49,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=170585.33333333334, ans=0.125 2024-09-23 03:48:53,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=170585.33333333334, ans=0.2 2024-09-23 03:49:00,804 INFO [train.py:1198] (0/4) Epoch 10, batch 1500, loss[loss=0.2745, ctc_loss=0.1933, cr_loss=0.406, over 16481.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1766, cr_loss=0.3855, over 3355953.89 frames. ], batch size: 66, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:49:37,848 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:49:45,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=170725.33333333334, ans=0.125 2024-09-23 03:50:00,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=170772.0, ans=0.0 2024-09-23 03:50:22,864 INFO [train.py:1198] (0/4) Epoch 10, batch 1550, loss[loss=0.3188, ctc_loss=0.2442, cr_loss=0.3729, over 11546.00 frames. 
], tot_loss[loss=0.2527, ctc_loss=0.1759, cr_loss=0.3839, over 3356849.69 frames. ], batch size: 123, lr: 1.21e-02, grad_scale: 8.0 2024-09-23 03:50:46,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=170912.0, ans=0.125 2024-09-23 03:50:49,523 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.376e+02 1.480e+02 1.669e+02 2.824e+02, threshold=2.960e+02, percent-clipped=0.0 2024-09-23 03:50:54,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=170958.66666666666, ans=0.0 2024-09-23 03:50:57,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=8.0 2024-09-23 03:51:01,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2024-09-23 03:51:10,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=171005.33333333334, ans=0.1 2024-09-23 03:51:12,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=171005.33333333334, ans=0.125 2024-09-23 03:51:40,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=171098.66666666666, ans=0.125 2024-09-23 03:51:40,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=171098.66666666666, ans=0.1 2024-09-23 03:51:41,966 INFO [train.py:1198] (0/4) Epoch 10, batch 1600, loss[loss=0.2584, ctc_loss=0.1817, cr_loss=0.3833, over 16999.00 frames. ], tot_loss[loss=0.2526, ctc_loss=0.1759, cr_loss=0.3836, over 3354023.68 frames. ], batch size: 53, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:51:43,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=171098.66666666666, ans=0.0 2024-09-23 03:52:00,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=171145.33333333334, ans=0.1 2024-09-23 03:52:02,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=171145.33333333334, ans=0.0 2024-09-23 03:52:05,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=171145.33333333334, ans=0.125 2024-09-23 03:52:23,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=171192.0, ans=0.125 2024-09-23 03:52:51,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=171285.33333333334, ans=0.125 2024-09-23 03:53:06,735 INFO [train.py:1198] (0/4) Epoch 10, batch 1650, loss[loss=0.2439, ctc_loss=0.1696, cr_loss=0.3715, over 17149.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1761, cr_loss=0.3836, over 3350185.62 frames. 
], batch size: 48, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:53:11,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=171332.0, ans=0.0 2024-09-23 03:53:17,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171332.0, ans=0.1 2024-09-23 03:53:32,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=171378.66666666666, ans=0.0 2024-09-23 03:53:36,437 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.322e+02 1.432e+02 1.629e+02 2.855e+02, threshold=2.864e+02, percent-clipped=0.0 2024-09-23 03:53:39,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=171425.33333333334, ans=0.2 2024-09-23 03:53:59,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=171472.0, ans=0.0 2024-09-23 03:54:29,210 INFO [train.py:1198] (0/4) Epoch 10, batch 1700, loss[loss=0.2316, ctc_loss=0.162, cr_loss=0.3479, over 16960.00 frames. ], tot_loss[loss=0.2519, ctc_loss=0.1752, cr_loss=0.3832, over 3360673.30 frames. ], batch size: 42, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:55:01,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=171658.66666666666, ans=0.2 2024-09-23 03:55:03,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=12.0 2024-09-23 03:55:25,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=171705.33333333334, ans=0.5 2024-09-23 03:55:45,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=171752.0, ans=0.125 2024-09-23 03:55:48,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=171752.0, ans=0.1 2024-09-23 03:55:50,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-09-23 03:55:51,291 INFO [train.py:1198] (0/4) Epoch 10, batch 1750, loss[loss=0.2196, ctc_loss=0.1521, cr_loss=0.3377, over 17214.00 frames. ], tot_loss[loss=0.2521, ctc_loss=0.1754, cr_loss=0.3832, over 3362886.15 frames. 
], batch size: 41, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:56:01,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=171798.66666666666, ans=0.0 2024-09-23 03:56:09,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=171845.33333333334, ans=0.0 2024-09-23 03:56:18,404 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.367e+02 1.590e+02 1.877e+02 2.998e+02, threshold=3.180e+02, percent-clipped=1.0 2024-09-23 03:56:22,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=171892.0, ans=0.0 2024-09-23 03:56:23,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=171892.0, ans=0.0 2024-09-23 03:56:37,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=171938.66666666666, ans=0.0 2024-09-23 03:56:46,871 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 03:56:51,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=171938.66666666666, ans=0.2 2024-09-23 03:57:02,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=171985.33333333334, ans=0.125 2024-09-23 03:57:08,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=171985.33333333334, ans=0.025 2024-09-23 03:57:13,329 INFO [train.py:1198] (0/4) Epoch 10, batch 1800, loss[loss=0.2164, ctc_loss=0.1506, cr_loss=0.3289, over 17008.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1757, cr_loss=0.3834, over 3348116.40 frames. ], batch size: 44, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:57:16,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=172032.0, ans=0.125 2024-09-23 03:57:29,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=172078.66666666666, ans=0.125 2024-09-23 03:57:50,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-09-23 03:58:31,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=172218.66666666666, ans=0.125 2024-09-23 03:58:34,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=172218.66666666666, ans=10.0 2024-09-23 03:58:37,574 INFO [train.py:1198] (0/4) Epoch 10, batch 1850, loss[loss=0.2339, ctc_loss=0.16, cr_loss=0.37, over 17029.00 frames. ], tot_loss[loss=0.2536, ctc_loss=0.1766, cr_loss=0.3851, over 3349578.87 frames. 
], batch size: 39, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:59:04,289 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.347e+02 1.457e+02 1.655e+02 2.749e+02, threshold=2.915e+02, percent-clipped=0.0 2024-09-23 03:59:35,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=172405.33333333334, ans=0.2 2024-09-23 03:59:49,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=172452.0, ans=0.2 2024-09-23 03:59:50,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=172452.0, ans=0.2 2024-09-23 03:59:57,223 INFO [train.py:1198] (0/4) Epoch 10, batch 1900, loss[loss=0.2899, ctc_loss=0.2051, cr_loss=0.424, over 16504.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1765, cr_loss=0.3852, over 3349347.11 frames. ], batch size: 66, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 03:59:59,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172498.66666666666, ans=0.1 2024-09-23 03:59:59,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.51 vs. limit=22.5 2024-09-23 04:00:14,645 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 04:01:02,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=172685.33333333334, ans=0.125 2024-09-23 04:01:09,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.87 vs. limit=22.5 2024-09-23 04:01:18,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=172732.0, ans=0.125 2024-09-23 04:01:19,644 INFO [train.py:1198] (0/4) Epoch 10, batch 1950, loss[loss=0.2436, ctc_loss=0.1674, cr_loss=0.3807, over 17168.00 frames. ], tot_loss[loss=0.253, ctc_loss=0.1761, cr_loss=0.3846, over 3352738.50 frames. 
], batch size: 45, lr: 1.21e-02, grad_scale: 16.0 2024-09-23 04:01:42,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172778.66666666666, ans=0.1 2024-09-23 04:01:47,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=172778.66666666666, ans=0.125 2024-09-23 04:01:47,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=172778.66666666666, ans=0.1 2024-09-23 04:01:48,821 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.405e+02 1.574e+02 1.755e+02 2.409e+02, threshold=3.148e+02, percent-clipped=0.0 2024-09-23 04:01:52,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=172825.33333333334, ans=0.125 2024-09-23 04:01:57,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172825.33333333334, ans=0.1 2024-09-23 04:02:27,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=172918.66666666666, ans=0.125 2024-09-23 04:02:40,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=172965.33333333334, ans=0.025 2024-09-23 04:02:43,967 INFO [train.py:1198] (0/4) Epoch 10, batch 2000, loss[loss=0.2858, ctc_loss=0.197, cr_loss=0.4441, over 15110.00 frames. ], tot_loss[loss=0.2543, ctc_loss=0.177, cr_loss=0.3865, over 3347210.93 frames. ], batch size: 89, lr: 1.21e-02, grad_scale: 32.0 2024-09-23 04:03:23,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=173058.66666666666, ans=0.125 2024-09-23 04:03:24,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173058.66666666666, ans=0.1 2024-09-23 04:03:39,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=173105.33333333334, ans=0.025 2024-09-23 04:03:41,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2024-09-23 04:04:06,413 INFO [train.py:1198] (0/4) Epoch 10, batch 2050, loss[loss=0.215, ctc_loss=0.1459, cr_loss=0.3457, over 17155.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1763, cr_loss=0.3846, over 3336109.00 frames. 
], batch size: 41, lr: 1.21e-02, grad_scale: 32.0 2024-09-23 04:04:12,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=173198.66666666666, ans=0.0 2024-09-23 04:04:14,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=173198.66666666666, ans=0.125 2024-09-23 04:04:34,982 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.316e+02 1.437e+02 1.596e+02 3.726e+02, threshold=2.874e+02, percent-clipped=1.0 2024-09-23 04:04:43,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173292.0, ans=0.1 2024-09-23 04:05:28,497 INFO [train.py:1198] (0/4) Epoch 10, batch 2100, loss[loss=0.2269, ctc_loss=0.1529, cr_loss=0.3703, over 17149.00 frames. ], tot_loss[loss=0.2533, ctc_loss=0.1763, cr_loss=0.3852, over 3347353.63 frames. ], batch size: 48, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:05:30,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=173432.0, ans=0.125 2024-09-23 04:05:31,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=173432.0, ans=0.2 2024-09-23 04:05:41,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=173432.0, ans=0.0 2024-09-23 04:05:55,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=173478.66666666666, ans=0.125 2024-09-23 04:06:11,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-09-23 04:06:48,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=173665.33333333334, ans=0.125 2024-09-23 04:06:49,963 INFO [train.py:1198] (0/4) Epoch 10, batch 2150, loss[loss=0.2556, ctc_loss=0.1761, cr_loss=0.3975, over 17038.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1769, cr_loss=0.3857, over 3342608.36 frames. ], batch size: 52, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:06:50,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=173665.33333333334, ans=0.125 2024-09-23 04:06:53,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=173665.33333333334, ans=0.2 2024-09-23 04:06:58,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=173665.33333333334, ans=0.0 2024-09-23 04:07:02,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=173665.33333333334, ans=0.125 2024-09-23 04:07:18,784 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.317e+02 1.458e+02 1.666e+02 3.135e+02, threshold=2.917e+02, percent-clipped=1.0 2024-09-23 04:08:14,680 INFO [train.py:1198] (0/4) Epoch 10, batch 2200, loss[loss=0.2077, ctc_loss=0.144, cr_loss=0.3184, over 17169.00 frames. ], tot_loss[loss=0.2537, ctc_loss=0.1767, cr_loss=0.3852, over 3341724.74 frames. 
], batch size: 41, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:08:14,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=173898.66666666666, ans=0.0 2024-09-23 04:08:26,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=173898.66666666666, ans=0.09899494936611666 2024-09-23 04:08:45,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=173992.0, ans=0.2 2024-09-23 04:08:48,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=173992.0, ans=0.2 2024-09-23 04:09:10,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=174038.66666666666, ans=0.95 2024-09-23 04:09:26,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174085.33333333334, ans=0.1 2024-09-23 04:09:31,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=174085.33333333334, ans=0.125 2024-09-23 04:09:34,448 INFO [train.py:1198] (0/4) Epoch 10, batch 2250, loss[loss=0.2195, ctc_loss=0.1496, cr_loss=0.3497, over 17077.00 frames. ], tot_loss[loss=0.2532, ctc_loss=0.1763, cr_loss=0.3845, over 3348613.61 frames. ], batch size: 39, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:09:36,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=174132.0, ans=0.025 2024-09-23 04:09:47,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=174132.0, ans=0.05 2024-09-23 04:09:50,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=174178.66666666666, ans=0.125 2024-09-23 04:09:53,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=174178.66666666666, ans=0.07 2024-09-23 04:09:59,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=174178.66666666666, ans=0.0 2024-09-23 04:10:05,825 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.382e+02 1.475e+02 1.727e+02 2.787e+02, threshold=2.949e+02, percent-clipped=0.0 2024-09-23 04:10:33,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174272.0, ans=0.1 2024-09-23 04:10:41,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174318.66666666666, ans=0.1 2024-09-23 04:10:54,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-09-23 04:10:56,694 INFO [train.py:1198] (0/4) Epoch 10, batch 2300, loss[loss=0.269, ctc_loss=0.1856, cr_loss=0.4171, over 17224.00 frames. ], tot_loss[loss=0.2535, ctc_loss=0.1764, cr_loss=0.3856, over 3355619.46 frames. 
], batch size: 50, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:11:20,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=174412.0, ans=0.05 2024-09-23 04:12:06,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=174552.0, ans=0.0 2024-09-23 04:12:10,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2024-09-23 04:12:19,123 INFO [train.py:1198] (0/4) Epoch 10, batch 2350, loss[loss=0.292, ctc_loss=0.2066, cr_loss=0.427, over 17016.00 frames. ], tot_loss[loss=0.2544, ctc_loss=0.177, cr_loss=0.387, over 3346577.08 frames. ], batch size: 53, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:12:21,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=174598.66666666666, ans=0.1 2024-09-23 04:12:37,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.17 vs. limit=22.5 2024-09-23 04:12:44,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=174645.33333333334, ans=0.125 2024-09-23 04:12:47,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=174645.33333333334, ans=0.0 2024-09-23 04:12:50,300 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.308e+02 1.414e+02 1.626e+02 2.428e+02, threshold=2.828e+02, percent-clipped=0.0 2024-09-23 04:13:38,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=174785.33333333334, ans=0.125 2024-09-23 04:13:43,456 INFO [train.py:1198] (0/4) Epoch 10, batch 2400, loss[loss=0.2609, ctc_loss=0.184, cr_loss=0.3846, over 16684.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.1756, cr_loss=0.3856, over 3360201.96 frames. ], batch size: 66, lr: 1.20e-02, grad_scale: 32.0 2024-09-23 04:13:54,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=174832.0, ans=0.025 2024-09-23 04:14:01,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=174878.66666666666, ans=0.125 2024-09-23 04:14:06,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=174878.66666666666, ans=0.2 2024-09-23 04:14:16,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.10 vs. limit=10.0 2024-09-23 04:14:51,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=175018.66666666666, ans=0.125 2024-09-23 04:15:06,054 INFO [train.py:1198] (0/4) Epoch 10, batch 2450, loss[loss=0.2712, ctc_loss=0.1897, cr_loss=0.4078, over 16786.00 frames. ], tot_loss[loss=0.252, ctc_loss=0.175, cr_loss=0.385, over 3363585.76 frames. 
], batch size: 61, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:15:17,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=175065.33333333334, ans=0.125 2024-09-23 04:15:20,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=175112.0, ans=0.125 2024-09-23 04:15:36,388 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.295e+02 1.392e+02 1.667e+02 2.800e+02, threshold=2.783e+02, percent-clipped=0.0 2024-09-23 04:15:38,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=175158.66666666666, ans=0.125 2024-09-23 04:15:52,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=175205.33333333334, ans=0.0 2024-09-23 04:16:25,985 INFO [train.py:1198] (0/4) Epoch 10, batch 2500, loss[loss=0.2221, ctc_loss=0.1537, cr_loss=0.3415, over 16957.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1736, cr_loss=0.3829, over 3363844.08 frames. ], batch size: 42, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:16:29,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=175298.66666666666, ans=0.0 2024-09-23 04:16:35,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=175298.66666666666, ans=0.0 2024-09-23 04:16:36,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=175298.66666666666, ans=0.125 2024-09-23 04:17:14,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.13 vs. limit=10.0 2024-09-23 04:17:32,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=175438.66666666666, ans=0.05 2024-09-23 04:17:50,878 INFO [train.py:1198] (0/4) Epoch 10, batch 2550, loss[loss=0.2685, ctc_loss=0.1829, cr_loss=0.4281, over 17065.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1726, cr_loss=0.3814, over 3369262.16 frames. ], batch size: 46, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:18:08,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=175578.66666666666, ans=0.2 2024-09-23 04:18:17,569 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 04:18:23,452 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.321e+02 1.441e+02 1.691e+02 2.698e+02, threshold=2.882e+02, percent-clipped=0.0 2024-09-23 04:18:28,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=175625.33333333334, ans=10.0 2024-09-23 04:18:30,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175625.33333333334, ans=0.1 2024-09-23 04:18:46,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.94 vs. 
limit=12.0 2024-09-23 04:19:08,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=175718.66666666666, ans=0.125 2024-09-23 04:19:12,532 INFO [train.py:1198] (0/4) Epoch 10, batch 2600, loss[loss=0.2704, ctc_loss=0.1878, cr_loss=0.4129, over 17210.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1742, cr_loss=0.3838, over 3367790.91 frames. ], batch size: 47, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:19:50,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2024-09-23 04:20:09,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=175905.33333333334, ans=0.125 2024-09-23 04:20:23,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=175952.0, ans=0.125 2024-09-23 04:20:30,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=175952.0, ans=0.0 2024-09-23 04:20:34,898 INFO [train.py:1198] (0/4) Epoch 10, batch 2650, loss[loss=0.2664, ctc_loss=0.18, cr_loss=0.4316, over 17300.00 frames. ], tot_loss[loss=0.2517, ctc_loss=0.1748, cr_loss=0.3847, over 3375263.99 frames. ], batch size: 49, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:20:46,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=175998.66666666666, ans=0.0 2024-09-23 04:21:01,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=176045.33333333334, ans=15.0 2024-09-23 04:21:05,299 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.318e+02 1.384e+02 1.550e+02 2.652e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-23 04:21:21,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=176138.66666666666, ans=0.04949747468305833 2024-09-23 04:21:57,391 INFO [train.py:1198] (0/4) Epoch 10, batch 2700, loss[loss=0.2222, ctc_loss=0.1528, cr_loss=0.3467, over 17163.00 frames. ], tot_loss[loss=0.2512, ctc_loss=0.1744, cr_loss=0.384, over 3371593.11 frames. ], batch size: 45, lr: 1.20e-02, grad_scale: 16.0 2024-09-23 04:22:03,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5 2024-09-23 04:22:36,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2024-09-23 04:23:14,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.14 vs. limit=22.5 2024-09-23 04:23:20,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2024-09-23 04:23:22,733 INFO [train.py:1198] (0/4) Epoch 10, batch 2750, loss[loss=0.2128, ctc_loss=0.1445, cr_loss=0.3413, over 17262.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.174, cr_loss=0.3832, over 3369077.25 frames. 
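
The scaling.py:214 lines print ScheduledFloat values: per-module hyperparameters (skip rates, balancer probabilities, min/max limits) that are annealed as a function of batch_count rather than fixed. A small sketch of a piecewise-linear (batch_count, value) schedule in that spirit; the class and breakpoint values here are illustrative, not the icefall definitions:

    import bisect

    class PiecewiseSchedule:
        """Linearly interpolate between (batch_count, value) breakpoints,
        clamping to the end values outside the given range."""
        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate that starts at 0.5 and is annealed to 0.0 by 4k batches:
    pos_emb_skip_rate = PiecewiseSchedule((0.0, 0.5), (4000.0, 0.0))
    print(pos_emb_skip_rate(175952.0))  # 0.0, cf. the ans=0.0 entries above
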
], batch size: 44, lr: 1.19e-02, grad_scale: 16.0 2024-09-23 04:23:23,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2024-09-23 04:23:24,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=176465.33333333334, ans=10.0 2024-09-23 04:23:35,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=176465.33333333334, ans=22.5 2024-09-23 04:23:48,682 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 04:23:53,132 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.357e+02 1.501e+02 1.751e+02 2.551e+02, threshold=3.001e+02, percent-clipped=0.0 2024-09-23 04:24:10,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=176605.33333333334, ans=0.125 2024-09-23 04:24:31,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2024-09-23 04:24:34,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=176652.0, ans=0.0 2024-09-23 04:24:37,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=176652.0, ans=0.125 2024-09-23 04:24:42,410 INFO [train.py:1198] (0/4) Epoch 10, batch 2800, loss[loss=0.2156, ctc_loss=0.148, cr_loss=0.3378, over 17097.00 frames. ], tot_loss[loss=0.2507, ctc_loss=0.1741, cr_loss=0.383, over 3366856.87 frames. ], batch size: 43, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:24:43,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.90 vs. limit=22.5 2024-09-23 04:24:53,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=176698.66666666666, ans=0.0 2024-09-23 04:24:58,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=176698.66666666666, ans=0.125 2024-09-23 04:25:17,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=176792.0, ans=0.0 2024-09-23 04:25:20,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=176792.0, ans=0.0 2024-09-23 04:25:30,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=176792.0, ans=0.125 2024-09-23 04:25:33,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=176838.66666666666, ans=0.125 2024-09-23 04:26:05,071 INFO [train.py:1198] (0/4) Epoch 10, batch 2850, loss[loss=0.2867, ctc_loss=0.199, cr_loss=0.4387, over 17239.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1738, cr_loss=0.3826, over 3369083.93 frames. 
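
The scaling.py:1024 Whitening lines track how close each module's output covariance is to a multiple of the identity: metric is a scale-invariant measure of eigenvalue spread that is compared against the module's whitening_limit (metric=5.48 vs. limit=10.0 and similar above), with a corrective penalty applied when the limit is exceeded. A hedged sketch of one such metric; the exact icefall formula may differ:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns a value >= 1.0 that equals
        1.0 exactly when each group's covariance is isotropic ('white')."""
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n          # per-group (d, d) covariance
        eigs = torch.linalg.eigvalsh(cov)        # real eigenvalues, ascending
        # mean(eig^2) / mean(eig)^2 penalizes a few dominant directions
        ratio = (eigs ** 2).mean(dim=-1) / eigs.mean(dim=-1).clamp(min=1e-20) ** 2
        return ratio.mean()

    x = torch.randn(2000, 256)
    print(whitening_metric(x, num_groups=8))  # close to 1.0 for white noise
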
], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:26:08,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=176932.0, ans=0.2 2024-09-23 04:26:19,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176978.66666666666, ans=0.1 2024-09-23 04:26:26,194 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 04:26:38,109 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.308e+02 1.482e+02 1.673e+02 2.195e+02, threshold=2.964e+02, percent-clipped=0.0 2024-09-23 04:26:52,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2024-09-23 04:26:59,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=177072.0, ans=0.125 2024-09-23 04:27:05,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177072.0, ans=0.1 2024-09-23 04:27:30,214 INFO [train.py:1198] (0/4) Epoch 10, batch 2900, loss[loss=0.2909, ctc_loss=0.2091, cr_loss=0.4089, over 16713.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1745, cr_loss=0.3833, over 3363329.43 frames. ], batch size: 61, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:27:36,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=177165.33333333334, ans=0.0 2024-09-23 04:27:54,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2024-09-23 04:28:40,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=177352.0, ans=0.125 2024-09-23 04:28:52,969 INFO [train.py:1198] (0/4) Epoch 10, batch 2950, loss[loss=0.2432, ctc_loss=0.1661, cr_loss=0.3855, over 17027.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1737, cr_loss=0.3831, over 3369478.11 frames. ], batch size: 51, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:28:53,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177398.66666666666, ans=0.1 2024-09-23 04:29:21,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=177445.33333333334, ans=0.125 2024-09-23 04:29:23,162 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.333e+02 1.430e+02 1.579e+02 2.314e+02, threshold=2.860e+02, percent-clipped=0.0 2024-09-23 04:29:32,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=177492.0, ans=0.125 2024-09-23 04:29:51,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=177538.66666666666, ans=0.125 2024-09-23 04:29:57,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=177585.33333333334, ans=0.2 2024-09-23 04:30:14,433 INFO [train.py:1198] (0/4) Epoch 10, batch 3000, loss[loss=0.2405, ctc_loss=0.1633, cr_loss=0.3861, over 17025.00 frames. 
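
Every loss line splits the objective into ctc_loss and cr_loss, the consistency-regularization term of CR-CTC: each utterance is forwarded as two differently time-masked copies and the model is penalized when their frame-level CTC posteriors disagree, with the result scaled by the cr-loss-scale of 0.2 that also names the experiment directory. One plausible form of that term is a symmetric KL between the two output distributions (illustrative; the recipe's exact masking and divergence live in train.py):

    import torch
    import torch.nn.functional as F

    def consistency_loss(logp_a: torch.Tensor, logp_b: torch.Tensor) -> torch.Tensor:
        """logp_a, logp_b: (frames, vocab) log-probs from two augmented views
        of the same utterance. Symmetric KL, averaged over frames."""
        kl_ab = F.kl_div(logp_a, logp_b, log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(logp_b, logp_a, log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)

    logp1 = F.log_softmax(torch.randn(100, 500), dim=-1)   # view 1
    logp2 = F.log_softmax(torch.randn(100, 500), dim=-1)   # view 2
    cr = consistency_loss(logp1, logp2)
    # combined, matching the log's fields: loss = ctc_loss + 0.2 * cr_loss
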
], tot_loss[loss=0.2515, ctc_loss=0.1748, cr_loss=0.3835, over 3354840.10 frames. ], batch size: 44, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:30:14,434 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 04:30:26,291 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.2139, 4.1864, 3.9716, 4.2461], device='cuda:0') 2024-09-23 04:30:30,502 INFO [train.py:1230] (0/4) Epoch 10, validation: loss=0.04843, ctc_loss=0.04843, cr_loss=7.942e-15, over 944034.00 frames. 2024-09-23 04:30:30,502 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 04:30:38,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=177632.0, ans=0.125 2024-09-23 04:30:46,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=177678.66666666666, ans=0.125 2024-09-23 04:30:52,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=177678.66666666666, ans=0.125 2024-09-23 04:31:12,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=177725.33333333334, ans=0.125 2024-09-23 04:31:16,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.81 vs. limit=10.0 2024-09-23 04:31:20,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=177772.0, ans=0.125 2024-09-23 04:31:38,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.42 vs. limit=15.0 2024-09-23 04:31:39,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=177818.66666666666, ans=0.0 2024-09-23 04:31:40,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177818.66666666666, ans=0.1 2024-09-23 04:31:48,171 INFO [train.py:1198] (0/4) Epoch 10, batch 3050, loss[loss=0.2388, ctc_loss=0.1644, cr_loss=0.372, over 17097.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1738, cr_loss=0.3825, over 3360967.66 frames. 
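
At batch 3000 the loop pauses for a validation pass (train.py:1221/1230). Two details of that block stand out: the validation cr_loss of 7.942e-15 is numerically zero, plausibly because augmentation is effectively disabled at validation time, so the two views of each utterance coincide and the consistency term vanishes; and the zipformer.py:1858 line dumps attn_weights_entropy, a per-head entropy of the attention weights used as a diagnostic of how peaked each head is. A sketch of that statistic, with shape conventions assumed:

    import torch

    def attn_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        """attn_weights: (num_heads, tgt_len, src_len), each row a distribution.
        Returns entropy in nats per head, averaged over query positions;
        larger values mean more diffuse attention, smaller more peaked."""
        p = attn_weights.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)

    w = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_entropy(w))  # one value per head, cf. the 4-element tensor above
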
], batch size: 49, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:31:56,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=177865.33333333334, ans=0.2 2024-09-23 04:31:59,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=177865.33333333334, ans=0.07 2024-09-23 04:32:04,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=177912.0, ans=10.0 2024-09-23 04:32:05,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177912.0, ans=0.1 2024-09-23 04:32:11,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=177912.0, ans=0.125 2024-09-23 04:32:17,671 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.279e+02 1.408e+02 1.568e+02 2.746e+02, threshold=2.816e+02, percent-clipped=0.0 2024-09-23 04:32:29,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0 2024-09-23 04:32:34,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2024-09-23 04:32:37,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2024-09-23 04:33:00,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=178052.0, ans=0.125 2024-09-23 04:33:06,426 INFO [train.py:1198] (0/4) Epoch 10, batch 3100, loss[loss=0.2205, ctc_loss=0.149, cr_loss=0.3574, over 16747.00 frames. ], tot_loss[loss=0.2504, ctc_loss=0.1738, cr_loss=0.3828, over 3364092.52 frames. ], batch size: 37, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:33:11,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=178098.66666666666, ans=0.2 2024-09-23 04:33:18,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=178098.66666666666, ans=0.125 2024-09-23 04:33:23,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=22.5 2024-09-23 04:33:34,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=178145.33333333334, ans=0.125 2024-09-23 04:33:34,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178145.33333333334, ans=0.1 2024-09-23 04:33:40,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=178192.0, ans=0.025 2024-09-23 04:34:16,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=178285.33333333334, ans=0.0 2024-09-23 04:34:27,443 INFO [train.py:1198] (0/4) Epoch 10, batch 3150, loss[loss=0.2127, ctc_loss=0.1448, cr_loss=0.3395, over 17214.00 frames. ], tot_loss[loss=0.25, ctc_loss=0.1737, cr_loss=0.3819, over 3364173.28 frames. 
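
Each training line reports two sets of numbers: loss[...] for the current batch and tot_loss[...] aggregated "over N frames", where N sits near 3.4M while individual batches contribute roughly 17k frames, suggesting a frame-weighted average over a slowly decaying window. A sketch of such bookkeeping (the decay constant is a guess chosen to reproduce the logged window size; icefall's actual MetricsTracker may aggregate differently):

    class RunningLoss:
        """Frame-weighted, exponentially decayed running average."""
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    for _ in range(2000):
        tracker.update(0.25, 17000.0)
    # frames converges to ~17000 / (1 - 0.995) = 3.4e6, the magnitude seen above
    print(f"tot_loss={tracker.value:.4f} over {tracker.frames:.2f} frames")
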
], batch size: 41, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:34:29,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=178332.0, ans=0.0 2024-09-23 04:34:47,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178378.66666666666, ans=0.1 2024-09-23 04:34:56,601 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.289e+02 1.420e+02 1.589e+02 2.247e+02, threshold=2.840e+02, percent-clipped=0.0 2024-09-23 04:35:16,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=178472.0, ans=0.125 2024-09-23 04:35:30,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=178518.66666666666, ans=0.125 2024-09-23 04:35:49,622 INFO [train.py:1198] (0/4) Epoch 10, batch 3200, loss[loss=0.241, ctc_loss=0.1657, cr_loss=0.3765, over 17220.00 frames. ], tot_loss[loss=0.2511, ctc_loss=0.1745, cr_loss=0.383, over 3359001.59 frames. ], batch size: 47, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:36:39,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=178705.33333333334, ans=0.2 2024-09-23 04:36:59,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=178752.0, ans=0.0 2024-09-23 04:37:01,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=178752.0, ans=0.125 2024-09-23 04:37:07,406 INFO [train.py:1198] (0/4) Epoch 10, batch 3250, loss[loss=0.2894, ctc_loss=0.2029, cr_loss=0.4327, over 17038.00 frames. ], tot_loss[loss=0.2514, ctc_loss=0.1748, cr_loss=0.3832, over 3362486.74 frames. ], batch size: 52, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:37:14,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.25 vs. limit=12.0 2024-09-23 04:37:34,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=178845.33333333334, ans=0.125 2024-09-23 04:37:37,059 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.295e+02 1.395e+02 1.531e+02 3.942e+02, threshold=2.791e+02, percent-clipped=1.0 2024-09-23 04:37:51,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-09-23 04:37:56,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=178938.66666666666, ans=0.2 2024-09-23 04:37:56,639 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0 2024-09-23 04:38:25,404 INFO [train.py:1198] (0/4) Epoch 10, batch 3300, loss[loss=0.2188, ctc_loss=0.1498, cr_loss=0.3453, over 16957.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1738, cr_loss=0.3822, over 3364884.38 frames. 
], batch size: 42, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:38:30,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=179032.0, ans=0.05 2024-09-23 04:38:39,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=179078.66666666666, ans=0.1 2024-09-23 04:38:45,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=179078.66666666666, ans=0.2 2024-09-23 04:38:48,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179078.66666666666, ans=0.1 2024-09-23 04:38:48,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=179078.66666666666, ans=0.0 2024-09-23 04:38:49,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-09-23 04:38:56,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=179125.33333333334, ans=0.2 2024-09-23 04:39:04,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=22.5 2024-09-23 04:39:04,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.84 vs. limit=10.0 2024-09-23 04:39:07,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=179125.33333333334, ans=0.125 2024-09-23 04:39:23,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.19 vs. limit=22.5 2024-09-23 04:39:27,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179218.66666666666, ans=0.1 2024-09-23 04:39:29,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=179218.66666666666, ans=0.125 2024-09-23 04:39:34,622 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 04:39:37,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=179218.66666666666, ans=0.0 2024-09-23 04:39:40,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=179218.66666666666, ans=0.125 2024-09-23 04:39:43,188 INFO [train.py:1198] (0/4) Epoch 10, batch 3350, loss[loss=0.2501, ctc_loss=0.1743, cr_loss=0.3788, over 17003.00 frames. ], tot_loss[loss=0.2501, ctc_loss=0.1739, cr_loss=0.3814, over 3364162.74 frames. ], batch size: 51, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:40:15,064 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.385e+02 1.553e+02 1.795e+02 2.936e+02, threshold=3.106e+02, percent-clipped=1.0 2024-09-23 04:40:20,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.37 vs. 
limit=22.5 2024-09-23 04:40:35,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=179405.33333333334, ans=0.125 2024-09-23 04:40:37,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2024-09-23 04:40:41,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=22.5 2024-09-23 04:40:47,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=179452.0, ans=0.125 2024-09-23 04:41:00,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=179452.0, ans=0.0 2024-09-23 04:41:03,099 INFO [train.py:1198] (0/4) Epoch 10, batch 3400, loss[loss=0.2351, ctc_loss=0.164, cr_loss=0.3554, over 16323.00 frames. ], tot_loss[loss=0.2506, ctc_loss=0.1743, cr_loss=0.3811, over 3369204.56 frames. ], batch size: 36, lr: 1.19e-02, grad_scale: 32.0 2024-09-23 04:41:14,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=179498.66666666666, ans=0.125 2024-09-23 04:41:37,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179592.0, ans=0.0 2024-09-23 04:41:39,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=179592.0, ans=0.1 2024-09-23 04:42:03,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=179638.66666666666, ans=15.0 2024-09-23 04:42:21,370 INFO [train.py:1198] (0/4) Epoch 10, batch 3450, loss[loss=0.2805, ctc_loss=0.1962, cr_loss=0.4217, over 17355.00 frames. ], tot_loss[loss=0.2516, ctc_loss=0.1752, cr_loss=0.3822, over 3346571.54 frames. ], batch size: 48, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:42:27,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=179732.0, ans=0.125 2024-09-23 04:42:50,603 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.316e+02 1.451e+02 1.743e+02 2.784e+02, threshold=2.902e+02, percent-clipped=0.0 2024-09-23 04:42:59,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0 2024-09-23 04:43:06,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=179872.0, ans=0.0 2024-09-23 04:43:16,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2024-09-23 04:43:37,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=179918.66666666666, ans=0.125 2024-09-23 04:43:40,113 INFO [train.py:1198] (0/4) Epoch 10, batch 3500, loss[loss=0.2454, ctc_loss=0.1709, cr_loss=0.3721, over 17256.00 frames. ], tot_loss[loss=0.2524, ctc_loss=0.1757, cr_loss=0.3833, over 3344759.35 frames. 
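
The grad_scale field in each batch line (32.0 here, bouncing between 16.0 and 32.0 earlier in the log) is the dynamic loss-scaling factor used because this run trains in mixed precision: the scale doubles after a long stretch of finite gradients and halves whenever an inf/nan gradient forces a skipped step. A minimal sketch of the standard torch.cuda.amp pattern, shown on a toy model rather than the recipe's training loop:

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    for _ in range(3):
        x = torch.randn(8, 80, device="cuda")
        with torch.cuda.amp.autocast():          # fp16 forward, as in this run
            loss = model(x).square().mean()
        opt.zero_grad()
        scaler.scale(loss).backward()            # backprop the scaled loss
        scaler.step(opt)                         # unscales; skips step on inf/nan
        scaler.update()                          # grow or shrink the scale
        print(scaler.get_scale())                # cf. grad_scale in the log
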
], batch size: 44, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:44:13,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=12.0 2024-09-23 04:44:40,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-09-23 04:44:51,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=180152.0, ans=0.025 2024-09-23 04:45:02,141 INFO [train.py:1198] (0/4) Epoch 10, batch 3550, loss[loss=0.245, ctc_loss=0.1719, cr_loss=0.3654, over 17017.00 frames. ], tot_loss[loss=0.2528, ctc_loss=0.176, cr_loss=0.3839, over 3342429.09 frames. ], batch size: 51, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:45:10,430 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 04:45:18,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=180245.33333333334, ans=0.07 2024-09-23 04:45:32,191 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.310e+02 1.457e+02 1.783e+02 3.691e+02, threshold=2.913e+02, percent-clipped=1.0 2024-09-23 04:45:44,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=180292.0, ans=0.125 2024-09-23 04:45:48,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=180338.66666666666, ans=0.1 2024-09-23 04:45:49,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=180338.66666666666, ans=0.0 2024-09-23 04:46:04,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=180385.33333333334, ans=0.2 2024-09-23 04:46:16,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=180385.33333333334, ans=0.0 2024-09-23 04:46:20,830 INFO [train.py:1198] (0/4) Epoch 10, batch 3600, loss[loss=0.2367, ctc_loss=0.1629, cr_loss=0.3687, over 16442.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1741, cr_loss=0.3814, over 3338847.86 frames. ], batch size: 66, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:46:41,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.68 vs. 
limit=6.0 2024-09-23 04:46:44,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=180478.66666666666, ans=0.05 2024-09-23 04:46:58,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=180525.33333333334, ans=0.0 2024-09-23 04:47:22,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=180618.66666666666, ans=0.2 2024-09-23 04:47:31,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=180618.66666666666, ans=0.125 2024-09-23 04:47:34,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=180618.66666666666, ans=0.125 2024-09-23 04:47:35,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=180618.66666666666, ans=0.125 2024-09-23 04:47:38,886 INFO [train.py:1198] (0/4) Epoch 10, batch 3650, loss[loss=0.2437, ctc_loss=0.1708, cr_loss=0.3646, over 17248.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1746, cr_loss=0.3823, over 3336407.18 frames. ], batch size: 42, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:47:42,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=180665.33333333334, ans=0.125 2024-09-23 04:48:08,242 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.307e+02 1.399e+02 1.525e+02 2.249e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-23 04:48:16,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-09-23 04:48:49,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=180852.0, ans=0.125 2024-09-23 04:48:57,315 INFO [train.py:1198] (0/4) Epoch 10, batch 3700, loss[loss=0.2711, ctc_loss=0.1883, cr_loss=0.4139, over 17218.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1744, cr_loss=0.383, over 3347827.56 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:49:14,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=180945.33333333334, ans=0.125 2024-09-23 04:49:33,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=180992.0, ans=0.125 2024-09-23 04:49:40,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=180992.0, ans=0.0 2024-09-23 04:50:16,593 INFO [train.py:1198] (0/4) Epoch 10, batch 3750, loss[loss=0.2427, ctc_loss=0.1666, cr_loss=0.3801, over 16971.00 frames. ], tot_loss[loss=0.2509, ctc_loss=0.1744, cr_loss=0.3828, over 3344583.84 frames. ], batch size: 42, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:50:42,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. 
limit=15.0 2024-09-23 04:50:46,085 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.346e+02 1.480e+02 1.709e+02 3.123e+02, threshold=2.960e+02, percent-clipped=1.0 2024-09-23 04:50:58,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=181225.33333333334, ans=0.0 2024-09-23 04:51:03,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=181272.0, ans=0.125 2024-09-23 04:51:23,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=181318.66666666666, ans=0.0 2024-09-23 04:51:28,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=181318.66666666666, ans=0.0 2024-09-23 04:51:28,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=181318.66666666666, ans=0.0 2024-09-23 04:51:34,560 INFO [train.py:1198] (0/4) Epoch 10, batch 3800, loss[loss=0.2399, ctc_loss=0.164, cr_loss=0.3796, over 17177.00 frames. ], tot_loss[loss=0.2488, ctc_loss=0.1726, cr_loss=0.381, over 3358531.57 frames. ], batch size: 41, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:51:41,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=181365.33333333334, ans=0.0 2024-09-23 04:51:53,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2024-09-23 04:51:58,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=181412.0, ans=0.04949747468305833 2024-09-23 04:52:08,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-09-23 04:52:30,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=181505.33333333334, ans=0.125 2024-09-23 04:52:53,255 INFO [train.py:1198] (0/4) Epoch 10, batch 3850, loss[loss=0.229, ctc_loss=0.157, cr_loss=0.36, over 16235.00 frames. ], tot_loss[loss=0.2482, ctc_loss=0.1722, cr_loss=0.38, over 3347726.38 frames. ], batch size: 36, lr: 1.18e-02, grad_scale: 32.0 2024-09-23 04:53:01,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.91 vs. 
limit=22.5 2024-09-23 04:53:04,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=181598.66666666666, ans=0.07 2024-09-23 04:53:22,408 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.346e+02 1.514e+02 1.745e+02 2.888e+02, threshold=3.027e+02, percent-clipped=0.0 2024-09-23 04:53:22,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=181692.0, ans=0.1 2024-09-23 04:53:31,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=181692.0, ans=0.125 2024-09-23 04:53:41,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=181738.66666666666, ans=0.125 2024-09-23 04:53:47,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=181738.66666666666, ans=0.125 2024-09-23 04:53:55,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=22.5 2024-09-23 04:54:03,359 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-10.pt 2024-09-23 04:54:55,685 INFO [train.py:1198] (0/4) Epoch 11, batch 0, loss[loss=0.2618, ctc_loss=0.1847, cr_loss=0.3854, over 17056.00 frames. ], tot_loss[loss=0.2618, ctc_loss=0.1847, cr_loss=0.3854, over 17056.00 frames. ], batch size: 46, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 04:54:55,686 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 04:55:11,311 INFO [train.py:1230] (0/4) Epoch 11, validation: loss=0.04963, ctc_loss=0.04963, cr_loss=7.372e-15, over 944034.00 frames. 2024-09-23 04:55:11,311 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 04:55:41,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2024-09-23 04:55:44,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=181906.66666666666, ans=0.125 2024-09-23 04:55:52,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=181906.66666666666, ans=0.0 2024-09-23 04:55:54,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=181906.66666666666, ans=0.125 2024-09-23 04:56:02,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=181953.33333333334, ans=0.125 2024-09-23 04:56:29,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=182000.0, ans=0.025 2024-09-23 04:56:30,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.86 vs. limit=15.0 2024-09-23 04:56:34,325 INFO [train.py:1198] (0/4) Epoch 11, batch 50, loss[loss=0.2315, ctc_loss=0.1552, cr_loss=0.3818, over 17265.00 frames. ], tot_loss[loss=0.254, ctc_loss=0.1764, cr_loss=0.388, over 760357.33 frames. 
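
Between epochs the run writes a full checkpoint (checkpoint.py:75, epoch-10.pt under the experiment directory) before epoch 11, batch 0 starts; note also the learning rate stepping from 1.18e-02 to 1.12e-02 across the boundary, since the schedule has an epoch-dependent term as well as a batch-dependent one. A hedged sketch of what such an epoch checkpoint typically bundles (field names illustrative; icefall's version also saves sampler state, best losses, and more):

    import torch

    def save_epoch_checkpoint(path, model, optimizer, scheduler, scaler, epoch):
        """Persist enough state to resume training exactly where it left off."""
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "grad_scaler": scaler.state_dict(),
                "epoch": epoch,
            },
            path,
        )

    # e.g. save_epoch_checkpoint("zipformer/exp-.../epoch-10.pt",
    #                            model, opt, sched, scaler, epoch=10)
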
], batch size: 42, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 04:56:36,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=182046.66666666666, ans=0.1 2024-09-23 04:56:45,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182046.66666666666, ans=0.1 2024-09-23 04:56:50,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=182093.33333333334, ans=0.125 2024-09-23 04:57:03,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=182093.33333333334, ans=0.125 2024-09-23 04:57:11,448 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 04:57:12,614 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.435e+02 1.595e+02 1.830e+02 2.692e+02, threshold=3.191e+02, percent-clipped=0.0 2024-09-23 04:57:17,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=182140.0, ans=0.0 2024-09-23 04:57:17,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=182140.0, ans=0.125 2024-09-23 04:57:35,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=182186.66666666666, ans=0.0 2024-09-23 04:57:38,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-09-23 04:57:47,897 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 04:57:53,901 INFO [train.py:1198] (0/4) Epoch 11, batch 100, loss[loss=0.2363, ctc_loss=0.1622, cr_loss=0.3707, over 17068.00 frames. ], tot_loss[loss=0.247, ctc_loss=0.1712, cr_loss=0.3791, over 1339141.03 frames. ], batch size: 46, lr: 1.12e-02, grad_scale: 16.0 2024-09-23 04:58:16,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=182326.66666666666, ans=0.125 2024-09-23 04:58:19,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=182326.66666666666, ans=0.0 2024-09-23 04:58:29,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=182373.33333333334, ans=0.125 2024-09-23 04:58:37,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=22.5 2024-09-23 04:58:39,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=182373.33333333334, ans=0.0 2024-09-23 04:59:16,235 INFO [train.py:1198] (0/4) Epoch 11, batch 150, loss[loss=0.2063, ctc_loss=0.1413, cr_loss=0.3249, over 17038.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1698, cr_loss=0.3791, over 1793233.01 frames. 
], batch size: 39, lr: 1.12e-02, grad_scale: 16.0 2024-09-23 04:59:45,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=182560.0, ans=0.0 2024-09-23 04:59:50,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.90 vs. limit=10.0 2024-09-23 04:59:57,641 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.353e+02 1.511e+02 1.758e+02 2.701e+02, threshold=3.021e+02, percent-clipped=0.0 2024-09-23 05:00:21,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=22.5 2024-09-23 05:00:24,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=182700.0, ans=0.2 2024-09-23 05:00:30,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=182700.0, ans=0.125 2024-09-23 05:00:31,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.61 vs. limit=15.0 2024-09-23 05:00:32,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=182700.0, ans=0.125 2024-09-23 05:00:34,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=182700.0, ans=0.125 2024-09-23 05:00:41,714 INFO [train.py:1198] (0/4) Epoch 11, batch 200, loss[loss=0.2974, ctc_loss=0.2114, cr_loss=0.43, over 16833.00 frames. ], tot_loss[loss=0.251, ctc_loss=0.1741, cr_loss=0.3845, over 2128358.48 frames. ], batch size: 61, lr: 1.12e-02, grad_scale: 16.0 2024-09-23 05:00:42,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=182746.66666666666, ans=0.125 2024-09-23 05:01:09,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=182793.33333333334, ans=0.0 2024-09-23 05:01:26,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.17 vs. limit=10.0 2024-09-23 05:01:28,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182886.66666666666, ans=0.1 2024-09-23 05:01:33,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=182886.66666666666, ans=0.125 2024-09-23 05:01:42,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=15.0 2024-09-23 05:01:58,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=182933.33333333334, ans=0.07 2024-09-23 05:02:01,467 INFO [train.py:1198] (0/4) Epoch 11, batch 250, loss[loss=0.2516, ctc_loss=0.1797, cr_loss=0.3595, over 16912.00 frames. ], tot_loss[loss=0.2497, ctc_loss=0.1733, cr_loss=0.3822, over 2401040.47 frames. 
], batch size: 58, lr: 1.12e-02, grad_scale: 16.0 2024-09-23 05:02:06,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=182980.0, ans=0.025 2024-09-23 05:02:06,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=182980.0, ans=0.125 2024-09-23 05:02:09,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=182980.0, ans=0.0 2024-09-23 05:02:29,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=183026.66666666666, ans=0.125 2024-09-23 05:02:35,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=183073.33333333334, ans=0.125 2024-09-23 05:02:39,687 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.357e+02 1.521e+02 1.743e+02 2.579e+02, threshold=3.043e+02, percent-clipped=0.0 2024-09-23 05:03:20,852 INFO [train.py:1198] (0/4) Epoch 11, batch 300, loss[loss=0.2012, ctc_loss=0.1337, cr_loss=0.3375, over 15901.00 frames. ], tot_loss[loss=0.2503, ctc_loss=0.1736, cr_loss=0.3831, over 2599681.34 frames. ], batch size: 35, lr: 1.12e-02, grad_scale: 16.0 2024-09-23 05:03:51,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=183260.0, ans=0.125 2024-09-23 05:04:01,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=183306.66666666666, ans=0.125 2024-09-23 05:04:25,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0 2024-09-23 05:04:27,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=183400.0, ans=0.125 2024-09-23 05:04:40,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=183400.0, ans=0.0 2024-09-23 05:04:43,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=183400.0, ans=0.0 2024-09-23 05:04:49,184 INFO [train.py:1198] (0/4) Epoch 11, batch 350, loss[loss=0.2559, ctc_loss=0.1747, cr_loss=0.4058, over 17301.00 frames. ], tot_loss[loss=0.2475, ctc_loss=0.1714, cr_loss=0.3803, over 2774358.76 frames. ], batch size: 46, lr: 1.12e-02, grad_scale: 16.0 2024-09-23 05:04:55,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.39 vs. 
limit=15.0 2024-09-23 05:05:02,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=183446.66666666666, ans=0.2 2024-09-23 05:05:22,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=183540.0, ans=0.125 2024-09-23 05:05:30,065 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.290e+02 1.386e+02 1.605e+02 2.385e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-23 05:06:08,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=183633.33333333334, ans=0.025 2024-09-23 05:06:11,273 INFO [train.py:1198] (0/4) Epoch 11, batch 400, loss[loss=0.2453, ctc_loss=0.1751, cr_loss=0.3507, over 16224.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1712, cr_loss=0.3795, over 2903453.61 frames. ], batch size: 74, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 05:06:22,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=183680.0, ans=0.125 2024-09-23 05:07:07,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=183820.0, ans=0.125 2024-09-23 05:07:20,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=183866.66666666666, ans=0.0 2024-09-23 05:07:31,441 INFO [train.py:1198] (0/4) Epoch 11, batch 450, loss[loss=0.2695, ctc_loss=0.1972, cr_loss=0.3616, over 11841.00 frames. ], tot_loss[loss=0.2481, ctc_loss=0.1721, cr_loss=0.38, over 2993989.37 frames. ], batch size: 126, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 05:07:49,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=183960.0, ans=0.125 2024-09-23 05:07:55,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=183960.0, ans=0.125 2024-09-23 05:08:09,515 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.312e+02 1.426e+02 1.601e+02 2.161e+02, threshold=2.852e+02, percent-clipped=0.0 2024-09-23 05:08:17,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=184053.33333333334, ans=0.125 2024-09-23 05:08:25,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=184053.33333333334, ans=0.125 2024-09-23 05:08:53,365 INFO [train.py:1198] (0/4) Epoch 11, batch 500, loss[loss=0.225, ctc_loss=0.153, cr_loss=0.3597, over 17087.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1715, cr_loss=0.3796, over 3073004.76 frames. ], batch size: 43, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 05:09:05,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=184146.66666666666, ans=0.0 2024-09-23 05:09:08,314 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:09:52,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=184286.66666666666, ans=0.1 2024-09-23 05:10:22,037 INFO [train.py:1198] (0/4) Epoch 11, batch 550, loss[loss=0.2741, ctc_loss=0.1914, cr_loss=0.4135, over 17384.00 frames. 
], tot_loss[loss=0.2461, ctc_loss=0.1704, cr_loss=0.3786, over 3142157.21 frames. ], batch size: 48, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 05:10:43,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2024-09-23 05:10:47,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=184426.66666666666, ans=0.125 2024-09-23 05:10:49,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=184426.66666666666, ans=0.2 2024-09-23 05:11:00,280 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.305e+02 1.444e+02 1.636e+02 3.823e+02, threshold=2.888e+02, percent-clipped=1.0 2024-09-23 05:11:41,810 INFO [train.py:1198] (0/4) Epoch 11, batch 600, loss[loss=0.2901, ctc_loss=0.1998, cr_loss=0.4514, over 17094.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1702, cr_loss=0.3779, over 3192486.70 frames. ], batch size: 49, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 05:12:23,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=184706.66666666666, ans=0.04949747468305833 2024-09-23 05:12:34,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=184753.33333333334, ans=0.025 2024-09-23 05:12:44,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=184800.0, ans=0.0 2024-09-23 05:13:01,769 INFO [train.py:1198] (0/4) Epoch 11, batch 650, loss[loss=0.3009, ctc_loss=0.2126, cr_loss=0.4412, over 16993.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1696, cr_loss=0.3767, over 3232459.90 frames. ], batch size: 53, lr: 1.12e-02, grad_scale: 32.0 2024-09-23 05:13:11,585 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:13:15,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-09-23 05:13:20,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.07 vs. limit=22.5 2024-09-23 05:13:21,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=184893.33333333334, ans=0.125 2024-09-23 05:13:31,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=184893.33333333334, ans=0.2 2024-09-23 05:13:43,066 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.331e+02 1.445e+02 1.618e+02 2.572e+02, threshold=2.890e+02, percent-clipped=0.0 2024-09-23 05:14:16,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=15.0 2024-09-23 05:14:20,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2024-09-23 05:14:27,583 INFO [train.py:1198] (0/4) Epoch 11, batch 700, loss[loss=0.2161, ctc_loss=0.1454, cr_loss=0.3534, over 17299.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1701, cr_loss=0.3777, over 3255578.28 frames. 
], batch size: 46, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:14:27,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=185080.0, ans=0.09899494936611666 2024-09-23 05:14:52,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=185126.66666666666, ans=0.125 2024-09-23 05:15:34,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2024-09-23 05:15:39,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=185266.66666666666, ans=0.0 2024-09-23 05:15:41,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=185266.66666666666, ans=0.95 2024-09-23 05:15:50,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=185313.33333333334, ans=0.125 2024-09-23 05:15:52,063 INFO [train.py:1198] (0/4) Epoch 11, batch 750, loss[loss=0.1966, ctc_loss=0.1335, cr_loss=0.3154, over 17098.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1703, cr_loss=0.3789, over 3280200.95 frames. ], batch size: 43, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:15:52,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=185313.33333333334, ans=0.02 2024-09-23 05:15:57,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=185313.33333333334, ans=0.125 2024-09-23 05:16:16,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0 2024-09-23 05:16:18,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-09-23 05:16:30,319 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.321e+02 1.443e+02 1.699e+02 2.904e+02, threshold=2.886e+02, percent-clipped=1.0 2024-09-23 05:16:32,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=185406.66666666666, ans=0.125 2024-09-23 05:16:35,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=185406.66666666666, ans=12.0 2024-09-23 05:16:36,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=185406.66666666666, ans=0.125 2024-09-23 05:16:51,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2024-09-23 05:17:04,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=185500.0, ans=0.07 2024-09-23 05:17:11,521 INFO [train.py:1198] (0/4) Epoch 11, batch 800, loss[loss=0.2387, ctc_loss=0.166, cr_loss=0.3633, over 17311.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1696, cr_loss=0.3784, over 3311600.71 frames. 
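
The scaling.py:1120 WithLoss lines report an auxiliary penalty attached to a module's output, here loss-sum=0.000e+00, meaning the monitored activations are currently inside their allowed region. One way to implement such an attach-a-loss-without-changing-the-output hook is an autograd identity that injects the penalty's gradient in backward; this is a sketch of the idea only, and the icefall implementation differs in its details:

    import torch

    class WithAuxLoss(torch.autograd.Function):
        """Identity in forward; backward adds the gradient of a penalty
        computed on the activation, so the penalty shapes training without
        altering the values the next layer sees."""
        @staticmethod
        def forward(ctx, x, penalty_fn):
            ctx.save_for_backward(x)
            ctx.penalty_fn = penalty_fn
            return x.clone()

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            with torch.enable_grad():
                xd = x.detach().requires_grad_(True)
                aux = ctx.penalty_fn(xd)              # the logged loss-sum
                (g,) = torch.autograd.grad(aux, xd)
            return grad_out + g, None

    # Zero penalty while activations stay in [-5, 5] (cf. loss-sum=0.000e+00),
    # growing once they drift out of range:
    penalty = lambda t: (t.abs() - 5.0).clamp(min=0.0).sum()
    x = torch.randn(8, 16, requires_grad=True)
    WithAuxLoss.apply(x, penalty).sum().backward()
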
], batch size: 46, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:17:11,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=185546.66666666666, ans=0.0 2024-09-23 05:17:18,212 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:17:22,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=185546.66666666666, ans=0.05 2024-09-23 05:17:26,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=185593.33333333334, ans=0.125 2024-09-23 05:17:29,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=185593.33333333334, ans=0.125 2024-09-23 05:17:35,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.09 vs. limit=10.0 2024-09-23 05:18:07,476 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:18:31,204 INFO [train.py:1198] (0/4) Epoch 11, batch 850, loss[loss=0.2309, ctc_loss=0.1564, cr_loss=0.3721, over 17040.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1693, cr_loss=0.3779, over 3324613.06 frames. ], batch size: 39, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:18:46,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=185780.0, ans=0.05 2024-09-23 05:19:06,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=185873.33333333334, ans=0.125 2024-09-23 05:19:10,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=185873.33333333334, ans=0.125 2024-09-23 05:19:12,272 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.355e+02 1.504e+02 1.794e+02 2.469e+02, threshold=3.008e+02, percent-clipped=0.0 2024-09-23 05:19:29,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.10 vs. limit=10.0 2024-09-23 05:19:55,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=22.5 2024-09-23 05:19:59,415 INFO [train.py:1198] (0/4) Epoch 11, batch 900, loss[loss=0.267, ctc_loss=0.1898, cr_loss=0.3861, over 16479.00 frames. ], tot_loss[loss=0.2453, ctc_loss=0.1697, cr_loss=0.3781, over 3331568.88 frames. ], batch size: 66, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:20:12,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=186013.33333333334, ans=0.2 2024-09-23 05:20:12,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2024-09-23 05:20:19,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.80 vs. 
limit=15.0 2024-09-23 05:20:26,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=186060.0, ans=0.125 2024-09-23 05:20:34,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=186106.66666666666, ans=0.0 2024-09-23 05:20:56,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=186153.33333333334, ans=0.125 2024-09-23 05:21:20,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=186246.66666666666, ans=0.125 2024-09-23 05:21:22,042 INFO [train.py:1198] (0/4) Epoch 11, batch 950, loss[loss=0.2343, ctc_loss=0.1605, cr_loss=0.3691, over 17174.00 frames. ], tot_loss[loss=0.2468, ctc_loss=0.1708, cr_loss=0.3799, over 3343843.17 frames. ], batch size: 41, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:21:38,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=186293.33333333334, ans=0.0 2024-09-23 05:21:40,247 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:21:56,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=186340.0, ans=0.125 2024-09-23 05:22:00,860 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.261e+02 1.404e+02 1.572e+02 2.138e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-23 05:22:04,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=186340.0, ans=0.125 2024-09-23 05:22:04,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=186340.0, ans=0.0 2024-09-23 05:22:06,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2024-09-23 05:22:42,316 INFO [train.py:1198] (0/4) Epoch 11, batch 1000, loss[loss=0.2208, ctc_loss=0.148, cr_loss=0.364, over 17112.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1704, cr_loss=0.3797, over 3349207.23 frames. ], batch size: 40, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:22:57,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=186526.66666666666, ans=0.2 2024-09-23 05:23:02,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.39 vs. 
limit=15.0 2024-09-23 05:23:22,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=186573.33333333334, ans=0.035 2024-09-23 05:23:28,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=186620.0, ans=0.2 2024-09-23 05:23:47,301 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-40000.pt 2024-09-23 05:23:49,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=186666.66666666666, ans=0.0 2024-09-23 05:23:52,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=186666.66666666666, ans=0.125 2024-09-23 05:23:57,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.34 vs. limit=10.0 2024-09-23 05:24:06,721 INFO [train.py:1198] (0/4) Epoch 11, batch 1050, loss[loss=0.2437, ctc_loss=0.1678, cr_loss=0.3796, over 17289.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1704, cr_loss=0.3799, over 3353072.52 frames. ], batch size: 46, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:24:08,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=186713.33333333334, ans=0.0 2024-09-23 05:24:50,587 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.363e+02 1.550e+02 2.011e+02 3.304e+02, threshold=3.099e+02, percent-clipped=2.0 2024-09-23 05:25:23,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=186900.0, ans=0.05 2024-09-23 05:25:28,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-09-23 05:25:34,181 INFO [train.py:1198] (0/4) Epoch 11, batch 1100, loss[loss=0.2603, ctc_loss=0.183, cr_loss=0.3862, over 16775.00 frames. ], tot_loss[loss=0.2473, ctc_loss=0.171, cr_loss=0.3812, over 3361031.23 frames. ], batch size: 61, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:25:47,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=186946.66666666666, ans=0.125 2024-09-23 05:25:55,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2024-09-23 05:26:08,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-09-23 05:26:41,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=187133.33333333334, ans=0.125 2024-09-23 05:26:41,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0 2024-09-23 05:26:42,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.15 vs. 
limit=15.0 2024-09-23 05:26:43,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187133.33333333334, ans=0.1 2024-09-23 05:26:47,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.20 vs. limit=12.0 2024-09-23 05:26:47,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=187133.33333333334, ans=0.125 2024-09-23 05:26:49,630 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:26:53,985 INFO [train.py:1198] (0/4) Epoch 11, batch 1150, loss[loss=0.2114, ctc_loss=0.1422, cr_loss=0.3461, over 17086.00 frames. ], tot_loss[loss=0.2477, ctc_loss=0.1711, cr_loss=0.3827, over 3366940.84 frames. ], batch size: 43, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:26:55,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=187180.0, ans=0.0 2024-09-23 05:27:18,177 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:27:22,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=187226.66666666666, ans=0.2 2024-09-23 05:27:24,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=187273.33333333334, ans=0.09899494936611666 2024-09-23 05:27:32,067 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.352e+02 1.474e+02 1.715e+02 2.168e+02, threshold=2.948e+02, percent-clipped=0.0 2024-09-23 05:27:56,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=187366.66666666666, ans=0.2 2024-09-23 05:28:13,188 INFO [train.py:1198] (0/4) Epoch 11, batch 1200, loss[loss=0.2093, ctc_loss=0.1414, cr_loss=0.3394, over 17262.00 frames. ], tot_loss[loss=0.2479, ctc_loss=0.1713, cr_loss=0.3827, over 3369634.86 frames. ], batch size: 44, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:28:15,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=12.0 2024-09-23 05:28:18,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=187413.33333333334, ans=0.0 2024-09-23 05:28:19,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=187413.33333333334, ans=0.125 2024-09-23 05:28:35,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=187460.0, ans=0.0 2024-09-23 05:28:44,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=187460.0, ans=0.0 2024-09-23 05:29:18,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=187600.0, ans=0.125 2024-09-23 05:29:41,443 INFO [train.py:1198] (0/4) Epoch 11, batch 1250, loss[loss=0.2055, ctc_loss=0.1394, cr_loss=0.3307, over 17097.00 frames. ], tot_loss[loss=0.2491, ctc_loss=0.1722, cr_loss=0.3841, over 3369257.67 frames. 
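Checkpoint note: the "Saving checkpoint to .../checkpoint-40000.pt" entry a short way up is batch-count-based checkpointing (the filename carries the global batch index), alongside the usual per-epoch checkpoints. A minimal sketch of that pattern; the interval and the exact fields saved are assumptions, not read from train.py:

import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train, exp_dir,
                          save_every_n=4000):  # interval is an assumption
    # Save a batch-level checkpoint like .../checkpoint-40000.pt above.
    if batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )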
], batch size: 40, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:29:48,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=187646.66666666666, ans=10.0 2024-09-23 05:29:56,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=187693.33333333334, ans=0.2 2024-09-23 05:30:13,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=187693.33333333334, ans=0.2 2024-09-23 05:30:22,691 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.347e+02 1.488e+02 1.647e+02 3.078e+02, threshold=2.976e+02, percent-clipped=1.0 2024-09-23 05:30:22,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=187740.0, ans=0.125 2024-09-23 05:30:38,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=187786.66666666666, ans=0.0 2024-09-23 05:31:04,206 INFO [train.py:1198] (0/4) Epoch 11, batch 1300, loss[loss=0.2552, ctc_loss=0.175, cr_loss=0.4014, over 16573.00 frames. ], tot_loss[loss=0.2489, ctc_loss=0.1723, cr_loss=0.3832, over 3358671.89 frames. ], batch size: 66, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:31:26,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=187926.66666666666, ans=0.0 2024-09-23 05:31:28,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2024-09-23 05:31:34,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-23 05:31:36,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=187973.33333333334, ans=0.125 2024-09-23 05:31:47,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-23 05:32:22,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=188113.33333333334, ans=0.125 2024-09-23 05:32:23,727 INFO [train.py:1198] (0/4) Epoch 11, batch 1350, loss[loss=0.2531, ctc_loss=0.1735, cr_loss=0.3977, over 17307.00 frames. ], tot_loss[loss=0.2484, ctc_loss=0.1718, cr_loss=0.3828, over 3360375.56 frames. ], batch size: 49, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:32:54,343 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:33:02,037 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.248e+02 1.402e+02 1.590e+02 2.779e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-23 05:33:18,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=12.0 2024-09-23 05:33:45,864 INFO [train.py:1198] (0/4) Epoch 11, batch 1400, loss[loss=0.269, ctc_loss=0.1829, cr_loss=0.4304, over 15176.00 frames. ], tot_loss[loss=0.2471, ctc_loss=0.1709, cr_loss=0.381, over 3356320.01 frames. 
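The ScheduledFloat entries from scaling.py trace hyperparameters (dropout, skip and balancer probabilities and similar) whose value "ans" is a function of the global batch_count, which is why the same name keeps reappearing with slowly changing values. A simplified sketch of the idea, assuming a piecewise-linear schedule over batch count; the real ScheduledFloat class in scaling.py carries more machinery (defaults, operator overloading) than shown here:

def scheduled_float(points, batch_count):
    # points: list of (batch_count, value) knots, sorted by batch_count.
    # Piecewise-linear between knots, clamped to the endpoints outside them.
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0

# Illustrative knots (not the recipe's): a skip rate annealed to zero.
print(scheduled_float([(0.0, 0.5), (20000.0, 0.0)], 186573.33))  # -> 0.0

Long past the final knot the value just sits at the endpoint, which matches how most "ans" values in this stretch of the log are constant.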
], batch size: 89, lr: 1.11e-02, grad_scale: 32.0 2024-09-23 05:33:54,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=12.0 2024-09-23 05:34:00,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=188393.33333333334, ans=0.09899494936611666 2024-09-23 05:34:10,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-23 05:34:11,795 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:34:14,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=188393.33333333334, ans=0.125 2024-09-23 05:34:30,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=188440.0, ans=0.125 2024-09-23 05:34:51,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=188486.66666666666, ans=0.0 2024-09-23 05:35:12,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=188580.0, ans=0.025 2024-09-23 05:35:13,557 INFO [train.py:1198] (0/4) Epoch 11, batch 1450, loss[loss=0.268, ctc_loss=0.1871, cr_loss=0.4041, over 16421.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1703, cr_loss=0.3797, over 3348836.12 frames. ], batch size: 66, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:35:25,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=15.0 2024-09-23 05:35:42,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=188626.66666666666, ans=0.125 2024-09-23 05:35:51,362 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.355e+02 1.506e+02 1.702e+02 2.506e+02, threshold=3.011e+02, percent-clipped=0.0 2024-09-23 05:35:59,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=188720.0, ans=0.125 2024-09-23 05:36:02,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=188720.0, ans=0.125 2024-09-23 05:36:10,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=188720.0, ans=0.0 2024-09-23 05:36:22,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-09-23 05:36:28,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188766.66666666666, ans=0.1 2024-09-23 05:36:32,872 INFO [train.py:1198] (0/4) Epoch 11, batch 1500, loss[loss=0.2613, ctc_loss=0.1807, cr_loss=0.4029, over 17052.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1699, cr_loss=0.3801, over 3362886.34 frames. 
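On the Whitening entries: each one compares a "metric" against a "limit" for one module's activations; the Whiten modules in scaling.py push feature covariance toward isotropy and only intervene when the metric exceeds the limit. A hedged sketch of one way to compute such a metric, as the eigenvalue dispersion of the per-group covariance; the exact formula in scaling.py may differ:

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2 of
    # the per-group covariance eigenvalues: 1.0 for perfectly whitened
    # features, larger when variance concentrates in a few directions.
    n, c = x.shape
    cg = c // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * cg:(g + 1) * cg]
        cov = (xg.t() @ xg) / n
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return torch.stack(metrics).mean()

print(whitening_metric(torch.randn(1000, 256)))  # slightly above 1 for noise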
], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:36:59,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=188860.0, ans=0.0 2024-09-23 05:37:13,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=188906.66666666666, ans=0.125 2024-09-23 05:37:21,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=188953.33333333334, ans=0.025 2024-09-23 05:37:38,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=189000.0, ans=0.0 2024-09-23 05:37:48,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=189000.0, ans=0.0 2024-09-23 05:37:52,941 INFO [train.py:1198] (0/4) Epoch 11, batch 1550, loss[loss=0.2404, ctc_loss=0.1642, cr_loss=0.3807, over 16959.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1689, cr_loss=0.3793, over 3370051.03 frames. ], batch size: 42, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:37:58,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189046.66666666666, ans=0.1 2024-09-23 05:38:09,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=189093.33333333334, ans=0.125 2024-09-23 05:38:14,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189093.33333333334, ans=0.1 2024-09-23 05:38:33,046 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.268e+02 1.378e+02 1.550e+02 2.066e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 05:38:36,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=189140.0, ans=0.0 2024-09-23 05:38:48,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=189186.66666666666, ans=0.125 2024-09-23 05:39:04,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=189233.33333333334, ans=0.125 2024-09-23 05:39:15,065 INFO [train.py:1198] (0/4) Epoch 11, batch 1600, loss[loss=0.259, ctc_loss=0.1822, cr_loss=0.3837, over 17356.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1687, cr_loss=0.3784, over 3371649.14 frames. ], batch size: 48, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:39:25,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=22.5 2024-09-23 05:39:56,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=189373.33333333334, ans=0.125 2024-09-23 05:40:42,830 INFO [train.py:1198] (0/4) Epoch 11, batch 1650, loss[loss=0.2649, ctc_loss=0.1832, cr_loss=0.4089, over 16645.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1689, cr_loss=0.3788, over 3364948.50 frames. 
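The lr field decays smoothly within the epoch (1.12e-02 down to 1.10e-02 so far in this section) rather than stepping, i.e. a schedule that depends on both the global batch count and the epoch. A sketch in the style of an Eden-type schedule; every constant below is illustrative and the recipe's actual scheduler may differ:

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # lr shrinks smoothly with both the global batch count and the epoch;
    # all four parameter values here are assumptions, not read from the log.
    return (
        base_lr
        * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    )

print(eden_lr(0.05, batch=190000, epoch=11))  # illustrative values only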
], batch size: 66, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:41:13,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=189606.66666666666, ans=0.0 2024-09-23 05:41:21,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=189606.66666666666, ans=0.2 2024-09-23 05:41:22,847 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.323e+02 1.468e+02 1.701e+02 2.632e+02, threshold=2.937e+02, percent-clipped=0.0 2024-09-23 05:41:35,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=189653.33333333334, ans=0.125 2024-09-23 05:41:39,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=15.0 2024-09-23 05:42:02,591 INFO [train.py:1198] (0/4) Epoch 11, batch 1700, loss[loss=0.2942, ctc_loss=0.2027, cr_loss=0.4572, over 17013.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1691, cr_loss=0.3796, over 3373130.16 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:42:07,659 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:42:07,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=189746.66666666666, ans=0.125 2024-09-23 05:42:25,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=22.5 2024-09-23 05:42:34,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=189840.0, ans=0.125 2024-09-23 05:42:41,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-09-23 05:42:55,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=189886.66666666666, ans=0.125 2024-09-23 05:43:22,422 INFO [train.py:1198] (0/4) Epoch 11, batch 1750, loss[loss=0.2466, ctc_loss=0.1695, cr_loss=0.3856, over 17029.00 frames. ], tot_loss[loss=0.2452, ctc_loss=0.1692, cr_loss=0.3798, over 3367433.15 frames. ], batch size: 44, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:43:34,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=189980.0, ans=0.125 2024-09-23 05:43:39,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=190026.66666666666, ans=0.0 2024-09-23 05:43:51,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=190026.66666666666, ans=0.0 2024-09-23 05:43:54,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. 
limit=15.0 2024-09-23 05:43:55,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=190073.33333333334, ans=0.0 2024-09-23 05:44:00,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=190073.33333333334, ans=0.125 2024-09-23 05:44:03,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=190073.33333333334, ans=0.0 2024-09-23 05:44:04,904 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.343e+02 1.475e+02 1.671e+02 2.216e+02, threshold=2.950e+02, percent-clipped=0.0 2024-09-23 05:44:10,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=12.0 2024-09-23 05:44:32,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=190166.66666666666, ans=0.0 2024-09-23 05:44:33,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190166.66666666666, ans=0.1 2024-09-23 05:44:52,207 INFO [train.py:1198] (0/4) Epoch 11, batch 1800, loss[loss=0.2703, ctc_loss=0.1897, cr_loss=0.4027, over 16492.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1698, cr_loss=0.3816, over 3374707.60 frames. ], batch size: 66, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:45:11,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=190260.0, ans=0.125 2024-09-23 05:45:11,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=190260.0, ans=0.0 2024-09-23 05:45:40,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=190353.33333333334, ans=0.125 2024-09-23 05:45:48,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=190353.33333333334, ans=0.0 2024-09-23 05:45:50,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=190353.33333333334, ans=0.2 2024-09-23 05:46:11,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=190446.66666666666, ans=22.5 2024-09-23 05:46:12,759 INFO [train.py:1198] (0/4) Epoch 11, batch 1850, loss[loss=0.2357, ctc_loss=0.1617, cr_loss=0.3698, over 17028.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1703, cr_loss=0.3817, over 3367922.36 frames. 
], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:46:25,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=190446.66666666666, ans=0.125 2024-09-23 05:46:25,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=190446.66666666666, ans=0.0 2024-09-23 05:46:27,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=190493.33333333334, ans=0.125 2024-09-23 05:46:30,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190493.33333333334, ans=0.1 2024-09-23 05:46:46,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=22.5 2024-09-23 05:46:52,386 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.336e+02 1.526e+02 1.737e+02 2.806e+02, threshold=3.051e+02, percent-clipped=0.0 2024-09-23 05:47:25,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-09-23 05:47:32,659 INFO [train.py:1198] (0/4) Epoch 11, batch 1900, loss[loss=0.2277, ctc_loss=0.1553, cr_loss=0.3621, over 16956.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1711, cr_loss=0.3813, over 3349197.31 frames. ], batch size: 42, lr: 1.10e-02, grad_scale: 16.0 2024-09-23 05:47:43,954 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:47:45,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190680.0, ans=0.1 2024-09-23 05:47:55,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=190726.66666666666, ans=0.125 2024-09-23 05:48:36,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=15.0 2024-09-23 05:48:55,012 INFO [train.py:1198] (0/4) Epoch 11, batch 1950, loss[loss=0.1988, ctc_loss=0.1357, cr_loss=0.3154, over 17088.00 frames. ], tot_loss[loss=0.2474, ctc_loss=0.1711, cr_loss=0.3814, over 3349225.21 frames. ], batch size: 43, lr: 1.10e-02, grad_scale: 16.0 2024-09-23 05:49:00,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=190913.33333333334, ans=0.0 2024-09-23 05:49:00,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.11 vs. limit=6.0 2024-09-23 05:49:08,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.98 vs. 
limit=12.0 2024-09-23 05:49:41,889 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.367e+02 1.554e+02 1.740e+02 3.825e+02, threshold=3.108e+02, percent-clipped=1.0 2024-09-23 05:50:11,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=191100.0, ans=0.125 2024-09-23 05:50:22,739 INFO [train.py:1198] (0/4) Epoch 11, batch 2000, loss[loss=0.2564, ctc_loss=0.1848, cr_loss=0.3581, over 17100.00 frames. ], tot_loss[loss=0.2463, ctc_loss=0.1704, cr_loss=0.3797, over 3359342.17 frames. ], batch size: 49, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:50:34,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=191146.66666666666, ans=0.0 2024-09-23 05:50:45,545 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 05:50:50,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=191193.33333333334, ans=0.025 2024-09-23 05:51:03,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=22.5 2024-09-23 05:51:08,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=191240.0, ans=0.0 2024-09-23 05:51:17,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=191286.66666666666, ans=0.0 2024-09-23 05:51:42,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=191380.0, ans=0.0 2024-09-23 05:51:43,322 INFO [train.py:1198] (0/4) Epoch 11, batch 2050, loss[loss=0.1887, ctc_loss=0.1249, cr_loss=0.3191, over 16348.00 frames. ], tot_loss[loss=0.2462, ctc_loss=0.1703, cr_loss=0.3797, over 3360408.43 frames. ], batch size: 36, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:52:24,542 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.258e+02 1.361e+02 1.544e+02 2.924e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-23 05:52:28,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=191473.33333333334, ans=6.0 2024-09-23 05:52:45,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=191566.66666666666, ans=0.0 2024-09-23 05:53:03,035 INFO [train.py:1198] (0/4) Epoch 11, batch 2100, loss[loss=0.2969, ctc_loss=0.2112, cr_loss=0.4288, over 16721.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1697, cr_loss=0.3792, over 3359508.25 frames. ], batch size: 61, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:53:03,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191613.33333333334, ans=0.1 2024-09-23 05:53:07,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.64 vs. 
limit=15.0 2024-09-23 05:53:11,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=191613.33333333334, ans=0.09899494936611666 2024-09-23 05:53:17,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=191660.0, ans=0.1 2024-09-23 05:53:22,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=191660.0, ans=0.0 2024-09-23 05:53:22,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=191660.0, ans=0.1 2024-09-23 05:53:22,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-09-23 05:53:53,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=191753.33333333334, ans=0.0 2024-09-23 05:54:06,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=191753.33333333334, ans=12.0 2024-09-23 05:54:09,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=191800.0, ans=0.2 2024-09-23 05:54:14,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=191800.0, ans=0.0 2024-09-23 05:54:30,862 INFO [train.py:1198] (0/4) Epoch 11, batch 2150, loss[loss=0.2487, ctc_loss=0.1767, cr_loss=0.3603, over 16710.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1688, cr_loss=0.3787, over 3370322.46 frames. ], batch size: 61, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:55:14,825 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.341e+02 1.514e+02 1.705e+02 2.483e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-23 05:55:34,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=191986.66666666666, ans=0.125 2024-09-23 05:55:49,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-09-23 05:55:53,282 INFO [train.py:1198] (0/4) Epoch 11, batch 2200, loss[loss=0.2765, ctc_loss=0.1927, cr_loss=0.4189, over 17036.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1692, cr_loss=0.3797, over 3371285.46 frames. ], batch size: 52, lr: 1.10e-02, grad_scale: 32.0 2024-09-23 05:56:12,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2024-09-23 05:56:28,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=192173.33333333334, ans=0.09899494936611666 2024-09-23 05:57:04,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=192266.66666666666, ans=0.125 2024-09-23 05:57:13,774 INFO [train.py:1198] (0/4) Epoch 11, batch 2250, loss[loss=0.2827, ctc_loss=0.1963, cr_loss=0.432, over 17292.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1696, cr_loss=0.3801, over 3370024.03 frames. 
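grad_scale in the batch summaries is the AMP loss scale: it holds at 32.0 through most of this section, drops to 16.0 around batches 1900-1950 above, and is back at 32.0 by batch 2000, the usual dynamic-loss-scaling pattern (halve on an overflowing fp16 step, grow back after a run of stable steps). A sketch of the stock torch.cuda.amp mechanism; the recipe may manage the scale on its own schedule, and the model, optimizer and data below are placeholders (requires a CUDA device):

import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

x = torch.randn(8, 80, device="cuda")
opt.zero_grad()
with torch.cuda.amp.autocast():
    loss = model(x).logsumexp(dim=-1).mean()
scaler.scale(loss).backward()   # backprop on the scaled loss
scaler.step(opt)                # skipped internally on inf/nan gradients
scaler.update()                 # scale /= 2 on overflow, *= 2 after enough good steps
print(scaler.get_scale())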
], batch size: 49, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 05:57:25,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=192313.33333333334, ans=0.125 2024-09-23 05:57:33,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.07 vs. limit=10.0 2024-09-23 05:57:50,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=192406.66666666666, ans=0.0 2024-09-23 05:57:52,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=192406.66666666666, ans=0.125 2024-09-23 05:57:55,416 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.304e+02 1.378e+02 1.493e+02 3.334e+02, threshold=2.757e+02, percent-clipped=1.0 2024-09-23 05:58:24,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=192500.0, ans=0.0 2024-09-23 05:58:36,212 INFO [train.py:1198] (0/4) Epoch 11, batch 2300, loss[loss=0.2576, ctc_loss=0.1789, cr_loss=0.3935, over 17020.00 frames. ], tot_loss[loss=0.2457, ctc_loss=0.1696, cr_loss=0.3803, over 3366749.50 frames. ], batch size: 56, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 05:58:44,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=192546.66666666666, ans=0.0 2024-09-23 05:59:12,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=192640.0, ans=0.125 2024-09-23 05:59:14,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=192640.0, ans=0.0 2024-09-23 05:59:17,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=192640.0, ans=0.5 2024-09-23 06:00:01,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2024-09-23 06:00:03,694 INFO [train.py:1198] (0/4) Epoch 11, batch 2350, loss[loss=0.2369, ctc_loss=0.1609, cr_loss=0.38, over 17297.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1692, cr_loss=0.3796, over 3364927.68 frames. ], batch size: 46, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:00:07,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=192780.0, ans=0.1 2024-09-23 06:00:32,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=192826.66666666666, ans=0.125 2024-09-23 06:00:44,551 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.266e+02 1.351e+02 1.522e+02 2.296e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-23 06:00:51,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=192920.0, ans=0.125 2024-09-23 06:00:51,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. 
limit=15.0 2024-09-23 06:01:20,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=192966.66666666666, ans=0.2 2024-09-23 06:01:22,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-09-23 06:01:23,282 INFO [train.py:1198] (0/4) Epoch 11, batch 2400, loss[loss=0.2597, ctc_loss=0.1769, cr_loss=0.4142, over 16978.00 frames. ], tot_loss[loss=0.2445, ctc_loss=0.1686, cr_loss=0.3794, over 3365002.19 frames. ], batch size: 53, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:01:25,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=193013.33333333334, ans=0.125 2024-09-23 06:01:46,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=193060.0, ans=0.125 2024-09-23 06:01:55,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=193106.66666666666, ans=0.125 2024-09-23 06:01:55,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=193106.66666666666, ans=0.0 2024-09-23 06:02:40,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=193200.0, ans=0.125 2024-09-23 06:02:41,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=193246.66666666666, ans=0.125 2024-09-23 06:02:41,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=193246.66666666666, ans=0.0 2024-09-23 06:02:43,184 INFO [train.py:1198] (0/4) Epoch 11, batch 2450, loss[loss=0.2607, ctc_loss=0.1843, cr_loss=0.3819, over 14884.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1699, cr_loss=0.3796, over 3353998.06 frames. ], batch size: 89, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:02:59,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193293.33333333334, ans=0.1 2024-09-23 06:03:04,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=193293.33333333334, ans=0.09899494936611666 2024-09-23 06:03:24,780 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.301e+02 1.390e+02 1.544e+02 1.973e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-23 06:03:45,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.00 vs. limit=10.0 2024-09-23 06:03:54,405 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:04:07,698 INFO [train.py:1198] (0/4) Epoch 11, batch 2500, loss[loss=0.2418, ctc_loss=0.1679, cr_loss=0.3691, over 17035.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1691, cr_loss=0.3792, over 3360611.45 frames. 
], batch size: 52, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:04:16,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=193480.0, ans=0.2 2024-09-23 06:04:34,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=193526.66666666666, ans=0.125 2024-09-23 06:04:36,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=193526.66666666666, ans=0.5 2024-09-23 06:04:45,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=193573.33333333334, ans=0.2 2024-09-23 06:04:50,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=12.0 2024-09-23 06:05:18,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=193666.66666666666, ans=0.0 2024-09-23 06:05:20,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=193666.66666666666, ans=0.1 2024-09-23 06:05:32,500 INFO [train.py:1198] (0/4) Epoch 11, batch 2550, loss[loss=0.2818, ctc_loss=0.1981, cr_loss=0.4182, over 17026.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1685, cr_loss=0.3782, over 3355739.44 frames. ], batch size: 52, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:06:07,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=193806.66666666666, ans=0.125 2024-09-23 06:06:11,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=193806.66666666666, ans=0.1 2024-09-23 06:06:11,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=193806.66666666666, ans=0.125 2024-09-23 06:06:13,861 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.308e+02 1.472e+02 1.682e+02 2.409e+02, threshold=2.944e+02, percent-clipped=0.0 2024-09-23 06:06:18,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=193853.33333333334, ans=0.2 2024-09-23 06:06:33,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=193853.33333333334, ans=0.125 2024-09-23 06:06:33,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2024-09-23 06:06:50,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=193946.66666666666, ans=0.0 2024-09-23 06:06:51,768 INFO [train.py:1198] (0/4) Epoch 11, batch 2600, loss[loss=0.2656, ctc_loss=0.1828, cr_loss=0.4143, over 16550.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1691, cr_loss=0.3796, over 3352564.16 frames. 
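On the cr_loss column itself: together with the "cr-loss-scale-0.2-time-mask-ratio-2.5" experiment name, it points at consistency-regularized CTC, where the same batch is passed through two differently masked views and a consistency term pulls their frame-level output distributions together. A hedged sketch using a symmetric KL between the two views; the divergence, detaching and masking details of the actual recipe may differ:

import torch
import torch.nn.functional as F

def consistency_loss(logp_a, logp_b):
    # logp_a, logp_b: (batch, frames, vocab) log-probs from two augmented
    # views of the same utterances; each direction treats the other branch
    # as a fixed target.
    kl_ab = F.kl_div(logp_a, logp_b.detach(), log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(logp_b, logp_a.detach(), log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

logp1 = F.log_softmax(torch.randn(4, 100, 500), dim=-1)
logp2 = F.log_softmax(torch.randn(4, 100, 500), dim=-1)
print(consistency_loss(logp1, logp2))

This also explains why the validation entries further on report cr_loss on the order of 1e-15: with a single unaugmented pass there is no second view to disagree with.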
], batch size: 66, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:07:10,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=193993.33333333334, ans=0.2 2024-09-23 06:07:33,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=194040.0, ans=0.125 2024-09-23 06:08:03,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=194133.33333333334, ans=0.0 2024-09-23 06:08:10,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=194180.0, ans=0.0 2024-09-23 06:08:11,671 INFO [train.py:1198] (0/4) Epoch 11, batch 2650, loss[loss=0.2397, ctc_loss=0.1665, cr_loss=0.3659, over 17318.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1685, cr_loss=0.3797, over 3362005.70 frames. ], batch size: 51, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:08:13,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=194180.0, ans=0.125 2024-09-23 06:08:18,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-09-23 06:08:21,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=194180.0, ans=0.125 2024-09-23 06:08:39,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=194226.66666666666, ans=0.125 2024-09-23 06:08:44,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=194273.33333333334, ans=0.1 2024-09-23 06:08:51,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=8.0 2024-09-23 06:08:57,081 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.341e+02 1.529e+02 1.824e+02 3.038e+02, threshold=3.058e+02, percent-clipped=1.0 2024-09-23 06:09:14,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=194320.0, ans=0.0 2024-09-23 06:09:30,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-09-23 06:09:34,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=194366.66666666666, ans=0.025 2024-09-23 06:09:41,681 INFO [train.py:1198] (0/4) Epoch 11, batch 2700, loss[loss=0.2139, ctc_loss=0.148, cr_loss=0.3298, over 17271.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1687, cr_loss=0.3794, over 3363792.66 frames. 
], batch size: 42, lr: 1.09e-02, grad_scale: 16.0 2024-09-23 06:09:48,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=194413.33333333334, ans=0.025 2024-09-23 06:10:04,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=194460.0, ans=0.125 2024-09-23 06:10:14,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=194506.66666666666, ans=0.125 2024-09-23 06:10:40,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=22.5 2024-09-23 06:10:58,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=194600.0, ans=0.125 2024-09-23 06:11:00,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=194646.66666666666, ans=0.125 2024-09-23 06:11:01,390 INFO [train.py:1198] (0/4) Epoch 11, batch 2750, loss[loss=0.2453, ctc_loss=0.1659, cr_loss=0.3971, over 17347.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1689, cr_loss=0.3794, over 3364454.81 frames. ], batch size: 48, lr: 1.09e-02, grad_scale: 16.0 2024-09-23 06:11:36,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=194740.0, ans=0.05 2024-09-23 06:11:44,296 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.275e+02 1.375e+02 1.599e+02 2.337e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 06:11:47,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=194786.66666666666, ans=0.2 2024-09-23 06:11:49,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194786.66666666666, ans=0.1 2024-09-23 06:12:10,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=194833.33333333334, ans=0.125 2024-09-23 06:12:16,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=194833.33333333334, ans=0.07 2024-09-23 06:12:21,172 INFO [train.py:1198] (0/4) Epoch 11, batch 2800, loss[loss=0.2507, ctc_loss=0.1698, cr_loss=0.4041, over 17306.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1702, cr_loss=0.3813, over 3360577.93 frames. ], batch size: 51, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:12:31,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.05 vs. 
limit=10.0 2024-09-23 06:12:34,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=194880.0, ans=0.0 2024-09-23 06:12:41,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=194926.66666666666, ans=0.125 2024-09-23 06:13:37,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=195066.66666666666, ans=0.125 2024-09-23 06:13:44,869 INFO [train.py:1198] (0/4) Epoch 11, batch 2850, loss[loss=0.2117, ctc_loss=0.1434, cr_loss=0.3415, over 17261.00 frames. ], tot_loss[loss=0.2466, ctc_loss=0.1703, cr_loss=0.3817, over 3353488.39 frames. ], batch size: 44, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:14:14,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=195160.0, ans=0.5 2024-09-23 06:14:22,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2024-09-23 06:14:32,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195206.66666666666, ans=0.1 2024-09-23 06:14:35,500 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.307e+02 1.410e+02 1.605e+02 2.111e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-23 06:14:47,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=22.5 2024-09-23 06:14:54,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=195300.0, ans=0.025 2024-09-23 06:15:11,918 INFO [train.py:1198] (0/4) Epoch 11, batch 2900, loss[loss=0.2311, ctc_loss=0.1601, cr_loss=0.3553, over 17034.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1693, cr_loss=0.3793, over 3354773.91 frames. ], batch size: 39, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:15:28,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195393.33333333334, ans=0.1 2024-09-23 06:15:29,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=12.0 2024-09-23 06:15:53,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=195440.0, ans=0.125 2024-09-23 06:15:55,542 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:15:57,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2024-09-23 06:16:06,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=195486.66666666666, ans=0.125 2024-09-23 06:16:31,837 INFO [train.py:1198] (0/4) Epoch 11, batch 2950, loss[loss=0.2732, ctc_loss=0.1877, cr_loss=0.4277, over 16470.00 frames. ], tot_loss[loss=0.2451, ctc_loss=0.1692, cr_loss=0.3796, over 3355970.74 frames. 
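A validation pass is logged shortly after this point ("Computing validation loss", then "Epoch 11, validation: loss=0.04835, ..."), followed by the peak CUDA memory. A minimal sketch of such a pass; the model/loader interface below is a placeholder, and the real train.py accumulates the separate ctc/cr totals frame-weighted:

import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, device="cuda:0"):
    model.eval()
    tot, frames = 0.0, 0
    for batch in valid_loader:
        loss, num_frames = model(batch)  # placeholder interface
        tot += loss.item() * num_frames
        frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot / frames:.4f}; max memory so far is {mem_mb}MB")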
], batch size: 66, lr: 1.09e-02, grad_scale: 32.0 2024-09-23 06:16:43,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=195580.0, ans=0.0 2024-09-23 06:17:07,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2024-09-23 06:17:14,835 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.272e+02 1.364e+02 1.479e+02 2.031e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 06:17:16,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195673.33333333334, ans=0.0 2024-09-23 06:17:45,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=195766.66666666666, ans=0.125 2024-09-23 06:17:50,962 INFO [train.py:1198] (0/4) Epoch 11, batch 3000, loss[loss=0.2071, ctc_loss=0.1436, cr_loss=0.3175, over 17064.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1689, cr_loss=0.3789, over 3354697.36 frames. ], batch size: 46, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:17:50,962 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 06:18:02,759 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3361, 3.9481, 3.5096, 3.7579, 3.7637, 3.1002, 3.4510, 2.5125], device='cuda:0') 2024-09-23 06:18:06,137 INFO [train.py:1230] (0/4) Epoch 11, validation: loss=0.04835, ctc_loss=0.04835, cr_loss=7.412e-15, over 944034.00 frames. 2024-09-23 06:18:06,137 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 06:18:21,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.88 vs. limit=15.0 2024-09-23 06:18:48,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195906.66666666666, ans=0.1 2024-09-23 06:18:49,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195906.66666666666, ans=0.1 2024-09-23 06:18:49,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=195906.66666666666, ans=0.2 2024-09-23 06:19:13,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=196000.0, ans=0.125 2024-09-23 06:19:27,162 INFO [train.py:1198] (0/4) Epoch 11, batch 3050, loss[loss=0.2921, ctc_loss=0.2046, cr_loss=0.4372, over 17007.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1686, cr_loss=0.3786, over 3367762.52 frames. ], batch size: 53, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:19:35,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. 
limit=15.0 2024-09-23 06:19:50,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=196093.33333333334, ans=0.0 2024-09-23 06:19:55,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=196093.33333333334, ans=0.125 2024-09-23 06:19:58,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-09-23 06:20:14,516 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.336e+02 1.544e+02 1.822e+02 2.726e+02, threshold=3.088e+02, percent-clipped=0.0 2024-09-23 06:20:25,384 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:20:26,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=196186.66666666666, ans=0.0 2024-09-23 06:20:51,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=196280.0, ans=0.1 2024-09-23 06:20:52,725 INFO [train.py:1198] (0/4) Epoch 11, batch 3100, loss[loss=0.2367, ctc_loss=0.1621, cr_loss=0.3727, over 17220.00 frames. ], tot_loss[loss=0.2449, ctc_loss=0.1691, cr_loss=0.3791, over 3354271.04 frames. ], batch size: 47, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:20:54,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=196280.0, ans=0.2 2024-09-23 06:21:22,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=196373.33333333334, ans=0.125 2024-09-23 06:21:32,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=196373.33333333334, ans=0.125 2024-09-23 06:21:36,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=196373.33333333334, ans=0.95 2024-09-23 06:22:11,263 INFO [train.py:1198] (0/4) Epoch 11, batch 3150, loss[loss=0.194, ctc_loss=0.1306, cr_loss=0.3168, over 16968.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1687, cr_loss=0.3797, over 3360624.73 frames. ], batch size: 42, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:22:23,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.47 vs. 
limit=22.5 2024-09-23 06:22:28,682 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=1.277e-02 2024-09-23 06:22:53,534 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.336e+02 1.469e+02 1.670e+02 2.912e+02, threshold=2.938e+02, percent-clipped=0.0 2024-09-23 06:22:56,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=196653.33333333334, ans=0.125 2024-09-23 06:23:22,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=196700.0, ans=0.0 2024-09-23 06:23:25,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=196700.0, ans=0.09899494936611666 2024-09-23 06:23:29,552 INFO [train.py:1198] (0/4) Epoch 11, batch 3200, loss[loss=0.2436, ctc_loss=0.166, cr_loss=0.3882, over 17305.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1697, cr_loss=0.3809, over 3353098.49 frames. ], batch size: 49, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:23:40,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=196746.66666666666, ans=0.2 2024-09-23 06:23:45,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2024-09-23 06:23:54,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.94 vs. limit=15.0 2024-09-23 06:23:56,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=196793.33333333334, ans=0.2 2024-09-23 06:24:11,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196840.0, ans=0.1 2024-09-23 06:24:47,353 INFO [train.py:1198] (0/4) Epoch 11, batch 3250, loss[loss=0.2334, ctc_loss=0.1621, cr_loss=0.3564, over 17022.00 frames. ], tot_loss[loss=0.2454, ctc_loss=0.1694, cr_loss=0.38, over 3345553.95 frames. ], batch size: 44, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:24:56,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=196980.0, ans=0.125 2024-09-23 06:25:18,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197073.33333333334, ans=0.1 2024-09-23 06:25:30,818 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.294e+02 1.384e+02 1.561e+02 5.237e+02, threshold=2.769e+02, percent-clipped=1.0 2024-09-23 06:25:39,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-09-23 06:25:42,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=197120.0, ans=0.125 2024-09-23 06:26:05,132 INFO [train.py:1198] (0/4) Epoch 11, batch 3300, loss[loss=0.2632, ctc_loss=0.1867, cr_loss=0.3824, over 16063.00 frames. ], tot_loss[loss=0.2456, ctc_loss=0.1696, cr_loss=0.3801, over 3352592.78 frames. 
], batch size: 74, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:26:45,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=197306.66666666666, ans=0.125 2024-09-23 06:26:53,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=197353.33333333334, ans=0.125 2024-09-23 06:27:22,860 INFO [train.py:1198] (0/4) Epoch 11, batch 3350, loss[loss=0.2368, ctc_loss=0.1625, cr_loss=0.372, over 17177.00 frames. ], tot_loss[loss=0.2464, ctc_loss=0.1703, cr_loss=0.3805, over 3339222.95 frames. ], batch size: 45, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:28:06,586 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.310e+02 1.428e+02 1.594e+02 2.854e+02, threshold=2.856e+02, percent-clipped=1.0 2024-09-23 06:28:08,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-09-23 06:28:12,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=22.5 2024-09-23 06:28:19,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0 2024-09-23 06:28:36,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=197633.33333333334, ans=0.0 2024-09-23 06:28:41,095 INFO [train.py:1198] (0/4) Epoch 11, batch 3400, loss[loss=0.2732, ctc_loss=0.1844, cr_loss=0.444, over 17356.00 frames. ], tot_loss[loss=0.2461, ctc_loss=0.1699, cr_loss=0.3811, over 3349124.60 frames. ], batch size: 48, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:28:50,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=197680.0, ans=0.0 2024-09-23 06:29:03,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=197726.66666666666, ans=0.025 2024-09-23 06:29:14,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=197773.33333333334, ans=0.125 2024-09-23 06:29:35,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=197820.0, ans=0.0 2024-09-23 06:29:41,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=197820.0, ans=0.025 2024-09-23 06:30:02,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-09-23 06:30:04,326 INFO [train.py:1198] (0/4) Epoch 11, batch 3450, loss[loss=0.2264, ctc_loss=0.1544, cr_loss=0.3596, over 16959.00 frames. ], tot_loss[loss=0.2458, ctc_loss=0.1697, cr_loss=0.3804, over 3358102.56 frames. ], batch size: 42, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:30:05,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. 
limit=12.0 2024-09-23 06:30:26,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=197960.0, ans=0.125 2024-09-23 06:30:49,691 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.318e+02 1.398e+02 1.528e+02 2.343e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-23 06:30:59,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=198053.33333333334, ans=0.0 2024-09-23 06:30:59,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=198053.33333333334, ans=0.04949747468305833 2024-09-23 06:31:24,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2024-09-23 06:31:26,438 INFO [train.py:1198] (0/4) Epoch 11, batch 3500, loss[loss=0.2293, ctc_loss=0.1555, cr_loss=0.369, over 17028.00 frames. ], tot_loss[loss=0.2459, ctc_loss=0.1698, cr_loss=0.3807, over 3360771.09 frames. ], batch size: 44, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:31:29,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=198146.66666666666, ans=0.0 2024-09-23 06:31:33,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2024-09-23 06:31:36,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=198146.66666666666, ans=0.125 2024-09-23 06:31:45,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=198193.33333333334, ans=0.125 2024-09-23 06:32:04,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=198240.0, ans=0.1 2024-09-23 06:32:08,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=8.0 2024-09-23 06:32:16,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=198286.66666666666, ans=0.09899494936611666 2024-09-23 06:32:24,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=198286.66666666666, ans=0.125 2024-09-23 06:32:27,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=198333.33333333334, ans=0.0 2024-09-23 06:32:36,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=15.0 2024-09-23 06:32:44,007 INFO [train.py:1198] (0/4) Epoch 11, batch 3550, loss[loss=0.2054, ctc_loss=0.1381, cr_loss=0.3364, over 17266.00 frames. ], tot_loss[loss=0.2455, ctc_loss=0.1694, cr_loss=0.3804, over 3364731.27 frames. 
], batch size: 42, lr: 1.08e-02, grad_scale: 16.0 2024-09-23 06:32:47,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=198380.0, ans=0.125 2024-09-23 06:33:10,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198426.66666666666, ans=0.1 2024-09-23 06:33:21,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198473.33333333334, ans=0.1 2024-09-23 06:33:27,727 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.313e+02 1.389e+02 1.597e+02 2.419e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-23 06:34:02,341 INFO [train.py:1198] (0/4) Epoch 11, batch 3600, loss[loss=0.2307, ctc_loss=0.1604, cr_loss=0.3512, over 17280.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1692, cr_loss=0.3788, over 3364301.57 frames. ], batch size: 44, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:34:07,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-23 06:34:14,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2024-09-23 06:34:25,063 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2024-09-23 06:34:43,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=198706.66666666666, ans=0.04949747468305833 2024-09-23 06:34:57,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=198753.33333333334, ans=0.2 2024-09-23 06:35:21,046 INFO [train.py:1198] (0/4) Epoch 11, batch 3650, loss[loss=0.2712, ctc_loss=0.1895, cr_loss=0.4083, over 16900.00 frames. ], tot_loss[loss=0.2446, ctc_loss=0.1689, cr_loss=0.3786, over 3368900.79 frames. ], batch size: 58, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:35:26,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=198846.66666666666, ans=0.125 2024-09-23 06:36:04,372 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.052e+02 1.307e+02 1.406e+02 1.514e+02 2.450e+02, threshold=2.812e+02, percent-clipped=0.0 2024-09-23 06:36:15,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=198986.66666666666, ans=0.125 2024-09-23 06:36:39,290 INFO [train.py:1198] (0/4) Epoch 11, batch 3700, loss[loss=0.2685, ctc_loss=0.1864, cr_loss=0.4107, over 15861.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1681, cr_loss=0.3771, over 3369918.71 frames. ], batch size: 74, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:36:41,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=199080.0, ans=0.0 2024-09-23 06:36:49,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=199080.0, ans=0.125 2024-09-23 06:36:50,910 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. 
limit=15.0 2024-09-23 06:37:10,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=199173.33333333334, ans=0.0 2024-09-23 06:37:31,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=199220.0, ans=0.2 2024-09-23 06:37:32,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=199220.0, ans=0.125 2024-09-23 06:37:57,659 INFO [train.py:1198] (0/4) Epoch 11, batch 3750, loss[loss=0.2136, ctc_loss=0.1428, cr_loss=0.3542, over 17218.00 frames. ], tot_loss[loss=0.2442, ctc_loss=0.1686, cr_loss=0.3778, over 3356439.31 frames. ], batch size: 41, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:38:14,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199360.0, ans=0.1 2024-09-23 06:38:24,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2024-09-23 06:38:33,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=199406.66666666666, ans=0.035 2024-09-23 06:38:36,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=199406.66666666666, ans=0.025 2024-09-23 06:38:39,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2024-09-23 06:38:41,139 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.417e+02 1.573e+02 1.870e+02 3.069e+02, threshold=3.146e+02, percent-clipped=3.0 2024-09-23 06:38:47,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=199453.33333333334, ans=0.125 2024-09-23 06:38:57,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2024-09-23 06:39:16,470 INFO [train.py:1198] (0/4) Epoch 11, batch 3800, loss[loss=0.2096, ctc_loss=0.1444, cr_loss=0.3261, over 17308.00 frames. ], tot_loss[loss=0.245, ctc_loss=0.1693, cr_loss=0.3786, over 3339657.01 frames. ], batch size: 46, lr: 1.08e-02, grad_scale: 32.0 2024-09-23 06:40:00,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2024-09-23 06:40:34,880 INFO [train.py:1198] (0/4) Epoch 11, batch 3850, loss[loss=0.3024, ctc_loss=0.2227, cr_loss=0.3984, over 11347.00 frames. ], tot_loss[loss=0.2493, ctc_loss=0.1728, cr_loss=0.3826, over 3297578.71 frames. ], batch size: 123, lr: 1.07e-02, grad_scale: 16.0 2024-09-23 06:40:47,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=199780.0, ans=0.1 2024-09-23 06:41:18,606 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 1.500e+02 1.672e+02 1.857e+02 2.511e+02, threshold=3.343e+02, percent-clipped=0.0 2024-09-23 06:41:30,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.55 vs. 
limit=22.5 2024-09-23 06:41:40,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.62 vs. limit=15.0 2024-09-23 06:41:44,112 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-11.pt 2024-09-23 06:42:36,530 INFO [train.py:1198] (0/4) Epoch 12, batch 0, loss[loss=0.2382, ctc_loss=0.1655, cr_loss=0.3639, over 17303.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1655, cr_loss=0.3639, over 17303.00 frames. ], batch size: 49, lr: 1.03e-02, grad_scale: 32.0 2024-09-23 06:42:36,531 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 06:42:52,081 INFO [train.py:1230] (0/4) Epoch 12, validation: loss=0.0478, ctc_loss=0.0478, cr_loss=7.52e-15, over 944034.00 frames. 2024-09-23 06:42:52,082 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 06:43:10,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.90 vs. limit=22.5 2024-09-23 06:43:21,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=200041.33333333334, ans=0.125 2024-09-23 06:43:22,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=200088.0, ans=0.125 2024-09-23 06:43:43,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=200134.66666666666, ans=0.0 2024-09-23 06:43:57,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=200181.33333333334, ans=0.125 2024-09-23 06:44:01,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.54 vs. limit=10.0 2024-09-23 06:44:07,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=200181.33333333334, ans=0.0 2024-09-23 06:44:11,656 INFO [train.py:1198] (0/4) Epoch 12, batch 50, loss[loss=0.2486, ctc_loss=0.1704, cr_loss=0.391, over 17024.00 frames. ], tot_loss[loss=0.2438, ctc_loss=0.1687, cr_loss=0.3755, over 749061.98 frames. ], batch size: 52, lr: 1.03e-02, grad_scale: 32.0 2024-09-23 06:44:13,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=200228.0, ans=0.125 2024-09-23 06:44:16,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=200228.0, ans=0.125 2024-09-23 06:44:46,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=200274.66666666666, ans=0.0 2024-09-23 06:44:51,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. 
limit=15.0 2024-09-23 06:45:05,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=200368.0, ans=10.0 2024-09-23 06:45:12,537 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.318e+02 1.438e+02 1.597e+02 2.419e+02, threshold=2.876e+02, percent-clipped=0.0 2024-09-23 06:45:30,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=200414.66666666666, ans=0.0 2024-09-23 06:45:39,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2024-09-23 06:45:40,750 INFO [train.py:1198] (0/4) Epoch 12, batch 100, loss[loss=0.2342, ctc_loss=0.1602, cr_loss=0.37, over 17030.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1678, cr_loss=0.3764, over 1329963.17 frames. ], batch size: 51, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:46:53,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2024-09-23 06:47:00,676 INFO [train.py:1198] (0/4) Epoch 12, batch 150, loss[loss=0.2559, ctc_loss=0.1766, cr_loss=0.3965, over 17015.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1652, cr_loss=0.3742, over 1784088.17 frames. ], batch size: 51, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:47:05,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=200694.66666666666, ans=0.125 2024-09-23 06:47:11,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200694.66666666666, ans=0.0 2024-09-23 06:47:51,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=200834.66666666666, ans=0.0 2024-09-23 06:47:54,507 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.306e+02 1.461e+02 1.660e+02 2.321e+02, threshold=2.923e+02, percent-clipped=0.0 2024-09-23 06:48:18,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200928.0, ans=0.1 2024-09-23 06:48:19,936 INFO [train.py:1198] (0/4) Epoch 12, batch 200, loss[loss=0.2097, ctc_loss=0.145, cr_loss=0.3236, over 17281.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1646, cr_loss=0.3738, over 2142301.33 frames. ], batch size: 42, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:49:18,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=201068.0, ans=0.0 2024-09-23 06:49:26,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=201114.66666666666, ans=0.2 2024-09-23 06:49:28,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=201114.66666666666, ans=0.2 2024-09-23 06:49:47,799 INFO [train.py:1198] (0/4) Epoch 12, batch 250, loss[loss=0.2616, ctc_loss=0.1793, cr_loss=0.4112, over 17286.00 frames. ], tot_loss[loss=0.2411, ctc_loss=0.1658, cr_loss=0.3765, over 2413279.49 frames. 
], batch size: 46, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:49:48,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=201161.33333333334, ans=0.0 2024-09-23 06:49:53,060 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:50:45,032 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.293e+02 1.367e+02 1.506e+02 2.268e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-23 06:50:51,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=201301.33333333334, ans=0.0 2024-09-23 06:51:10,826 INFO [train.py:1198] (0/4) Epoch 12, batch 300, loss[loss=0.2503, ctc_loss=0.1741, cr_loss=0.381, over 17309.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1668, cr_loss=0.3788, over 2632426.73 frames. ], batch size: 51, lr: 1.03e-02, grad_scale: 16.0 2024-09-23 06:51:21,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-09-23 06:51:52,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=201488.0, ans=0.125 2024-09-23 06:52:05,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=201534.66666666666, ans=0.125 2024-09-23 06:52:30,549 INFO [train.py:1198] (0/4) Epoch 12, batch 350, loss[loss=0.2663, ctc_loss=0.184, cr_loss=0.4113, over 17098.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1648, cr_loss=0.376, over 2800876.10 frames. ], batch size: 49, lr: 1.02e-02, grad_scale: 16.0 2024-09-23 06:53:25,562 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.271e+02 1.388e+02 1.541e+02 2.258e+02, threshold=2.777e+02, percent-clipped=0.0 2024-09-23 06:53:25,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=201768.0, ans=0.2 2024-09-23 06:53:43,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0 2024-09-23 06:53:51,108 INFO [train.py:1198] (0/4) Epoch 12, batch 400, loss[loss=0.2467, ctc_loss=0.1709, cr_loss=0.3789, over 17041.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1665, cr_loss=0.3782, over 2927030.03 frames. ], batch size: 44, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:54:25,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=201954.66666666666, ans=0.125 2024-09-23 06:54:35,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=201954.66666666666, ans=0.025 2024-09-23 06:54:35,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.23 vs. 
limit=12.0 2024-09-23 06:54:44,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=201954.66666666666, ans=0.125 2024-09-23 06:54:52,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=202001.33333333334, ans=0.125 2024-09-23 06:54:52,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=12.0 2024-09-23 06:55:07,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.54 vs. limit=15.0 2024-09-23 06:55:18,827 INFO [train.py:1198] (0/4) Epoch 12, batch 450, loss[loss=0.2151, ctc_loss=0.1438, cr_loss=0.3567, over 17076.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1667, cr_loss=0.3782, over 3022583.53 frames. ], batch size: 39, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:55:34,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=202094.66666666666, ans=0.125 2024-09-23 06:55:45,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=202141.33333333334, ans=0.0 2024-09-23 06:55:47,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2024-09-23 06:56:07,947 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:56:14,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.07 vs. limit=10.0 2024-09-23 06:56:15,459 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.330e+02 1.519e+02 1.735e+02 3.092e+02, threshold=3.038e+02, percent-clipped=2.0 2024-09-23 06:56:40,845 INFO [train.py:1198] (0/4) Epoch 12, batch 500, loss[loss=0.2708, ctc_loss=0.1948, cr_loss=0.3801, over 15087.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1668, cr_loss=0.3783, over 3091613.17 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:56:55,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2024-09-23 06:56:56,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=202374.66666666666, ans=0.0 2024-09-23 06:57:27,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=202468.0, ans=0.025 2024-09-23 06:57:32,270 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:57:33,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=202468.0, ans=0.125 2024-09-23 06:57:54,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=202514.66666666666, ans=0.025 2024-09-23 06:58:00,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. 
limit=15.0 2024-09-23 06:58:00,477 INFO [train.py:1198] (0/4) Epoch 12, batch 550, loss[loss=0.2605, ctc_loss=0.1792, cr_loss=0.4064, over 17054.00 frames. ], tot_loss[loss=0.2428, ctc_loss=0.1672, cr_loss=0.3781, over 3147125.19 frames. ], batch size: 46, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:58:14,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=202608.0, ans=0.025 2024-09-23 06:58:20,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2024-09-23 06:58:54,272 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.042e+02 1.321e+02 1.421e+02 1.522e+02 2.085e+02, threshold=2.842e+02, percent-clipped=0.0 2024-09-23 06:58:54,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202701.33333333334, ans=0.1 2024-09-23 06:59:07,398 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 06:59:17,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=202748.0, ans=0.125 2024-09-23 06:59:19,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=202748.0, ans=0.0 2024-09-23 06:59:22,365 INFO [train.py:1198] (0/4) Epoch 12, batch 600, loss[loss=0.2358, ctc_loss=0.1593, cr_loss=0.3827, over 17301.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1677, cr_loss=0.379, over 3190127.37 frames. ], batch size: 51, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 06:59:45,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=202841.33333333334, ans=0.07 2024-09-23 07:00:25,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.37 vs. limit=15.0 2024-09-23 07:00:36,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0 2024-09-23 07:00:50,059 INFO [train.py:1198] (0/4) Epoch 12, batch 650, loss[loss=0.2569, ctc_loss=0.1762, cr_loss=0.4033, over 17005.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1653, cr_loss=0.3758, over 3232319.24 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:00:53,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=203028.0, ans=0.125 2024-09-23 07:00:53,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=203028.0, ans=0.2 2024-09-23 07:01:12,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=203074.66666666666, ans=0.125 2024-09-23 07:01:16,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. 
limit=15.0 2024-09-23 07:01:19,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=203074.66666666666, ans=22.5 2024-09-23 07:01:22,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=203121.33333333334, ans=0.0 2024-09-23 07:01:38,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=12.0 2024-09-23 07:01:44,224 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.271e+02 1.387e+02 1.567e+02 2.285e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-23 07:01:48,494 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.57 vs. limit=15.0 2024-09-23 07:01:57,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=203214.66666666666, ans=0.125 2024-09-23 07:02:04,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.58 vs. limit=22.5 2024-09-23 07:02:09,752 INFO [train.py:1198] (0/4) Epoch 12, batch 700, loss[loss=0.2565, ctc_loss=0.1794, cr_loss=0.3859, over 17294.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1666, cr_loss=0.3781, over 3267390.36 frames. ], batch size: 51, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:02:56,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=203401.33333333334, ans=0.125 2024-09-23 07:03:23,881 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 07:03:29,849 INFO [train.py:1198] (0/4) Epoch 12, batch 750, loss[loss=0.2476, ctc_loss=0.1719, cr_loss=0.3784, over 16859.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1666, cr_loss=0.3772, over 3272670.90 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:03:31,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=203494.66666666666, ans=0.02 2024-09-23 07:03:33,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=203494.66666666666, ans=0.125 2024-09-23 07:03:47,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=203541.33333333334, ans=0.125 2024-09-23 07:03:47,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=203541.33333333334, ans=0.125 2024-09-23 07:04:02,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=203588.0, ans=0.1 2024-09-23 07:04:12,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=203588.0, ans=0.125 2024-09-23 07:04:16,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. 
limit=15.0 2024-09-23 07:04:22,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=203634.66666666666, ans=0.125 2024-09-23 07:04:29,347 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.298e+02 1.398e+02 1.549e+02 2.740e+02, threshold=2.796e+02, percent-clipped=0.0 2024-09-23 07:04:44,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=203681.33333333334, ans=0.0 2024-09-23 07:04:57,397 INFO [train.py:1198] (0/4) Epoch 12, batch 800, loss[loss=0.2514, ctc_loss=0.1721, cr_loss=0.3964, over 17306.00 frames. ], tot_loss[loss=0.2421, ctc_loss=0.1666, cr_loss=0.3777, over 3294841.56 frames. ], batch size: 51, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:05:02,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=203728.0, ans=0.125 2024-09-23 07:05:06,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.44 vs. limit=22.5 2024-09-23 07:05:21,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=203774.66666666666, ans=0.125 2024-09-23 07:05:22,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=203774.66666666666, ans=0.0 2024-09-23 07:05:28,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=203774.66666666666, ans=0.125 2024-09-23 07:05:45,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=203868.0, ans=0.125 2024-09-23 07:06:04,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=203914.66666666666, ans=0.2 2024-09-23 07:06:18,742 INFO [train.py:1198] (0/4) Epoch 12, batch 850, loss[loss=0.2889, ctc_loss=0.2032, cr_loss=0.4285, over 14769.00 frames. ], tot_loss[loss=0.2422, ctc_loss=0.1665, cr_loss=0.3781, over 3312043.90 frames. ], batch size: 88, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:06:38,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=204008.0, ans=0.0 2024-09-23 07:06:49,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204054.66666666666, ans=0.1 2024-09-23 07:06:50,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=204054.66666666666, ans=0.0 2024-09-23 07:07:12,568 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.288e+02 1.402e+02 1.556e+02 2.231e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-23 07:07:18,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=22.5 2024-09-23 07:07:21,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. 
limit=15.0 2024-09-23 07:07:35,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=204148.0, ans=0.125 2024-09-23 07:07:38,075 INFO [train.py:1198] (0/4) Epoch 12, batch 900, loss[loss=0.2504, ctc_loss=0.1706, cr_loss=0.3989, over 17369.00 frames. ], tot_loss[loss=0.2429, ctc_loss=0.167, cr_loss=0.3791, over 3323964.94 frames. ], batch size: 48, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:07:44,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=204194.66666666666, ans=0.2 2024-09-23 07:07:58,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=204241.33333333334, ans=0.0 2024-09-23 07:08:00,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=204241.33333333334, ans=0.0 2024-09-23 07:08:05,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=204241.33333333334, ans=0.025 2024-09-23 07:08:26,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=204334.66666666666, ans=0.2 2024-09-23 07:08:37,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204334.66666666666, ans=0.1 2024-09-23 07:08:57,660 INFO [train.py:1198] (0/4) Epoch 12, batch 950, loss[loss=0.2719, ctc_loss=0.1883, cr_loss=0.4179, over 16787.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1665, cr_loss=0.3789, over 3334615.77 frames. ], batch size: 61, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:09:26,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=204474.66666666666, ans=0.05 2024-09-23 07:09:42,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204521.33333333334, ans=0.1 2024-09-23 07:09:45,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204521.33333333334, ans=0.1 2024-09-23 07:09:58,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=204568.0, ans=0.2 2024-09-23 07:09:59,531 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.295e+02 1.392e+02 1.548e+02 2.202e+02, threshold=2.785e+02, percent-clipped=0.0 2024-09-23 07:10:25,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=204614.66666666666, ans=0.125 2024-09-23 07:10:28,003 INFO [train.py:1198] (0/4) Epoch 12, batch 1000, loss[loss=0.2228, ctc_loss=0.1516, cr_loss=0.3556, over 16999.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1665, cr_loss=0.3791, over 3344939.22 frames. 
], batch size: 44, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:10:31,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=204661.33333333334, ans=0.0 2024-09-23 07:10:33,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=204661.33333333334, ans=0.125 2024-09-23 07:10:34,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=204661.33333333334, ans=0.125 2024-09-23 07:11:04,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204754.66666666666, ans=0.1 2024-09-23 07:11:05,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=204754.66666666666, ans=0.125 2024-09-23 07:11:17,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=204801.33333333334, ans=0.2 2024-09-23 07:11:47,282 INFO [train.py:1198] (0/4) Epoch 12, batch 1050, loss[loss=0.2367, ctc_loss=0.1607, cr_loss=0.3799, over 17150.00 frames. ], tot_loss[loss=0.2434, ctc_loss=0.1674, cr_loss=0.3804, over 3351080.29 frames. ], batch size: 48, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:11:52,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=204894.66666666666, ans=0.125 2024-09-23 07:11:57,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=204894.66666666666, ans=0.125 2024-09-23 07:12:27,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=204988.0, ans=0.2 2024-09-23 07:12:41,478 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.299e+02 1.476e+02 1.734e+02 2.912e+02, threshold=2.951e+02, percent-clipped=1.0 2024-09-23 07:13:07,027 INFO [train.py:1198] (0/4) Epoch 12, batch 1100, loss[loss=0.274, ctc_loss=0.1858, cr_loss=0.4409, over 17304.00 frames. ], tot_loss[loss=0.2447, ctc_loss=0.1684, cr_loss=0.3817, over 3346072.73 frames. ], batch size: 51, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:13:07,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=205128.0, ans=0.2 2024-09-23 07:13:39,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=205221.33333333334, ans=0.125 2024-09-23 07:13:45,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=205221.33333333334, ans=0.95 2024-09-23 07:14:20,913 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-44000.pt 2024-09-23 07:14:23,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=205314.66666666666, ans=0.125 2024-09-23 07:14:29,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2024-09-23 07:14:36,515 INFO [train.py:1198] (0/4) Epoch 12, batch 1150, loss[loss=0.2574, ctc_loss=0.1822, cr_loss=0.3756, over 16563.00 frames. 
], tot_loss[loss=0.2442, ctc_loss=0.1681, cr_loss=0.3803, over 3344544.19 frames. ], batch size: 66, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:14:49,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=205361.33333333334, ans=0.125 2024-09-23 07:15:04,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=205408.0, ans=0.05 2024-09-23 07:15:09,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=205454.66666666666, ans=0.0 2024-09-23 07:15:12,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=205454.66666666666, ans=0.0 2024-09-23 07:15:14,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=205454.66666666666, ans=0.125 2024-09-23 07:15:21,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=205454.66666666666, ans=0.1 2024-09-23 07:15:28,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=205501.33333333334, ans=0.125 2024-09-23 07:15:32,628 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.321e+02 1.590e+02 1.805e+02 2.687e+02, threshold=3.179e+02, percent-clipped=0.0 2024-09-23 07:15:36,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.36 vs. limit=15.0 2024-09-23 07:15:58,029 INFO [train.py:1198] (0/4) Epoch 12, batch 1200, loss[loss=0.2131, ctc_loss=0.1432, cr_loss=0.3499, over 17027.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1675, cr_loss=0.3792, over 3340394.63 frames. ], batch size: 44, lr: 1.02e-02, grad_scale: 32.0 2024-09-23 07:16:22,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-09-23 07:17:17,302 INFO [train.py:1198] (0/4) Epoch 12, batch 1250, loss[loss=0.2447, ctc_loss=0.1666, cr_loss=0.3906, over 17272.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1673, cr_loss=0.3786, over 3337777.59 frames. ], batch size: 42, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:17:52,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=205921.33333333334, ans=0.125 2024-09-23 07:18:12,962 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.281e+02 1.384e+02 1.564e+02 2.899e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-23 07:18:30,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=206014.66666666666, ans=0.025 2024-09-23 07:18:32,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=206014.66666666666, ans=0.125 2024-09-23 07:18:36,853 INFO [train.py:1198] (0/4) Epoch 12, batch 1300, loss[loss=0.2629, ctc_loss=0.1828, cr_loss=0.4009, over 17308.00 frames. ], tot_loss[loss=0.2433, ctc_loss=0.1676, cr_loss=0.3787, over 3339884.46 frames. 
], batch size: 51, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:18:45,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2024-09-23 07:19:00,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=206108.0, ans=0.2 2024-09-23 07:19:16,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=206154.66666666666, ans=0.125 2024-09-23 07:19:17,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=206154.66666666666, ans=0.04949747468305833 2024-09-23 07:19:46,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=206248.0, ans=0.125 2024-09-23 07:19:51,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.43 vs. limit=10.0 2024-09-23 07:19:54,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=206248.0, ans=0.1 2024-09-23 07:20:03,898 INFO [train.py:1198] (0/4) Epoch 12, batch 1350, loss[loss=0.2673, ctc_loss=0.1872, cr_loss=0.4007, over 15065.00 frames. ], tot_loss[loss=0.2437, ctc_loss=0.1679, cr_loss=0.3786, over 3334582.91 frames. ], batch size: 89, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:20:17,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=206294.66666666666, ans=0.125 2024-09-23 07:20:17,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=206294.66666666666, ans=0.125 2024-09-23 07:20:33,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=206341.33333333334, ans=0.0 2024-09-23 07:21:02,411 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.350e+02 1.483e+02 1.734e+02 2.832e+02, threshold=2.966e+02, percent-clipped=2.0 2024-09-23 07:21:13,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=206481.33333333334, ans=0.125 2024-09-23 07:21:26,076 INFO [train.py:1198] (0/4) Epoch 12, batch 1400, loss[loss=0.2391, ctc_loss=0.1639, cr_loss=0.3762, over 16990.00 frames. ], tot_loss[loss=0.2444, ctc_loss=0.1685, cr_loss=0.3793, over 3325795.17 frames. ], batch size: 53, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:21:31,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. limit=6.0 2024-09-23 07:21:37,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=206528.0, ans=0.125 2024-09-23 07:21:43,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.51 vs. 
limit=10.0 2024-09-23 07:21:45,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=206574.66666666666, ans=0.125 2024-09-23 07:21:57,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2024-09-23 07:22:11,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=206621.33333333334, ans=0.0 2024-09-23 07:22:20,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=206668.0, ans=0.125 2024-09-23 07:22:32,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=206714.66666666666, ans=0.2 2024-09-23 07:22:46,291 INFO [train.py:1198] (0/4) Epoch 12, batch 1450, loss[loss=0.2283, ctc_loss=0.1579, cr_loss=0.3516, over 17103.00 frames. ], tot_loss[loss=0.243, ctc_loss=0.1674, cr_loss=0.3779, over 3336632.64 frames. ], batch size: 49, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:23:14,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2024-09-23 07:23:36,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=206901.33333333334, ans=0.0 2024-09-23 07:23:42,178 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.321e+02 1.435e+02 1.538e+02 2.142e+02, threshold=2.870e+02, percent-clipped=0.0 2024-09-23 07:23:45,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=206901.33333333334, ans=0.0 2024-09-23 07:23:59,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=206948.0, ans=0.025 2024-09-23 07:24:10,786 INFO [train.py:1198] (0/4) Epoch 12, batch 1500, loss[loss=0.2492, ctc_loss=0.1738, cr_loss=0.3768, over 17230.00 frames. ], tot_loss[loss=0.2409, ctc_loss=0.1658, cr_loss=0.3756, over 3353408.94 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:24:11,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=206994.66666666666, ans=0.125 2024-09-23 07:24:25,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=206994.66666666666, ans=15.0 2024-09-23 07:24:27,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=207041.33333333334, ans=0.1 2024-09-23 07:24:46,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=207088.0, ans=0.0 2024-09-23 07:24:57,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=207088.0, ans=0.125 2024-09-23 07:25:21,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=207181.33333333334, ans=0.125 2024-09-23 07:25:28,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.93 vs. 
limit=22.5 2024-09-23 07:25:32,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=207181.33333333334, ans=0.125 2024-09-23 07:25:35,347 INFO [train.py:1198] (0/4) Epoch 12, batch 1550, loss[loss=0.2384, ctc_loss=0.1643, cr_loss=0.3708, over 17357.00 frames. ], tot_loss[loss=0.2418, ctc_loss=0.1664, cr_loss=0.3768, over 3359942.60 frames. ], batch size: 48, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:25:37,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2024-09-23 07:25:53,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=207274.66666666666, ans=0.125 2024-09-23 07:25:54,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=207274.66666666666, ans=0.95 2024-09-23 07:26:12,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=207321.33333333334, ans=0.125 2024-09-23 07:26:30,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=207368.0, ans=0.125 2024-09-23 07:26:31,712 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.334e+02 1.473e+02 1.633e+02 2.267e+02, threshold=2.947e+02, percent-clipped=0.0 2024-09-23 07:26:44,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207414.66666666666, ans=0.1 2024-09-23 07:26:46,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=207414.66666666666, ans=0.07 2024-09-23 07:26:55,795 INFO [train.py:1198] (0/4) Epoch 12, batch 1600, loss[loss=0.209, ctc_loss=0.144, cr_loss=0.3248, over 17113.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.1661, cr_loss=0.3762, over 3361435.04 frames. ], batch size: 40, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:26:56,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=207461.33333333334, ans=0.1 2024-09-23 07:27:28,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=12.0 2024-09-23 07:27:43,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=207601.33333333334, ans=0.125 2024-09-23 07:28:15,591 INFO [train.py:1198] (0/4) Epoch 12, batch 1650, loss[loss=0.2642, ctc_loss=0.1819, cr_loss=0.4114, over 16181.00 frames. ], tot_loss[loss=0.2413, ctc_loss=0.166, cr_loss=0.3766, over 3355336.13 frames. 
], batch size: 74, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:28:15,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=207694.66666666666, ans=0.125 2024-09-23 07:28:42,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=207741.33333333334, ans=0.125 2024-09-23 07:28:42,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=207741.33333333334, ans=0.0 2024-09-23 07:29:13,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=207834.66666666666, ans=0.0 2024-09-23 07:29:16,235 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.260e+02 1.353e+02 1.462e+02 2.073e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-23 07:29:33,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-09-23 07:29:42,767 INFO [train.py:1198] (0/4) Epoch 12, batch 1700, loss[loss=0.2389, ctc_loss=0.1618, cr_loss=0.3855, over 16999.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1657, cr_loss=0.3768, over 3363434.36 frames. ], batch size: 44, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:30:01,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=207974.66666666666, ans=0.125 2024-09-23 07:30:05,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=207974.66666666666, ans=0.125 2024-09-23 07:30:53,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2024-09-23 07:31:06,204 INFO [train.py:1198] (0/4) Epoch 12, batch 1750, loss[loss=0.244, ctc_loss=0.1677, cr_loss=0.3815, over 17302.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1661, cr_loss=0.3769, over 3358330.32 frames. ], batch size: 49, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:31:24,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2024-09-23 07:31:31,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.17 vs. 
limit=22.5 2024-09-23 07:31:46,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=208254.66666666666, ans=0.125 2024-09-23 07:31:53,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=208301.33333333334, ans=0.2 2024-09-23 07:31:57,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=208301.33333333334, ans=0.0 2024-09-23 07:32:02,197 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.329e+02 1.442e+02 1.637e+02 2.396e+02, threshold=2.884e+02, percent-clipped=0.0 2024-09-23 07:32:08,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=208348.0, ans=0.125 2024-09-23 07:32:13,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=208348.0, ans=0.2 2024-09-23 07:32:25,988 INFO [train.py:1198] (0/4) Epoch 12, batch 1800, loss[loss=0.2381, ctc_loss=0.1655, cr_loss=0.3629, over 17048.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1655, cr_loss=0.3761, over 3361363.98 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:32:26,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=208394.66666666666, ans=0.0 2024-09-23 07:33:03,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2024-09-23 07:33:04,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=208488.0, ans=0.2 2024-09-23 07:33:45,667 INFO [train.py:1198] (0/4) Epoch 12, batch 1850, loss[loss=0.2669, ctc_loss=0.1872, cr_loss=0.3989, over 17207.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1655, cr_loss=0.3757, over 3370735.96 frames. ], batch size: 47, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:33:47,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=208628.0, ans=0.125 2024-09-23 07:34:07,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.65 vs. 
limit=15.0 2024-09-23 07:34:19,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=208674.66666666666, ans=0.125 2024-09-23 07:34:48,756 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.290e+02 1.399e+02 1.526e+02 2.337e+02, threshold=2.797e+02, percent-clipped=0.0 2024-09-23 07:34:49,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=208768.0, ans=0.0 2024-09-23 07:34:50,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=208768.0, ans=0.04949747468305833 2024-09-23 07:34:53,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=208768.0, ans=0.125 2024-09-23 07:34:58,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=208814.66666666666, ans=0.0 2024-09-23 07:35:10,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=208814.66666666666, ans=0.125 2024-09-23 07:35:15,125 INFO [train.py:1198] (0/4) Epoch 12, batch 1900, loss[loss=0.298, ctc_loss=0.2151, cr_loss=0.4142, over 12135.00 frames. ], tot_loss[loss=0.241, ctc_loss=0.1658, cr_loss=0.3762, over 3360359.25 frames. ], batch size: 123, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:35:20,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=208861.33333333334, ans=0.2 2024-09-23 07:35:20,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=208861.33333333334, ans=0.1 2024-09-23 07:35:42,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=208908.0, ans=0.125 2024-09-23 07:35:51,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=208954.66666666666, ans=0.04949747468305833 2024-09-23 07:35:56,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=208954.66666666666, ans=0.125 2024-09-23 07:36:12,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-09-23 07:36:34,862 INFO [train.py:1198] (0/4) Epoch 12, batch 1950, loss[loss=0.2786, ctc_loss=0.1982, cr_loss=0.4017, over 11354.00 frames. ], tot_loss[loss=0.2408, ctc_loss=0.1656, cr_loss=0.3761, over 3356007.85 frames. 
], batch size: 124, lr: 1.01e-02, grad_scale: 16.0 2024-09-23 07:37:05,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=209188.0, ans=0.0 2024-09-23 07:37:06,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=209188.0, ans=0.0 2024-09-23 07:37:31,804 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.280e+02 1.393e+02 1.533e+02 3.563e+02, threshold=2.786e+02, percent-clipped=1.0 2024-09-23 07:37:54,116 INFO [train.py:1198] (0/4) Epoch 12, batch 2000, loss[loss=0.248, ctc_loss=0.1715, cr_loss=0.3824, over 17021.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1652, cr_loss=0.3761, over 3362392.47 frames. ], batch size: 53, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:38:47,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=209468.0, ans=0.125 2024-09-23 07:38:57,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=209468.0, ans=0.0 2024-09-23 07:39:21,410 INFO [train.py:1198] (0/4) Epoch 12, batch 2050, loss[loss=0.2452, ctc_loss=0.1682, cr_loss=0.3853, over 16033.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1651, cr_loss=0.3759, over 3360397.00 frames. ], batch size: 74, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:39:51,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=209654.66666666666, ans=0.025 2024-09-23 07:40:07,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=209654.66666666666, ans=0.125 2024-09-23 07:40:16,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=12.0 2024-09-23 07:40:21,041 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.313e+02 1.485e+02 1.661e+02 2.450e+02, threshold=2.969e+02, percent-clipped=0.0 2024-09-23 07:40:21,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=209701.33333333334, ans=0.125 2024-09-23 07:40:34,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209748.0, ans=0.1 2024-09-23 07:40:43,461 INFO [train.py:1198] (0/4) Epoch 12, batch 2100, loss[loss=0.2833, ctc_loss=0.1962, cr_loss=0.4353, over 16551.00 frames. ], tot_loss[loss=0.2407, ctc_loss=0.1654, cr_loss=0.3764, over 3359213.24 frames. 
], batch size: 66, lr: 1.01e-02, grad_scale: 32.0 2024-09-23 07:40:46,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=209794.66666666666, ans=0.125 2024-09-23 07:40:51,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=209794.66666666666, ans=0.125 2024-09-23 07:40:56,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=209794.66666666666, ans=0.04949747468305833 2024-09-23 07:41:06,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=209841.33333333334, ans=0.125 2024-09-23 07:41:31,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209934.66666666666, ans=0.1 2024-09-23 07:41:34,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=209934.66666666666, ans=0.2 2024-09-23 07:41:37,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=209934.66666666666, ans=0.125 2024-09-23 07:42:00,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=209981.33333333334, ans=0.125 2024-09-23 07:42:02,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=210028.0, ans=0.07 2024-09-23 07:42:03,575 INFO [train.py:1198] (0/4) Epoch 12, batch 2150, loss[loss=0.2687, ctc_loss=0.1868, cr_loss=0.4097, over 17021.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1652, cr_loss=0.3763, over 3355714.02 frames. ], batch size: 51, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:42:03,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=210028.0, ans=0.025 2024-09-23 07:42:29,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210074.66666666666, ans=0.1 2024-09-23 07:42:32,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=210074.66666666666, ans=0.2 2024-09-23 07:42:43,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=210121.33333333334, ans=0.2 2024-09-23 07:43:00,522 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.318e+02 1.400e+02 1.574e+02 2.180e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-23 07:43:20,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=22.5 2024-09-23 07:43:22,705 INFO [train.py:1198] (0/4) Epoch 12, batch 2200, loss[loss=0.249, ctc_loss=0.172, cr_loss=0.3847, over 17008.00 frames. ], tot_loss[loss=0.2423, ctc_loss=0.1665, cr_loss=0.3789, over 3355118.11 frames. 
], batch size: 51, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:43:27,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=210261.33333333334, ans=0.125 2024-09-23 07:44:01,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=210354.66666666666, ans=0.0 2024-09-23 07:44:04,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=210354.66666666666, ans=0.125 2024-09-23 07:44:06,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=210354.66666666666, ans=0.0 2024-09-23 07:44:15,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=210354.66666666666, ans=0.025 2024-09-23 07:44:40,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=210448.0, ans=0.025 2024-09-23 07:44:50,081 INFO [train.py:1198] (0/4) Epoch 12, batch 2250, loss[loss=0.2342, ctc_loss=0.1584, cr_loss=0.379, over 16726.00 frames. ], tot_loss[loss=0.2427, ctc_loss=0.1667, cr_loss=0.3799, over 3355536.72 frames. ], batch size: 61, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:45:10,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0 2024-09-23 07:45:18,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=210541.33333333334, ans=0.125 2024-09-23 07:45:40,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2024-09-23 07:45:43,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=210634.66666666666, ans=0.0 2024-09-23 07:45:46,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=210634.66666666666, ans=0.2 2024-09-23 07:45:49,556 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.306e+02 1.486e+02 1.708e+02 2.318e+02, threshold=2.971e+02, percent-clipped=0.0 2024-09-23 07:45:54,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=210681.33333333334, ans=0.125 2024-09-23 07:46:04,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=210681.33333333334, ans=0.125 2024-09-23 07:46:11,908 INFO [train.py:1198] (0/4) Epoch 12, batch 2300, loss[loss=0.2438, ctc_loss=0.1647, cr_loss=0.3958, over 17018.00 frames. ], tot_loss[loss=0.2443, ctc_loss=0.1682, cr_loss=0.3808, over 3350367.89 frames. ], batch size: 51, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:46:17,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=210728.0, ans=0.125 2024-09-23 07:46:19,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.97 vs. 
limit=10.0 2024-09-23 07:46:21,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=210728.0, ans=0.0 2024-09-23 07:46:44,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=210821.33333333334, ans=0.5 2024-09-23 07:46:52,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=210821.33333333334, ans=0.125 2024-09-23 07:46:55,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210821.33333333334, ans=0.1 2024-09-23 07:47:03,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=210868.0, ans=0.125 2024-09-23 07:47:16,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=210914.66666666666, ans=0.125 2024-09-23 07:47:26,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=210914.66666666666, ans=0.0 2024-09-23 07:47:32,371 INFO [train.py:1198] (0/4) Epoch 12, batch 2350, loss[loss=0.2336, ctc_loss=0.1618, cr_loss=0.3592, over 17311.00 frames. ], tot_loss[loss=0.2435, ctc_loss=0.1675, cr_loss=0.38, over 3355163.42 frames. ], batch size: 51, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:47:56,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211008.0, ans=0.1 2024-09-23 07:48:07,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0 2024-09-23 07:48:09,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2024-09-23 07:48:29,120 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.329e+02 1.457e+02 1.570e+02 2.040e+02, threshold=2.915e+02, percent-clipped=0.0 2024-09-23 07:48:33,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.53 vs. limit=15.0 2024-09-23 07:48:56,506 INFO [train.py:1198] (0/4) Epoch 12, batch 2400, loss[loss=0.2765, ctc_loss=0.2015, cr_loss=0.3753, over 12349.00 frames. ], tot_loss[loss=0.2431, ctc_loss=0.1672, cr_loss=0.3795, over 3348430.70 frames. ], batch size: 126, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:49:01,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=211194.66666666666, ans=0.2 2024-09-23 07:49:09,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=12.0 2024-09-23 07:50:21,841 INFO [train.py:1198] (0/4) Epoch 12, batch 2450, loss[loss=0.2005, ctc_loss=0.1362, cr_loss=0.3212, over 17075.00 frames. ], tot_loss[loss=0.2425, ctc_loss=0.1668, cr_loss=0.3785, over 3346500.50 frames. ], batch size: 39, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:50:31,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. 
limit=6.0 2024-09-23 07:50:57,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211521.33333333334, ans=0.1 2024-09-23 07:51:19,220 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.359e+02 1.540e+02 1.815e+02 2.529e+02, threshold=3.080e+02, percent-clipped=0.0 2024-09-23 07:51:21,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=211568.0, ans=0.125 2024-09-23 07:51:35,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=211614.66666666666, ans=0.125 2024-09-23 07:51:41,364 INFO [train.py:1198] (0/4) Epoch 12, batch 2500, loss[loss=0.2517, ctc_loss=0.1705, cr_loss=0.4063, over 17013.00 frames. ], tot_loss[loss=0.2415, ctc_loss=0.1659, cr_loss=0.378, over 3359357.12 frames. ], batch size: 53, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:51:46,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=211661.33333333334, ans=0.0 2024-09-23 07:52:20,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211754.66666666666, ans=0.1 2024-09-23 07:52:22,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2024-09-23 07:52:42,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2024-09-23 07:53:00,936 INFO [train.py:1198] (0/4) Epoch 12, batch 2550, loss[loss=0.1824, ctc_loss=0.1208, cr_loss=0.3082, over 16756.00 frames. ], tot_loss[loss=0.242, ctc_loss=0.1664, cr_loss=0.378, over 3354960.08 frames. ], batch size: 37, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:53:26,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=211941.33333333334, ans=0.0 2024-09-23 07:53:45,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=211988.0, ans=0.125 2024-09-23 07:54:00,930 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.326e+02 1.484e+02 1.777e+02 2.748e+02, threshold=2.968e+02, percent-clipped=0.0 2024-09-23 07:54:05,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=212034.66666666666, ans=0.0 2024-09-23 07:54:25,811 INFO [train.py:1198] (0/4) Epoch 12, batch 2600, loss[loss=0.2565, ctc_loss=0.1754, cr_loss=0.4053, over 17005.00 frames. ], tot_loss[loss=0.2397, ctc_loss=0.1645, cr_loss=0.3758, over 3363147.82 frames. ], batch size: 53, lr: 1.00e-02, grad_scale: 32.0 2024-09-23 07:54:54,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=212174.66666666666, ans=0.0 2024-09-23 07:55:20,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. 
limit=15.0 2024-09-23 07:55:43,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=212314.66666666666, ans=0.0 2024-09-23 07:55:43,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=212314.66666666666, ans=0.125 2024-09-23 07:55:46,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=212361.33333333334, ans=0.1 2024-09-23 07:55:47,928 INFO [train.py:1198] (0/4) Epoch 12, batch 2650, loss[loss=0.2314, ctc_loss=0.1556, cr_loss=0.3793, over 17069.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1648, cr_loss=0.3767, over 3354185.38 frames. ], batch size: 46, lr: 9.99e-03, grad_scale: 32.0 2024-09-23 07:56:01,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=212361.33333333334, ans=0.125 2024-09-23 07:56:09,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=212408.0, ans=0.125 2024-09-23 07:56:29,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-09-23 07:56:37,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=212501.33333333334, ans=0.025 2024-09-23 07:56:45,680 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.240e+02 1.338e+02 1.470e+02 2.352e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-23 07:56:46,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. limit=10.0 2024-09-23 07:56:57,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.07 vs. limit=6.0 2024-09-23 07:57:00,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=212548.0, ans=0.125 2024-09-23 07:57:08,182 INFO [train.py:1198] (0/4) Epoch 12, batch 2700, loss[loss=0.2585, ctc_loss=0.1808, cr_loss=0.3888, over 16576.00 frames. ], tot_loss[loss=0.2395, ctc_loss=0.1644, cr_loss=0.3756, over 3354399.35 frames. ], batch size: 66, lr: 9.99e-03, grad_scale: 32.0 2024-09-23 07:57:35,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=212641.33333333334, ans=0.1 2024-09-23 07:57:40,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=212688.0, ans=0.125 2024-09-23 07:57:54,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=212734.66666666666, ans=0.0 2024-09-23 07:58:28,279 INFO [train.py:1198] (0/4) Epoch 12, batch 2750, loss[loss=0.2683, ctc_loss=0.1881, cr_loss=0.4007, over 16565.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1642, cr_loss=0.3754, over 3361791.76 frames. 
], batch size: 66, lr: 9.98e-03, grad_scale: 32.0 2024-09-23 07:58:34,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=212828.0, ans=0.0 2024-09-23 07:59:16,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=212921.33333333334, ans=0.125 2024-09-23 07:59:33,713 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.344e+02 1.475e+02 1.726e+02 2.611e+02, threshold=2.950e+02, percent-clipped=0.0 2024-09-23 07:59:58,919 INFO [train.py:1198] (0/4) Epoch 12, batch 2800, loss[loss=0.219, ctc_loss=0.1468, cr_loss=0.3607, over 16733.00 frames. ], tot_loss[loss=0.2385, ctc_loss=0.1635, cr_loss=0.3748, over 3364237.03 frames. ], batch size: 37, lr: 9.98e-03, grad_scale: 32.0 2024-09-23 08:00:03,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=213061.33333333334, ans=0.0 2024-09-23 08:00:19,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=213108.0, ans=0.0 2024-09-23 08:00:27,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=213108.0, ans=0.125 2024-09-23 08:00:33,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-09-23 08:00:42,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2024-09-23 08:00:54,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=213201.33333333334, ans=0.07 2024-09-23 08:01:01,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=213248.0, ans=0.2 2024-09-23 08:01:18,198 INFO [train.py:1198] (0/4) Epoch 12, batch 2850, loss[loss=0.3336, ctc_loss=0.2466, cr_loss=0.4346, over 11711.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1631, cr_loss=0.3742, over 3366346.19 frames. ], batch size: 123, lr: 9.97e-03, grad_scale: 32.0 2024-09-23 08:01:27,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2024-09-23 08:01:43,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=15.0 2024-09-23 08:02:06,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=213434.66666666666, ans=0.0 2024-09-23 08:02:09,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=213434.66666666666, ans=0.05 2024-09-23 08:02:15,758 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.286e+02 1.451e+02 1.716e+02 2.634e+02, threshold=2.902e+02, percent-clipped=0.0 2024-09-23 08:02:38,158 INFO [train.py:1198] (0/4) Epoch 12, batch 2900, loss[loss=0.2202, ctc_loss=0.1533, cr_loss=0.3345, over 17145.00 frames. ], tot_loss[loss=0.24, ctc_loss=0.1646, cr_loss=0.3769, over 3359240.51 frames. 
], batch size: 45, lr: 9.97e-03, grad_scale: 32.0 2024-09-23 08:03:11,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=213621.33333333334, ans=0.125 2024-09-23 08:03:44,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=213714.66666666666, ans=0.125 2024-09-23 08:04:04,996 INFO [train.py:1198] (0/4) Epoch 12, batch 2950, loss[loss=0.2626, ctc_loss=0.1814, cr_loss=0.406, over 16354.00 frames. ], tot_loss[loss=0.2414, ctc_loss=0.1658, cr_loss=0.378, over 3352571.28 frames. ], batch size: 66, lr: 9.96e-03, grad_scale: 32.0 2024-09-23 08:04:40,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=213854.66666666666, ans=0.125 2024-09-23 08:04:52,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=15.0 2024-09-23 08:04:58,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=213901.33333333334, ans=0.0 2024-09-23 08:05:04,863 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.287e+02 1.399e+02 1.568e+02 2.905e+02, threshold=2.798e+02, percent-clipped=1.0 2024-09-23 08:05:26,775 INFO [train.py:1198] (0/4) Epoch 12, batch 3000, loss[loss=0.2313, ctc_loss=0.1585, cr_loss=0.3639, over 17012.00 frames. ], tot_loss[loss=0.2403, ctc_loss=0.1649, cr_loss=0.377, over 3358115.31 frames. ], batch size: 44, lr: 9.96e-03, grad_scale: 32.0 2024-09-23 08:05:26,776 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 08:05:42,575 INFO [train.py:1230] (0/4) Epoch 12, validation: loss=0.04588, ctc_loss=0.04588, cr_loss=7.526e-15, over 944034.00 frames. 2024-09-23 08:05:42,576 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 08:05:47,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=213994.66666666666, ans=0.125 2024-09-23 08:06:01,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=214041.33333333334, ans=0.125 2024-09-23 08:06:26,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=22.5 2024-09-23 08:06:37,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=214134.66666666666, ans=0.125 2024-09-23 08:06:39,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=214134.66666666666, ans=0.0 2024-09-23 08:07:00,680 INFO [train.py:1198] (0/4) Epoch 12, batch 3050, loss[loss=0.3045, ctc_loss=0.2176, cr_loss=0.4343, over 17015.00 frames. ], tot_loss[loss=0.2412, ctc_loss=0.1656, cr_loss=0.3779, over 3352724.92 frames. 
], batch size: 51, lr: 9.95e-03, grad_scale: 32.0 2024-09-23 08:07:05,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=214228.0, ans=0.0 2024-09-23 08:07:34,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214321.33333333334, ans=0.1 2024-09-23 08:07:56,199 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.305e+02 1.426e+02 1.616e+02 2.340e+02, threshold=2.852e+02, percent-clipped=0.0 2024-09-23 08:07:56,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=214368.0, ans=0.125 2024-09-23 08:08:17,700 INFO [train.py:1198] (0/4) Epoch 12, batch 3100, loss[loss=0.2282, ctc_loss=0.1569, cr_loss=0.3569, over 16944.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1651, cr_loss=0.3765, over 3345575.63 frames. ], batch size: 42, lr: 9.94e-03, grad_scale: 32.0 2024-09-23 08:08:28,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=214461.33333333334, ans=0.05 2024-09-23 08:08:35,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=214508.0, ans=0.125 2024-09-23 08:08:52,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=214554.66666666666, ans=0.125 2024-09-23 08:09:02,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=214601.33333333334, ans=0.2 2024-09-23 08:09:10,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214601.33333333334, ans=0.1 2024-09-23 08:09:21,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.85 vs. limit=15.0 2024-09-23 08:09:28,835 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:09:36,170 INFO [train.py:1198] (0/4) Epoch 12, batch 3150, loss[loss=0.2801, ctc_loss=0.1952, cr_loss=0.4245, over 16466.00 frames. ], tot_loss[loss=0.2405, ctc_loss=0.1651, cr_loss=0.377, over 3347869.95 frames. 
], batch size: 66, lr: 9.94e-03, grad_scale: 32.0 2024-09-23 08:09:36,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=214694.66666666666, ans=0.0 2024-09-23 08:09:38,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=214694.66666666666, ans=12.0 2024-09-23 08:09:59,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214741.33333333334, ans=0.1 2024-09-23 08:10:02,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=214741.33333333334, ans=0.025 2024-09-23 08:10:21,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=214834.66666666666, ans=0.125 2024-09-23 08:10:21,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=214834.66666666666, ans=0.025 2024-09-23 08:10:32,318 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.379e+02 1.486e+02 1.636e+02 2.338e+02, threshold=2.971e+02, percent-clipped=0.0 2024-09-23 08:10:52,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=214928.0, ans=0.0 2024-09-23 08:10:54,109 INFO [train.py:1198] (0/4) Epoch 12, batch 3200, loss[loss=0.2962, ctc_loss=0.2182, cr_loss=0.3902, over 12082.00 frames. ], tot_loss[loss=0.2404, ctc_loss=0.1651, cr_loss=0.3766, over 3352728.71 frames. ], batch size: 123, lr: 9.93e-03, grad_scale: 32.0 2024-09-23 08:10:57,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=214928.0, ans=0.0 2024-09-23 08:11:40,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.21 vs. limit=15.0 2024-09-23 08:11:46,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=215068.0, ans=0.2 2024-09-23 08:11:47,638 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:12:16,043 INFO [train.py:1198] (0/4) Epoch 12, batch 3250, loss[loss=0.1978, ctc_loss=0.1337, cr_loss=0.3203, over 17181.00 frames. ], tot_loss[loss=0.2388, ctc_loss=0.1637, cr_loss=0.3751, over 3361092.00 frames. ], batch size: 41, lr: 9.93e-03, grad_scale: 16.0 2024-09-23 08:12:30,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=215208.0, ans=0.025 2024-09-23 08:12:33,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=215208.0, ans=0.125 2024-09-23 08:12:56,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. 
limit=8.0 2024-09-23 08:12:59,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=215254.66666666666, ans=0.0 2024-09-23 08:12:59,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=215254.66666666666, ans=0.125 2024-09-23 08:13:05,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0 2024-09-23 08:13:12,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=215301.33333333334, ans=0.125 2024-09-23 08:13:13,439 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.295e+02 1.414e+02 1.544e+02 2.184e+02, threshold=2.828e+02, percent-clipped=0.0 2024-09-23 08:13:35,944 INFO [train.py:1198] (0/4) Epoch 12, batch 3300, loss[loss=0.2443, ctc_loss=0.1677, cr_loss=0.3834, over 15338.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.164, cr_loss=0.3754, over 3366997.05 frames. ], batch size: 89, lr: 9.92e-03, grad_scale: 16.0 2024-09-23 08:14:01,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=215441.33333333334, ans=0.125 2024-09-23 08:14:12,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=215488.0, ans=0.0 2024-09-23 08:14:15,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=215488.0, ans=0.125 2024-09-23 08:14:24,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=215534.66666666666, ans=0.125 2024-09-23 08:14:39,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=12.0 2024-09-23 08:14:56,380 INFO [train.py:1198] (0/4) Epoch 12, batch 3350, loss[loss=0.2715, ctc_loss=0.187, cr_loss=0.4227, over 16784.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1635, cr_loss=0.3745, over 3365342.84 frames. 
], batch size: 61, lr: 9.92e-03, grad_scale: 16.0 2024-09-23 08:15:06,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=215628.0, ans=0.125 2024-09-23 08:15:15,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=215674.66666666666, ans=0.09899494936611666 2024-09-23 08:15:25,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=215674.66666666666, ans=0.125 2024-09-23 08:15:25,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=215674.66666666666, ans=0.125 2024-09-23 08:15:31,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=215721.33333333334, ans=0.0 2024-09-23 08:15:54,078 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.291e+02 1.444e+02 1.665e+02 2.877e+02, threshold=2.888e+02, percent-clipped=1.0 2024-09-23 08:16:02,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=215814.66666666666, ans=0.2 2024-09-23 08:16:14,339 INFO [train.py:1198] (0/4) Epoch 12, batch 3400, loss[loss=0.23, ctc_loss=0.159, cr_loss=0.3547, over 17089.00 frames. ], tot_loss[loss=0.2393, ctc_loss=0.1641, cr_loss=0.3756, over 3364116.60 frames. ], batch size: 43, lr: 9.91e-03, grad_scale: 16.0 2024-09-23 08:16:19,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=215861.33333333334, ans=0.025 2024-09-23 08:16:24,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=22.5 2024-09-23 08:16:39,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215908.0, ans=0.1 2024-09-23 08:16:40,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=215908.0, ans=0.125 2024-09-23 08:16:53,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=215954.66666666666, ans=0.2 2024-09-23 08:17:07,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=216001.33333333334, ans=0.125 2024-09-23 08:17:08,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=216001.33333333334, ans=0.07 2024-09-23 08:17:32,349 INFO [train.py:1198] (0/4) Epoch 12, batch 3450, loss[loss=0.2279, ctc_loss=0.1534, cr_loss=0.3726, over 17011.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1636, cr_loss=0.3757, over 3375107.60 frames. 
], batch size: 44, lr: 9.91e-03, grad_scale: 16.0 2024-09-23 08:18:16,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=216188.0, ans=0.125 2024-09-23 08:18:30,277 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.280e+02 1.416e+02 1.630e+02 3.213e+02, threshold=2.832e+02, percent-clipped=1.0 2024-09-23 08:18:44,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=216281.33333333334, ans=0.2 2024-09-23 08:18:50,514 INFO [train.py:1198] (0/4) Epoch 12, batch 3500, loss[loss=0.2241, ctc_loss=0.1529, cr_loss=0.3563, over 17002.00 frames. ], tot_loss[loss=0.2387, ctc_loss=0.1637, cr_loss=0.3746, over 3370403.33 frames. ], batch size: 56, lr: 9.90e-03, grad_scale: 16.0 2024-09-23 08:19:12,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=216374.66666666666, ans=0.2 2024-09-23 08:19:29,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=216421.33333333334, ans=0.025 2024-09-23 08:19:51,648 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2024-09-23 08:20:08,729 INFO [train.py:1198] (0/4) Epoch 12, batch 3550, loss[loss=0.1906, ctc_loss=0.1269, cr_loss=0.3184, over 17066.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1627, cr_loss=0.3728, over 3372145.04 frames. ], batch size: 39, lr: 9.90e-03, grad_scale: 16.0 2024-09-23 08:20:23,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.01 vs. limit=15.0 2024-09-23 08:20:25,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=216608.0, ans=0.125 2024-09-23 08:20:30,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=216608.0, ans=0.125 2024-09-23 08:21:05,845 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.266e+02 1.416e+02 1.619e+02 2.341e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 08:21:11,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=22.5 2024-09-23 08:21:28,067 INFO [train.py:1198] (0/4) Epoch 12, batch 3600, loss[loss=0.2461, ctc_loss=0.1634, cr_loss=0.4131, over 17012.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1628, cr_loss=0.3734, over 3369438.32 frames. ], batch size: 52, lr: 9.89e-03, grad_scale: 32.0 2024-09-23 08:21:31,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=216794.66666666666, ans=0.125 2024-09-23 08:21:34,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=216794.66666666666, ans=0.125 2024-09-23 08:22:08,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.63 vs. 
limit=22.5 2024-09-23 08:22:09,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=216888.0, ans=0.125 2024-09-23 08:22:33,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=216981.33333333334, ans=0.1 2024-09-23 08:22:37,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=216981.33333333334, ans=0.125 2024-09-23 08:22:48,415 INFO [train.py:1198] (0/4) Epoch 12, batch 3650, loss[loss=0.2235, ctc_loss=0.1529, cr_loss=0.3533, over 17069.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1623, cr_loss=0.3726, over 3365456.26 frames. ], batch size: 39, lr: 9.89e-03, grad_scale: 32.0 2024-09-23 08:22:58,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.03 vs. limit=10.0 2024-09-23 08:23:15,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=22.5 2024-09-23 08:23:21,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=217121.33333333334, ans=0.0 2024-09-23 08:23:27,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=217121.33333333334, ans=0.0 2024-09-23 08:23:45,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217168.0, ans=0.1 2024-09-23 08:23:50,118 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.266e+02 1.360e+02 1.445e+02 2.351e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-23 08:24:10,933 INFO [train.py:1198] (0/4) Epoch 12, batch 3700, loss[loss=0.2513, ctc_loss=0.1707, cr_loss=0.4031, over 17071.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1634, cr_loss=0.3744, over 3371778.28 frames. ], batch size: 46, lr: 9.88e-03, grad_scale: 32.0 2024-09-23 08:24:20,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=217261.33333333334, ans=0.125 2024-09-23 08:25:02,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=217401.33333333334, ans=0.125 2024-09-23 08:25:19,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2024-09-23 08:25:29,361 INFO [train.py:1198] (0/4) Epoch 12, batch 3750, loss[loss=0.2992, ctc_loss=0.2128, cr_loss=0.4323, over 15025.00 frames. ], tot_loss[loss=0.2386, ctc_loss=0.1637, cr_loss=0.3744, over 3354729.85 frames. 
], batch size: 89, lr: 9.88e-03, grad_scale: 32.0 2024-09-23 08:25:37,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=217494.66666666666, ans=0.0 2024-09-23 08:25:41,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=217494.66666666666, ans=0.125 2024-09-23 08:26:03,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=217588.0, ans=0.125 2024-09-23 08:26:22,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=217634.66666666666, ans=0.025 2024-09-23 08:26:24,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.77 vs. limit=15.0 2024-09-23 08:26:26,722 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.365e+02 1.477e+02 1.661e+02 2.472e+02, threshold=2.954e+02, percent-clipped=0.0 2024-09-23 08:26:27,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=217634.66666666666, ans=0.125 2024-09-23 08:26:43,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.69 vs. limit=15.0 2024-09-23 08:26:45,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=217728.0, ans=0.125 2024-09-23 08:26:47,065 INFO [train.py:1198] (0/4) Epoch 12, batch 3800, loss[loss=0.2753, ctc_loss=0.1944, cr_loss=0.4047, over 16495.00 frames. ], tot_loss[loss=0.2402, ctc_loss=0.1651, cr_loss=0.3755, over 3328441.42 frames. ], batch size: 66, lr: 9.87e-03, grad_scale: 32.0 2024-09-23 08:27:08,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=217774.66666666666, ans=0.125 2024-09-23 08:27:29,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0 2024-09-23 08:27:30,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217821.33333333334, ans=0.1 2024-09-23 08:27:41,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=217868.0, ans=0.0 2024-09-23 08:27:42,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2024-09-23 08:27:53,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217914.66666666666, ans=0.1 2024-09-23 08:28:05,925 INFO [train.py:1198] (0/4) Epoch 12, batch 3850, loss[loss=0.2781, ctc_loss=0.1987, cr_loss=0.397, over 15053.00 frames. ], tot_loss[loss=0.2417, ctc_loss=0.1665, cr_loss=0.3758, over 3282415.91 frames. ], batch size: 89, lr: 9.87e-03, grad_scale: 32.0 2024-09-23 08:28:11,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.20 vs. 
limit=15.0 2024-09-23 08:28:18,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=217961.33333333334, ans=0.0 2024-09-23 08:28:41,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=218054.66666666666, ans=0.125 2024-09-23 08:28:59,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=218101.33333333334, ans=0.125 2024-09-23 08:28:59,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=218101.33333333334, ans=0.125 2024-09-23 08:29:02,106 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.424e+02 1.630e+02 1.758e+02 2.384e+02, threshold=3.259e+02, percent-clipped=0.0 2024-09-23 08:29:11,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.43 vs. limit=12.0 2024-09-23 08:29:15,468 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-12.pt 2024-09-23 08:30:07,238 INFO [train.py:1198] (0/4) Epoch 13, batch 0, loss[loss=0.2448, ctc_loss=0.1689, cr_loss=0.3794, over 17149.00 frames. ], tot_loss[loss=0.2448, ctc_loss=0.1689, cr_loss=0.3794, over 17149.00 frames. ], batch size: 48, lr: 9.48e-03, grad_scale: 32.0 2024-09-23 08:30:07,239 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 08:30:15,986 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5760, 3.7755, 4.2533, 4.0306], device='cuda:0') 2024-09-23 08:30:22,747 INFO [train.py:1230] (0/4) Epoch 13, validation: loss=0.04407, ctc_loss=0.04407, cr_loss=7.62e-15, over 944034.00 frames. 2024-09-23 08:30:22,748 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 08:30:27,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=218176.0, ans=0.2 2024-09-23 08:30:29,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=15.0 2024-09-23 08:30:37,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=218222.66666666666, ans=0.025 2024-09-23 08:31:06,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=218269.33333333334, ans=0.2 2024-09-23 08:31:09,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=218316.0, ans=0.09899494936611666 2024-09-23 08:31:43,035 INFO [train.py:1198] (0/4) Epoch 13, batch 50, loss[loss=0.2445, ctc_loss=0.1652, cr_loss=0.3963, over 17129.00 frames. ], tot_loss[loss=0.2424, ctc_loss=0.1667, cr_loss=0.3783, over 753245.88 frames. ], batch size: 48, lr: 9.47e-03, grad_scale: 32.0 2024-09-23 08:31:56,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=218409.33333333334, ans=0.125 2024-09-23 08:32:15,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. 
limit=15.0 2024-09-23 08:32:31,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=22.5 2024-09-23 08:32:32,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=218549.33333333334, ans=0.125 2024-09-23 08:32:42,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=218549.33333333334, ans=0.2 2024-09-23 08:32:50,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=218596.0, ans=0.125 2024-09-23 08:32:51,666 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.296e+02 1.397e+02 1.549e+02 2.228e+02, threshold=2.794e+02, percent-clipped=0.0 2024-09-23 08:33:07,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=218642.66666666666, ans=0.125 2024-09-23 08:33:08,628 INFO [train.py:1198] (0/4) Epoch 13, batch 100, loss[loss=0.2189, ctc_loss=0.1456, cr_loss=0.3665, over 17016.00 frames. ], tot_loss[loss=0.2391, ctc_loss=0.1642, cr_loss=0.3742, over 1328526.35 frames. ], batch size: 44, lr: 9.47e-03, grad_scale: 32.0 2024-09-23 08:33:21,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=218642.66666666666, ans=0.1 2024-09-23 08:33:29,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218689.33333333334, ans=0.1 2024-09-23 08:33:31,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=218689.33333333334, ans=0.125 2024-09-23 08:33:47,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=218736.0, ans=0.0 2024-09-23 08:33:50,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=218736.0, ans=0.2 2024-09-23 08:34:14,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=12.0 2024-09-23 08:34:25,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=218829.33333333334, ans=0.125 2024-09-23 08:34:27,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=218876.0, ans=0.0 2024-09-23 08:34:28,453 INFO [train.py:1198] (0/4) Epoch 13, batch 150, loss[loss=0.2042, ctc_loss=0.1394, cr_loss=0.3238, over 16281.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1612, cr_loss=0.3709, over 1780415.09 frames. 
], batch size: 36, lr: 9.46e-03, grad_scale: 32.0 2024-09-23 08:34:35,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=218876.0, ans=0.125 2024-09-23 08:35:05,305 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:35:06,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218969.33333333334, ans=0.1 2024-09-23 08:35:25,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=219016.0, ans=0.2 2024-09-23 08:35:35,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-09-23 08:35:36,675 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.277e+02 1.397e+02 1.518e+02 2.217e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-23 08:35:38,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2024-09-23 08:35:51,083 INFO [train.py:1198] (0/4) Epoch 13, batch 200, loss[loss=0.2571, ctc_loss=0.1801, cr_loss=0.3851, over 17043.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1618, cr_loss=0.3729, over 2135900.02 frames. ], batch size: 52, lr: 9.46e-03, grad_scale: 32.0 2024-09-23 08:36:30,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219202.66666666666, ans=0.1 2024-09-23 08:36:31,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=219202.66666666666, ans=0.04949747468305833 2024-09-23 08:36:53,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=219249.33333333334, ans=0.0 2024-09-23 08:36:55,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=219249.33333333334, ans=0.125 2024-09-23 08:36:58,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=219296.0, ans=0.025 2024-09-23 08:36:58,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=219296.0, ans=0.025 2024-09-23 08:37:16,178 INFO [train.py:1198] (0/4) Epoch 13, batch 250, loss[loss=0.2722, ctc_loss=0.195, cr_loss=0.3856, over 14862.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1621, cr_loss=0.3725, over 2405499.55 frames. ], batch size: 89, lr: 9.45e-03, grad_scale: 32.0 2024-09-23 08:38:24,171 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.243e+02 1.365e+02 1.573e+02 3.010e+02, threshold=2.729e+02, percent-clipped=2.0 2024-09-23 08:38:24,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219529.33333333334, ans=0.1 2024-09-23 08:38:24,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. 
limit=15.0 2024-09-23 08:38:30,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=219529.33333333334, ans=0.0 2024-09-23 08:38:38,516 INFO [train.py:1198] (0/4) Epoch 13, batch 300, loss[loss=0.2828, ctc_loss=0.2089, cr_loss=0.3693, over 11702.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1623, cr_loss=0.3735, over 2615000.80 frames. ], batch size: 124, lr: 9.45e-03, grad_scale: 32.0 2024-09-23 08:38:44,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5 2024-09-23 08:38:49,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-09-23 08:38:49,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=22.5 2024-09-23 08:39:17,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=219669.33333333334, ans=0.125 2024-09-23 08:39:52,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-09-23 08:39:53,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=219762.66666666666, ans=0.125 2024-09-23 08:39:58,341 INFO [train.py:1198] (0/4) Epoch 13, batch 350, loss[loss=0.2559, ctc_loss=0.18, cr_loss=0.3796, over 16846.00 frames. ], tot_loss[loss=0.2382, ctc_loss=0.1632, cr_loss=0.3751, over 2775224.83 frames. ], batch size: 58, lr: 9.44e-03, grad_scale: 32.0 2024-09-23 08:40:01,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=219809.33333333334, ans=0.125 2024-09-23 08:40:26,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=219856.0, ans=0.125 2024-09-23 08:40:34,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=219902.66666666666, ans=0.0 2024-09-23 08:40:45,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=219902.66666666666, ans=0.125 2024-09-23 08:41:06,039 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.335e+02 1.492e+02 1.717e+02 2.357e+02, threshold=2.983e+02, percent-clipped=0.0 2024-09-23 08:41:20,271 INFO [train.py:1198] (0/4) Epoch 13, batch 400, loss[loss=0.2447, ctc_loss=0.1698, cr_loss=0.3748, over 16913.00 frames. ], tot_loss[loss=0.2384, ctc_loss=0.1633, cr_loss=0.3755, over 2902167.08 frames. ], batch size: 58, lr: 9.44e-03, grad_scale: 32.0 2024-09-23 08:41:23,882 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:41:29,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.68 vs. 
limit=22.5 2024-09-23 08:41:48,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=220089.33333333334, ans=0.0 2024-09-23 08:42:25,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=220182.66666666666, ans=0.125 2024-09-23 08:42:36,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=220229.33333333334, ans=0.0 2024-09-23 08:42:36,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=220229.33333333334, ans=0.0 2024-09-23 08:42:41,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.66 vs. limit=10.0 2024-09-23 08:42:45,839 INFO [train.py:1198] (0/4) Epoch 13, batch 450, loss[loss=0.2405, ctc_loss=0.1678, cr_loss=0.3635, over 17055.00 frames. ], tot_loss[loss=0.2383, ctc_loss=0.1632, cr_loss=0.3753, over 3003984.37 frames. ], batch size: 39, lr: 9.43e-03, grad_scale: 32.0 2024-09-23 08:42:53,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=220276.0, ans=0.0 2024-09-23 08:43:03,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220322.66666666666, ans=0.1 2024-09-23 08:43:03,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=220322.66666666666, ans=0.0 2024-09-23 08:43:07,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=220322.66666666666, ans=0.025 2024-09-23 08:43:19,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=220369.33333333334, ans=0.125 2024-09-23 08:43:43,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=220416.0, ans=0.125 2024-09-23 08:43:53,978 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.263e+02 1.351e+02 1.525e+02 2.528e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-23 08:44:00,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=220462.66666666666, ans=0.125 2024-09-23 08:44:05,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. limit=15.0 2024-09-23 08:44:08,348 INFO [train.py:1198] (0/4) Epoch 13, batch 500, loss[loss=0.1972, ctc_loss=0.1319, cr_loss=0.3265, over 17022.00 frames. ], tot_loss[loss=0.2379, ctc_loss=0.1629, cr_loss=0.375, over 3089671.82 frames. ], batch size: 39, lr: 9.43e-03, grad_scale: 32.0 2024-09-23 08:44:16,922 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2024-09-23 08:44:26,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=220556.0, ans=0.2 2024-09-23 08:44:36,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. 
limit=10.0 2024-09-23 08:44:37,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=220556.0, ans=0.0 2024-09-23 08:44:57,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0 2024-09-23 08:45:03,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=220649.33333333334, ans=0.0 2024-09-23 08:45:04,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=220649.33333333334, ans=0.2 2024-09-23 08:45:31,235 INFO [train.py:1198] (0/4) Epoch 13, batch 550, loss[loss=0.2225, ctc_loss=0.1505, cr_loss=0.3601, over 17202.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1625, cr_loss=0.375, over 3156064.33 frames. ], batch size: 47, lr: 9.42e-03, grad_scale: 32.0 2024-09-23 08:46:09,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=220836.0, ans=0.125 2024-09-23 08:46:36,683 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.262e+02 1.359e+02 1.486e+02 2.281e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-23 08:46:56,573 INFO [train.py:1198] (0/4) Epoch 13, batch 600, loss[loss=0.2043, ctc_loss=0.1389, cr_loss=0.3269, over 17199.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1625, cr_loss=0.375, over 3208709.29 frames. ], batch size: 41, lr: 9.42e-03, grad_scale: 32.0 2024-09-23 08:47:05,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.64 vs. limit=15.0 2024-09-23 08:47:19,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=221022.66666666666, ans=0.2 2024-09-23 08:47:29,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.03 vs. limit=15.0 2024-09-23 08:48:09,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2024-09-23 08:48:18,961 INFO [train.py:1198] (0/4) Epoch 13, batch 650, loss[loss=0.1942, ctc_loss=0.1322, cr_loss=0.3101, over 16723.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1622, cr_loss=0.3751, over 3247334.06 frames. ], batch size: 37, lr: 9.41e-03, grad_scale: 32.0 2024-09-23 08:48:31,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. 
limit=22.5 2024-09-23 08:48:32,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=221209.33333333334, ans=0.125 2024-09-23 08:48:46,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=221256.0, ans=0.1 2024-09-23 08:48:51,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=221302.66666666666, ans=0.0 2024-09-23 08:49:23,894 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.284e+02 1.397e+02 1.593e+02 2.300e+02, threshold=2.794e+02, percent-clipped=0.0 2024-09-23 08:49:38,326 INFO [train.py:1198] (0/4) Epoch 13, batch 700, loss[loss=0.2074, ctc_loss=0.1426, cr_loss=0.324, over 17174.00 frames. ], tot_loss[loss=0.237, ctc_loss=0.1622, cr_loss=0.3742, over 3274216.83 frames. ], batch size: 41, lr: 9.41e-03, grad_scale: 32.0 2024-09-23 08:49:41,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=221442.66666666666, ans=0.125 2024-09-23 08:49:44,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=221442.66666666666, ans=0.0 2024-09-23 08:49:54,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=221489.33333333334, ans=0.0 2024-09-23 08:50:01,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=221489.33333333334, ans=0.0 2024-09-23 08:50:38,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-09-23 08:50:50,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=221629.33333333334, ans=0.125 2024-09-23 08:50:59,838 INFO [train.py:1198] (0/4) Epoch 13, batch 750, loss[loss=0.1999, ctc_loss=0.1321, cr_loss=0.3388, over 17119.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1613, cr_loss=0.3731, over 3296854.89 frames. ], batch size: 40, lr: 9.40e-03, grad_scale: 16.0 2024-09-23 08:51:17,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=221722.66666666666, ans=0.125 2024-09-23 08:51:37,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=221769.33333333334, ans=0.125 2024-09-23 08:52:05,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=221816.0, ans=0.125 2024-09-23 08:52:11,979 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.258e+02 1.365e+02 1.480e+02 2.047e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-23 08:52:24,632 INFO [train.py:1198] (0/4) Epoch 13, batch 800, loss[loss=0.289, ctc_loss=0.2122, cr_loss=0.384, over 11688.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1617, cr_loss=0.3738, over 3308832.71 frames. 
], batch size: 123, lr: 9.40e-03, grad_scale: 32.0 2024-09-23 08:52:24,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=221909.33333333334, ans=0.125 2024-09-23 08:52:42,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=221956.0, ans=0.2 2024-09-23 08:52:50,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2024-09-23 08:53:00,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2024-09-23 08:53:42,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=222096.0, ans=0.2 2024-09-23 08:53:47,086 INFO [train.py:1198] (0/4) Epoch 13, batch 850, loss[loss=0.2749, ctc_loss=0.185, cr_loss=0.4499, over 17020.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.162, cr_loss=0.3746, over 3323812.58 frames. ], batch size: 53, lr: 9.39e-03, grad_scale: 32.0 2024-09-23 08:53:53,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=222142.66666666666, ans=0.0 2024-09-23 08:53:59,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2024-09-23 08:54:03,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=222189.33333333334, ans=0.125 2024-09-23 08:54:19,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222236.0, ans=0.1 2024-09-23 08:54:46,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=222282.66666666666, ans=0.0 2024-09-23 08:54:48,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=222282.66666666666, ans=0.0 2024-09-23 08:54:54,220 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.309e+02 1.415e+02 1.608e+02 2.192e+02, threshold=2.830e+02, percent-clipped=0.0 2024-09-23 08:55:06,970 INFO [train.py:1198] (0/4) Epoch 13, batch 900, loss[loss=0.2138, ctc_loss=0.1455, cr_loss=0.3413, over 16975.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.162, cr_loss=0.3736, over 3318720.26 frames. ], batch size: 42, lr: 9.39e-03, grad_scale: 32.0 2024-09-23 08:55:09,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. 
limit=15.0 2024-09-23 08:55:15,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=222376.0, ans=0.125 2024-09-23 08:55:15,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=222376.0, ans=0.125 2024-09-23 08:55:16,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=222376.0, ans=0.125 2024-09-23 08:55:30,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=222422.66666666666, ans=0.025 2024-09-23 08:55:33,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=222422.66666666666, ans=0.0 2024-09-23 08:55:49,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=222469.33333333334, ans=0.09899494936611666 2024-09-23 08:56:18,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=222562.66666666666, ans=0.025 2024-09-23 08:56:31,999 INFO [train.py:1198] (0/4) Epoch 13, batch 950, loss[loss=0.2824, ctc_loss=0.1997, cr_loss=0.4137, over 16517.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.162, cr_loss=0.3737, over 3329605.07 frames. ], batch size: 66, lr: 9.38e-03, grad_scale: 32.0 2024-09-23 08:56:55,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222656.0, ans=0.1 2024-09-23 08:57:06,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=222702.66666666666, ans=0.125 2024-09-23 08:57:21,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=222749.33333333334, ans=0.07 2024-09-23 08:57:29,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=222749.33333333334, ans=0.0 2024-09-23 08:57:41,764 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.336e+02 1.446e+02 1.624e+02 2.624e+02, threshold=2.892e+02, percent-clipped=0.0 2024-09-23 08:57:43,695 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 08:57:55,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=222842.66666666666, ans=0.125 2024-09-23 08:57:57,068 INFO [train.py:1198] (0/4) Epoch 13, batch 1000, loss[loss=0.2745, ctc_loss=0.1888, cr_loss=0.4284, over 16486.00 frames. ], tot_loss[loss=0.239, ctc_loss=0.1639, cr_loss=0.3757, over 3324297.03 frames. ], batch size: 66, lr: 9.38e-03, grad_scale: 32.0 2024-09-23 08:58:08,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=222842.66666666666, ans=0.125 2024-09-23 08:58:42,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.70 vs. limit=10.0 2024-09-23 08:59:09,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs. 
limit=15.0 2024-09-23 08:59:16,891 INFO [train.py:1198] (0/4) Epoch 13, batch 1050, loss[loss=0.2833, ctc_loss=0.201, cr_loss=0.4115, over 15129.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1627, cr_loss=0.3741, over 3327380.84 frames. ], batch size: 89, lr: 9.37e-03, grad_scale: 32.0 2024-09-23 08:59:20,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=223076.0, ans=0.0 2024-09-23 08:59:28,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=223076.0, ans=0.125 2024-09-23 08:59:51,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=223169.33333333334, ans=0.0 2024-09-23 09:00:05,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=223216.0, ans=0.0 2024-09-23 09:00:19,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=223216.0, ans=0.2 2024-09-23 09:00:25,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=223262.66666666666, ans=0.0 2024-09-23 09:00:26,978 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.277e+02 1.397e+02 1.576e+02 3.836e+02, threshold=2.794e+02, percent-clipped=1.0 2024-09-23 09:00:31,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223262.66666666666, ans=0.0 2024-09-23 09:00:39,437 INFO [train.py:1198] (0/4) Epoch 13, batch 1100, loss[loss=0.2171, ctc_loss=0.1456, cr_loss=0.3577, over 17182.00 frames. ], tot_loss[loss=0.2375, ctc_loss=0.1626, cr_loss=0.3747, over 3338108.89 frames. ], batch size: 41, lr: 9.37e-03, grad_scale: 32.0 2024-09-23 09:00:57,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223356.0, ans=0.0 2024-09-23 09:01:10,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=22.5 2024-09-23 09:02:01,297 INFO [train.py:1198] (0/4) Epoch 13, batch 1150, loss[loss=0.2522, ctc_loss=0.1758, cr_loss=0.3819, over 17138.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1615, cr_loss=0.3729, over 3346466.34 frames. ], batch size: 48, lr: 9.37e-03, grad_scale: 32.0 2024-09-23 09:02:04,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=223542.66666666666, ans=0.125 2024-09-23 09:02:58,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=223682.66666666666, ans=0.125 2024-09-23 09:03:10,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=223729.33333333334, ans=0.07 2024-09-23 09:03:11,339 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.252e+02 1.365e+02 1.487e+02 2.591e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-23 09:03:23,906 INFO [train.py:1198] (0/4) Epoch 13, batch 1200, loss[loss=0.2661, ctc_loss=0.1832, cr_loss=0.4142, over 16756.00 frames. ], tot_loss[loss=0.2368, ctc_loss=0.1621, cr_loss=0.3736, over 3338047.69 frames. 
], batch size: 61, lr: 9.36e-03, grad_scale: 32.0 2024-09-23 09:03:27,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=223776.0, ans=0.125 2024-09-23 09:03:33,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=223776.0, ans=0.0 2024-09-23 09:04:00,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=223869.33333333334, ans=0.0 2024-09-23 09:04:07,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=223869.33333333334, ans=0.125 2024-09-23 09:04:16,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=223916.0, ans=0.0 2024-09-23 09:04:39,396 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-48000.pt 2024-09-23 09:04:42,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2024-09-23 09:04:46,171 INFO [train.py:1198] (0/4) Epoch 13, batch 1250, loss[loss=0.2492, ctc_loss=0.1682, cr_loss=0.4048, over 17219.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1625, cr_loss=0.3733, over 3324707.40 frames. ], batch size: 55, lr: 9.36e-03, grad_scale: 32.0 2024-09-23 09:04:49,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=224009.33333333334, ans=0.0 2024-09-23 09:04:56,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=224009.33333333334, ans=0.1 2024-09-23 09:05:00,817 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:05:32,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-09-23 09:05:55,484 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.315e+02 1.381e+02 1.503e+02 2.464e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-23 09:06:05,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=224196.0, ans=0.125 2024-09-23 09:06:08,065 INFO [train.py:1198] (0/4) Epoch 13, batch 1300, loss[loss=0.285, ctc_loss=0.1977, cr_loss=0.4365, over 16393.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.1625, cr_loss=0.3732, over 3334588.41 frames. ], batch size: 66, lr: 9.35e-03, grad_scale: 32.0 2024-09-23 09:06:10,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.68 vs. limit=10.0 2024-09-23 09:06:28,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2024-09-23 09:06:46,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=224336.0, ans=0.0 2024-09-23 09:07:25,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.97 vs. 
limit=15.0 2024-09-23 09:07:29,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=224429.33333333334, ans=0.0 2024-09-23 09:07:32,575 INFO [train.py:1198] (0/4) Epoch 13, batch 1350, loss[loss=0.2078, ctc_loss=0.1381, cr_loss=0.3485, over 17175.00 frames. ], tot_loss[loss=0.2361, ctc_loss=0.1615, cr_loss=0.373, over 3350192.52 frames. ], batch size: 41, lr: 9.35e-03, grad_scale: 32.0 2024-09-23 09:07:59,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=224522.66666666666, ans=0.125 2024-09-23 09:08:01,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=12.0 2024-09-23 09:08:10,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=224569.33333333334, ans=0.125 2024-09-23 09:08:30,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=224616.0, ans=0.125 2024-09-23 09:08:41,709 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.260e+02 1.380e+02 1.508e+02 2.232e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-23 09:08:53,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2024-09-23 09:08:54,699 INFO [train.py:1198] (0/4) Epoch 13, batch 1400, loss[loss=0.2421, ctc_loss=0.1626, cr_loss=0.3976, over 17221.00 frames. ], tot_loss[loss=0.2374, ctc_loss=0.1625, cr_loss=0.3745, over 3349573.37 frames. ], batch size: 47, lr: 9.34e-03, grad_scale: 32.0 2024-09-23 09:09:15,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=224756.0, ans=0.2 2024-09-23 09:09:38,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=224802.66666666666, ans=0.0 2024-09-23 09:09:54,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=12.0 2024-09-23 09:10:17,266 INFO [train.py:1198] (0/4) Epoch 13, batch 1450, loss[loss=0.2283, ctc_loss=0.1585, cr_loss=0.3493, over 17357.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1623, cr_loss=0.3747, over 3361954.39 frames. ], batch size: 48, lr: 9.34e-03, grad_scale: 32.0 2024-09-23 09:10:43,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224989.33333333334, ans=0.1 2024-09-23 09:10:46,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=224989.33333333334, ans=0.0 2024-09-23 09:11:15,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=225082.66666666666, ans=0.0 2024-09-23 09:11:29,649 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.289e+02 1.416e+02 1.573e+02 2.089e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 09:11:42,313 INFO [train.py:1198] (0/4) Epoch 13, batch 1500, loss[loss=0.2139, ctc_loss=0.1458, cr_loss=0.3402, over 17019.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1624, cr_loss=0.3748, over 3364541.69 frames. 
], batch size: 44, lr: 9.33e-03, grad_scale: 32.0 2024-09-23 09:11:45,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=225176.0, ans=0.125 2024-09-23 09:11:47,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=225176.0, ans=0.0 2024-09-23 09:12:03,202 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:12:54,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225362.66666666666, ans=0.1 2024-09-23 09:13:03,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=225409.33333333334, ans=0.1 2024-09-23 09:13:05,143 INFO [train.py:1198] (0/4) Epoch 13, batch 1550, loss[loss=0.2369, ctc_loss=0.1603, cr_loss=0.3829, over 17218.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1618, cr_loss=0.3735, over 3364777.99 frames. ], batch size: 55, lr: 9.33e-03, grad_scale: 32.0 2024-09-23 09:13:05,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=225409.33333333334, ans=0.0 2024-09-23 09:13:13,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2024-09-23 09:13:29,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=225456.0, ans=0.125 2024-09-23 09:13:31,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=225456.0, ans=0.125 2024-09-23 09:13:32,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=225456.0, ans=0.125 2024-09-23 09:14:01,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225549.33333333334, ans=0.1 2024-09-23 09:14:06,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2024-09-23 09:14:12,651 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.248e+02 1.360e+02 1.586e+02 2.535e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-23 09:14:13,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=12.0 2024-09-23 09:14:21,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-09-23 09:14:22,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=225596.0, ans=0.125 2024-09-23 09:14:25,437 INFO [train.py:1198] (0/4) Epoch 13, batch 1600, loss[loss=0.2116, ctc_loss=0.1432, cr_loss=0.3419, over 17262.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1617, cr_loss=0.3737, over 3361298.27 frames. 
], batch size: 44, lr: 9.32e-03, grad_scale: 32.0 2024-09-23 09:14:51,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=225689.33333333334, ans=0.125 2024-09-23 09:15:14,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=225782.66666666666, ans=0.125 2024-09-23 09:15:18,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0 2024-09-23 09:15:21,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=225782.66666666666, ans=0.0 2024-09-23 09:15:34,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.23 vs. limit=10.0 2024-09-23 09:15:48,384 INFO [train.py:1198] (0/4) Epoch 13, batch 1650, loss[loss=0.2489, ctc_loss=0.1693, cr_loss=0.3981, over 16699.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1614, cr_loss=0.3736, over 3364795.42 frames. ], batch size: 61, lr: 9.32e-03, grad_scale: 32.0 2024-09-23 09:16:00,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=225876.0, ans=0.0 2024-09-23 09:16:09,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=225922.66666666666, ans=0.015 2024-09-23 09:16:28,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2024-09-23 09:16:36,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=225969.33333333334, ans=0.0 2024-09-23 09:16:46,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=226016.0, ans=0.0 2024-09-23 09:16:56,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=226062.66666666666, ans=0.1 2024-09-23 09:17:00,540 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.262e+02 1.363e+02 1.503e+02 2.167e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-23 09:17:02,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0 2024-09-23 09:17:08,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=226062.66666666666, ans=0.0 2024-09-23 09:17:13,266 INFO [train.py:1198] (0/4) Epoch 13, batch 1700, loss[loss=0.2186, ctc_loss=0.1487, cr_loss=0.3497, over 17246.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1608, cr_loss=0.3727, over 3362338.11 frames. 
], batch size: 42, lr: 9.31e-03, grad_scale: 32.0 2024-09-23 09:17:21,585 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:17:56,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=226202.66666666666, ans=0.125 2024-09-23 09:18:18,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=226296.0, ans=0.025 2024-09-23 09:18:35,888 INFO [train.py:1198] (0/4) Epoch 13, batch 1750, loss[loss=0.2273, ctc_loss=0.1525, cr_loss=0.374, over 17002.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1605, cr_loss=0.3725, over 3364958.40 frames. ], batch size: 44, lr: 9.31e-03, grad_scale: 32.0 2024-09-23 09:18:55,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=226389.33333333334, ans=0.125 2024-09-23 09:19:35,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-09-23 09:19:36,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=226482.66666666666, ans=0.0 2024-09-23 09:19:42,591 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.255e+02 1.356e+02 1.518e+02 2.113e+02, threshold=2.712e+02, percent-clipped=0.0 2024-09-23 09:19:55,547 INFO [train.py:1198] (0/4) Epoch 13, batch 1800, loss[loss=0.1846, ctc_loss=0.1233, cr_loss=0.3065, over 17177.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1608, cr_loss=0.3726, over 3362380.15 frames. ], batch size: 41, lr: 9.30e-03, grad_scale: 32.0 2024-09-23 09:19:58,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226576.0, ans=0.1 2024-09-23 09:20:20,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=226622.66666666666, ans=0.125 2024-09-23 09:20:49,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2024-09-23 09:20:54,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=226716.0, ans=0.2 2024-09-23 09:20:55,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.73 vs. limit=22.5 2024-09-23 09:21:00,927 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2024-09-23 09:21:22,739 INFO [train.py:1198] (0/4) Epoch 13, batch 1850, loss[loss=0.2237, ctc_loss=0.1535, cr_loss=0.3512, over 16952.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1612, cr_loss=0.3732, over 3363980.52 frames. 
], batch size: 42, lr: 9.30e-03, grad_scale: 32.0 2024-09-23 09:21:26,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=226809.33333333334, ans=0.0 2024-09-23 09:21:27,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=226809.33333333334, ans=0.0 2024-09-23 09:21:40,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=226856.0, ans=0.0 2024-09-23 09:21:55,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=12.0 2024-09-23 09:22:14,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=226949.33333333334, ans=0.125 2024-09-23 09:22:14,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=226949.33333333334, ans=0.125 2024-09-23 09:22:29,434 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.038e+02 1.309e+02 1.425e+02 1.783e+02 3.519e+02, threshold=2.851e+02, percent-clipped=1.0 2024-09-23 09:22:29,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=226996.0, ans=0.125 2024-09-23 09:22:44,630 INFO [train.py:1198] (0/4) Epoch 13, batch 1900, loss[loss=0.2385, ctc_loss=0.1644, cr_loss=0.3702, over 17291.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1612, cr_loss=0.3729, over 3364584.65 frames. ], batch size: 49, lr: 9.29e-03, grad_scale: 32.0 2024-09-23 09:23:16,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2024-09-23 09:23:19,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=15.0 2024-09-23 09:23:41,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=22.5 2024-09-23 09:23:56,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=227229.33333333334, ans=0.125 2024-09-23 09:24:02,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=227276.0, ans=0.1 2024-09-23 09:24:04,048 INFO [train.py:1198] (0/4) Epoch 13, batch 1950, loss[loss=0.2006, ctc_loss=0.1355, cr_loss=0.3259, over 17170.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1611, cr_loss=0.3729, over 3366224.94 frames. 
], batch size: 45, lr: 9.29e-03, grad_scale: 32.0 2024-09-23 09:24:31,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=227322.66666666666, ans=0.0 2024-09-23 09:24:57,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=227416.0, ans=0.0 2024-09-23 09:25:00,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=227416.0, ans=0.0 2024-09-23 09:25:03,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227416.0, ans=0.1 2024-09-23 09:25:13,356 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.325e+02 1.427e+02 1.553e+02 2.368e+02, threshold=2.853e+02, percent-clipped=0.0 2024-09-23 09:25:25,899 INFO [train.py:1198] (0/4) Epoch 13, batch 2000, loss[loss=0.2584, ctc_loss=0.1759, cr_loss=0.4122, over 17319.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1606, cr_loss=0.3719, over 3366508.08 frames. ], batch size: 51, lr: 9.29e-03, grad_scale: 32.0 2024-09-23 09:25:26,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=227509.33333333334, ans=0.2 2024-09-23 09:25:43,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=227556.0, ans=0.125 2024-09-23 09:26:19,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=227649.33333333334, ans=0.125 2024-09-23 09:26:42,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227696.0, ans=0.1 2024-09-23 09:26:51,001 INFO [train.py:1198] (0/4) Epoch 13, batch 2050, loss[loss=0.2262, ctc_loss=0.1515, cr_loss=0.3735, over 17302.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1603, cr_loss=0.3716, over 3367197.60 frames. ], batch size: 49, lr: 9.28e-03, grad_scale: 16.0 2024-09-23 09:26:57,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=227742.66666666666, ans=0.2 2024-09-23 09:27:00,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227742.66666666666, ans=0.1 2024-09-23 09:27:46,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=227882.66666666666, ans=0.0 2024-09-23 09:27:57,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-09-23 09:28:00,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=227929.33333333334, ans=0.0 2024-09-23 09:28:01,913 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.246e+02 1.339e+02 1.445e+02 2.585e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-23 09:28:03,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.09 vs. 
limit=22.5 2024-09-23 09:28:11,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=227976.0, ans=0.95 2024-09-23 09:28:13,096 INFO [train.py:1198] (0/4) Epoch 13, batch 2100, loss[loss=0.2385, ctc_loss=0.1621, cr_loss=0.3824, over 17080.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1608, cr_loss=0.3729, over 3375996.12 frames. ], batch size: 46, lr: 9.28e-03, grad_scale: 16.0 2024-09-23 09:28:25,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.65 vs. limit=10.0 2024-09-23 09:28:42,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=228022.66666666666, ans=0.125 2024-09-23 09:28:54,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228069.33333333334, ans=0.1 2024-09-23 09:28:59,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228116.0, ans=0.1 2024-09-23 09:29:07,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=228116.0, ans=0.5 2024-09-23 09:29:13,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228116.0, ans=0.1 2024-09-23 09:29:21,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=228162.66666666666, ans=0.125 2024-09-23 09:29:27,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=228162.66666666666, ans=0.025 2024-09-23 09:29:29,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=228162.66666666666, ans=0.125 2024-09-23 09:29:32,533 INFO [train.py:1198] (0/4) Epoch 13, batch 2150, loss[loss=0.2375, ctc_loss=0.1629, cr_loss=0.3734, over 17295.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1611, cr_loss=0.3735, over 3374365.84 frames. ], batch size: 49, lr: 9.27e-03, grad_scale: 16.0 2024-09-23 09:29:33,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.55 vs. limit=15.0 2024-09-23 09:29:36,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=228209.33333333334, ans=0.0 2024-09-23 09:29:51,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-23 09:30:00,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=228256.0, ans=0.0 2024-09-23 09:30:02,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. 
limit=22.5 2024-09-23 09:30:03,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228302.66666666666, ans=0.1 2024-09-23 09:30:40,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=228396.0, ans=0.125 2024-09-23 09:30:43,654 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.254e+02 1.312e+02 1.458e+02 2.181e+02, threshold=2.624e+02, percent-clipped=0.0 2024-09-23 09:30:54,888 INFO [train.py:1198] (0/4) Epoch 13, batch 2200, loss[loss=0.2824, ctc_loss=0.1924, cr_loss=0.4503, over 16457.00 frames. ], tot_loss[loss=0.2356, ctc_loss=0.1609, cr_loss=0.3736, over 3372635.24 frames. ], batch size: 66, lr: 9.27e-03, grad_scale: 16.0 2024-09-23 09:31:09,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=228489.33333333334, ans=0.025 2024-09-23 09:31:46,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=228582.66666666666, ans=0.1 2024-09-23 09:31:57,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=228582.66666666666, ans=0.2 2024-09-23 09:32:19,401 INFO [train.py:1198] (0/4) Epoch 13, batch 2250, loss[loss=0.2263, ctc_loss=0.1544, cr_loss=0.3591, over 17155.00 frames. ], tot_loss[loss=0.2359, ctc_loss=0.1611, cr_loss=0.3739, over 3366601.42 frames. ], batch size: 45, lr: 9.26e-03, grad_scale: 16.0 2024-09-23 09:32:26,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=228676.0, ans=0.125 2024-09-23 09:32:41,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=228722.66666666666, ans=0.125 2024-09-23 09:33:09,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=228816.0, ans=0.1 2024-09-23 09:33:14,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=228816.0, ans=0.125 2024-09-23 09:33:22,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=228816.0, ans=0.0 2024-09-23 09:33:30,214 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.274e+02 1.381e+02 1.487e+02 1.972e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-23 09:33:38,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=228862.66666666666, ans=0.125 2024-09-23 09:33:41,460 INFO [train.py:1198] (0/4) Epoch 13, batch 2300, loss[loss=0.2463, ctc_loss=0.1705, cr_loss=0.3794, over 16997.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1608, cr_loss=0.3729, over 3355936.61 frames. 
], batch size: 56, lr: 9.26e-03, grad_scale: 16.0 2024-09-23 09:33:49,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=228909.33333333334, ans=0.2 2024-09-23 09:34:06,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=228956.0, ans=0.025 2024-09-23 09:34:22,203 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:34:34,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=229049.33333333334, ans=0.0 2024-09-23 09:34:38,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=229049.33333333334, ans=0.125 2024-09-23 09:34:43,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0 2024-09-23 09:34:45,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0 2024-09-23 09:35:01,977 INFO [train.py:1198] (0/4) Epoch 13, batch 2350, loss[loss=0.2437, ctc_loss=0.1647, cr_loss=0.3947, over 16988.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.161, cr_loss=0.3733, over 3356819.44 frames. ], batch size: 53, lr: 9.25e-03, grad_scale: 16.0 2024-09-23 09:35:02,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=229142.66666666666, ans=0.07 2024-09-23 09:35:05,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=229142.66666666666, ans=0.0 2024-09-23 09:35:05,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-09-23 09:35:09,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=229142.66666666666, ans=0.125 2024-09-23 09:35:12,849 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:35:17,578 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:35:25,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229189.33333333334, ans=0.1 2024-09-23 09:35:29,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.13 vs. 
limit=22.5 2024-09-23 09:35:52,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=229282.66666666666, ans=0.0 2024-09-23 09:35:52,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=229282.66666666666, ans=0.0 2024-09-23 09:35:56,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=229282.66666666666, ans=0.125 2024-09-23 09:36:10,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=229329.33333333334, ans=0.0 2024-09-23 09:36:15,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.26 vs. limit=15.0 2024-09-23 09:36:15,725 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.316e+02 1.425e+02 1.605e+02 2.479e+02, threshold=2.851e+02, percent-clipped=0.0 2024-09-23 09:36:20,830 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:36:27,011 INFO [train.py:1198] (0/4) Epoch 13, batch 2400, loss[loss=0.2319, ctc_loss=0.1582, cr_loss=0.3686, over 17141.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1612, cr_loss=0.3739, over 3361166.93 frames. ], batch size: 48, lr: 9.25e-03, grad_scale: 32.0 2024-09-23 09:36:35,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-09-23 09:37:02,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=229469.33333333334, ans=0.125 2024-09-23 09:37:12,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=229469.33333333334, ans=0.2 2024-09-23 09:37:18,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=229516.0, ans=0.0 2024-09-23 09:37:25,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=229516.0, ans=0.125 2024-09-23 09:37:31,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=229562.66666666666, ans=0.0 2024-09-23 09:37:49,398 INFO [train.py:1198] (0/4) Epoch 13, batch 2450, loss[loss=0.2125, ctc_loss=0.1429, cr_loss=0.3478, over 17178.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1611, cr_loss=0.3737, over 3359205.27 frames. ], batch size: 41, lr: 9.24e-03, grad_scale: 32.0 2024-09-23 09:38:12,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229656.0, ans=0.1 2024-09-23 09:38:15,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.70 vs. 
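limit=15.0

The Whitening lines from scaling.py compare a per-module statistic against a limit; a natural reading is that the metric is 1.0 when the (per-group) feature covariance is proportional to the identity and grows as the eigenvalue spread widens, so "metric=4.70 vs. limit=15.0" means the activations are still comfortably inside the allowed range. A sketch of a metric of that form, d * trace(C @ C) / trace(C)**2, which may differ in detail from the one scaling.py computes:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); channels are split into groups,
        # and the metric is averaged over groups.
        n, c = x.shape
        xg = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        cov = torch.matmul(xg.transpose(1, 2), xg) / n       # (g, d, d)
        d = cov.shape[-1]
        num = (cov * cov).sum(dim=(1, 2)) * d                # d * trace(C @ C)
        den = cov.diagonal(dim1=1, dim2=2).sum(-1) ** 2      # trace(C) ** 2
        return (num / den).mean().item()

    torch.manual_seed(0)
    print(whitening_metric(torch.randn(4000, 64)))                     # ~1.0: white
    print(whitening_metric(torch.randn(4000, 1) * torch.ones(1, 64)))  # ~64: rank-1
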
2024-09-23 09:38:26,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=229702.66666666666, ans=0.0 2024-09-23 09:38:31,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=229702.66666666666, ans=0.125 2024-09-23 09:38:42,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229749.33333333334, ans=0.1 2024-09-23 09:38:49,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=12.0 2024-09-23 09:38:52,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=12.0 2024-09-23 09:38:54,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2024-09-23 09:38:58,211 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.292e+02 1.402e+02 1.572e+02 2.224e+02, threshold=2.803e+02, percent-clipped=0.0 2024-09-23 09:39:09,496 INFO [train.py:1198] (0/4) Epoch 13, batch 2500, loss[loss=0.2591, ctc_loss=0.1791, cr_loss=0.3998, over 15201.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.162, cr_loss=0.3744, over 3347796.48 frames. ], batch size: 89, lr: 9.24e-03, grad_scale: 32.0 2024-09-23 09:39:24,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=229889.33333333334, ans=0.0 2024-09-23 09:40:11,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=229982.66666666666, ans=0.125 2024-09-23 09:40:27,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230029.33333333334, ans=0.1 2024-09-23 09:40:32,033 INFO [train.py:1198] (0/4) Epoch 13, batch 2550, loss[loss=0.2472, ctc_loss=0.1717, cr_loss=0.3771, over 17151.00 frames. ], tot_loss[loss=0.2373, ctc_loss=0.1623, cr_loss=0.375, over 3356890.62 frames. ], batch size: 48, lr: 9.23e-03, grad_scale: 32.0 2024-09-23 09:40:52,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=230122.66666666666, ans=0.5 2024-09-23 09:41:22,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=230169.33333333334, ans=0.125 2024-09-23 09:41:36,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=230216.0, ans=10.0 2024-09-23 09:41:36,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=230216.0, ans=0.0 2024-09-23 09:41:38,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=230216.0, ans=0.125 2024-09-23 09:41:39,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=230262.66666666666, ans=10.0 2024-09-23 09:41:42,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.14 vs.
limit=12.0 2024-09-23 09:41:45,894 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.282e+02 1.430e+02 1.589e+02 2.134e+02, threshold=2.861e+02, percent-clipped=0.0 2024-09-23 09:41:57,177 INFO [train.py:1198] (0/4) Epoch 13, batch 2600, loss[loss=0.2405, ctc_loss=0.1641, cr_loss=0.3818, over 17133.00 frames. ], tot_loss[loss=0.2365, ctc_loss=0.1617, cr_loss=0.3741, over 3360698.79 frames. ], batch size: 48, lr: 9.23e-03, grad_scale: 32.0 2024-09-23 09:42:15,184 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 09:42:36,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=230402.66666666666, ans=0.0 2024-09-23 09:42:44,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=230402.66666666666, ans=0.0 2024-09-23 09:42:45,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.88 vs. limit=22.5 2024-09-23 09:42:54,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=230449.33333333334, ans=0.1 2024-09-23 09:43:13,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=230496.0, ans=0.2 2024-09-23 09:43:20,134 INFO [train.py:1198] (0/4) Epoch 13, batch 2650, loss[loss=0.2346, ctc_loss=0.1574, cr_loss=0.3858, over 17249.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1607, cr_loss=0.3735, over 3369261.26 frames. ], batch size: 44, lr: 9.23e-03, grad_scale: 32.0 2024-09-23 09:43:34,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=230589.33333333334, ans=0.0 2024-09-23 09:44:02,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=230636.0, ans=0.0 2024-09-23 09:44:28,571 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.270e+02 1.366e+02 1.498e+02 2.280e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-23 09:44:39,786 INFO [train.py:1198] (0/4) Epoch 13, batch 2700, loss[loss=0.2988, ctc_loss=0.2176, cr_loss=0.4062, over 11277.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1615, cr_loss=0.3743, over 3354955.64 frames. ], batch size: 123, lr: 9.22e-03, grad_scale: 32.0 2024-09-23 09:44:40,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=230776.0, ans=0.2 2024-09-23 09:44:51,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=230776.0, ans=0.125 2024-09-23 09:45:17,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=230869.33333333334, ans=0.125 2024-09-23 09:45:21,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=230869.33333333334, ans=0.2 2024-09-23 09:45:29,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2024-09-23 09:46:02,467 INFO [train.py:1198] (0/4) Epoch 13, batch 2750, loss[loss=0.2317, ctc_loss=0.1542, cr_loss=0.3878, over 17266.00 frames. 
], tot_loss[loss=0.2365, ctc_loss=0.1616, cr_loss=0.3747, over 3357306.74 frames. ], batch size: 44, lr: 9.22e-03, grad_scale: 32.0 2024-09-23 09:46:02,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=231009.33333333334, ans=0.125 2024-09-23 09:46:54,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=231149.33333333334, ans=10.0 2024-09-23 09:47:02,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=231149.33333333334, ans=0.2 2024-09-23 09:47:05,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=231149.33333333334, ans=0.125 2024-09-23 09:47:09,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-09-23 09:47:19,273 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.267e+02 1.386e+02 1.565e+02 1.913e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-23 09:47:30,587 INFO [train.py:1198] (0/4) Epoch 13, batch 2800, loss[loss=0.2485, ctc_loss=0.1686, cr_loss=0.3997, over 17309.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.162, cr_loss=0.3746, over 3350726.26 frames. ], batch size: 51, lr: 9.21e-03, grad_scale: 32.0 2024-09-23 09:48:32,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231429.33333333334, ans=0.1 2024-09-23 09:48:42,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.65 vs. limit=15.0 2024-09-23 09:48:50,284 INFO [train.py:1198] (0/4) Epoch 13, batch 2850, loss[loss=0.2473, ctc_loss=0.1697, cr_loss=0.3879, over 17049.00 frames. ], tot_loss[loss=0.2371, ctc_loss=0.162, cr_loss=0.3751, over 3361778.63 frames. ], batch size: 56, lr: 9.21e-03, grad_scale: 16.0 2024-09-23 09:49:09,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231522.66666666666, ans=0.1 2024-09-23 09:49:38,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=231616.0, ans=0.2 2024-09-23 09:49:38,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231616.0, ans=0.1 2024-09-23 09:49:54,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-09-23 09:50:00,334 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.288e+02 1.395e+02 1.565e+02 5.212e+02, threshold=2.790e+02, percent-clipped=1.0 2024-09-23 09:50:12,524 INFO [train.py:1198] (0/4) Epoch 13, batch 2900, loss[loss=0.2232, ctc_loss=0.1518, cr_loss=0.357, over 17029.00 frames. ], tot_loss[loss=0.2363, ctc_loss=0.1615, cr_loss=0.374, over 3364298.05 frames. 
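], batch size: 51, lr: 9.20e-03, grad_scale: 16.0

Each tot_loss[..., over N frames] figure is not the last batch's loss but a frame-weighted average accumulated over many recent batches, which is why its frame counts sit near 3.36 million while individual batches contribute around 17 thousand frames, and why the counts restart and grow again once epoch 14 begins. A sketch of that bookkeeping, simplified relative to whatever tracker train.py actually uses:

    # Frame-weighted loss averaging of the kind the
    # "tot_loss[..., over N frames]" summaries suggest. Illustration;
    # the tracker in icefall's train.py differs in detail.
    class RunningLoss:
        def __init__(self) -> None:
            self.weighted_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> None:
            self.weighted_sum += loss * num_frames
            self.frames += num_frames

        def average(self) -> float:
            return self.weighted_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    tracker.update(0.2232, 17029.0)  # the batch 2900 numbers above
    tracker.update(0.2011, 17148.0)  # the batch 2950 numbers that follow
    print(f"tot_loss[loss={tracker.average():.4f}, over {tracker.frames:.2f} frames]")
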
2024-09-23 09:50:28,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=231756.0, ans=0.015 2024-09-23 09:50:52,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231802.66666666666, ans=0.1 2024-09-23 09:50:58,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=12.0 2024-09-23 09:51:17,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=231896.0, ans=0.125 2024-09-23 09:51:35,581 INFO [train.py:1198] (0/4) Epoch 13, batch 2950, loss[loss=0.2011, ctc_loss=0.1342, cr_loss=0.3348, over 17148.00 frames. ], tot_loss[loss=0.2358, ctc_loss=0.1612, cr_loss=0.3733, over 3353244.17 frames. ], batch size: 41, lr: 9.20e-03, grad_scale: 16.0 2024-09-23 09:51:35,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=231942.66666666666, ans=0.2 2024-09-23 09:51:35,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=231942.66666666666, ans=0.125 2024-09-23 09:52:15,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232036.0, ans=0.1 2024-09-23 09:52:20,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0 2024-09-23 09:52:21,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=232036.0, ans=0.2 2024-09-23 09:52:34,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=232082.66666666666, ans=0.07 2024-09-23 09:52:47,838 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.241e+02 1.342e+02 1.458e+02 2.416e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-23 09:52:57,272 INFO [train.py:1198] (0/4) Epoch 13, batch 3000, loss[loss=0.2719, ctc_loss=0.1916, cr_loss=0.4015, over 16565.00 frames. ], tot_loss[loss=0.2367, ctc_loss=0.1618, cr_loss=0.3743, over 3355395.57 frames. ], batch size: 66, lr: 9.19e-03, grad_scale: 16.0 2024-09-23 09:52:57,273 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 09:53:06,069 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7175, 4.0415, 3.3288, 4.1347, 3.0246, 3.5680, 3.6277, 3.6908], device='cuda:0') 2024-09-23 09:53:12,986 INFO [train.py:1230] (0/4) Epoch 13, validation: loss=0.04424, ctc_loss=0.04424, cr_loss=7.269e-15, over 944034.00 frames. 2024-09-23 09:53:12,986 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 09:53:52,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-09-23 09:54:27,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=232362.66666666666, ans=0.0 2024-09-23 09:54:31,494 INFO [train.py:1198] (0/4) Epoch 13, batch 3050, loss[loss=0.2331, ctc_loss=0.1561, cr_loss=0.3852, over 17306.00 frames.
], tot_loss[loss=0.2374, ctc_loss=0.1624, cr_loss=0.3749, over 3353887.71 frames. ], batch size: 46, lr: 9.19e-03, grad_scale: 16.0 2024-09-23 09:54:55,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=232456.0, ans=0.125 2024-09-23 09:55:05,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=232502.66666666666, ans=0.5 2024-09-23 09:55:32,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=232596.0, ans=0.125 2024-09-23 09:55:39,980 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.372e+02 1.503e+02 1.600e+02 2.658e+02, threshold=3.005e+02, percent-clipped=0.0 2024-09-23 09:55:41,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=232596.0, ans=0.125 2024-09-23 09:55:49,567 INFO [train.py:1198] (0/4) Epoch 13, batch 3100, loss[loss=0.2455, ctc_loss=0.1707, cr_loss=0.374, over 17109.00 frames. ], tot_loss[loss=0.236, ctc_loss=0.1614, cr_loss=0.3731, over 3347492.06 frames. ], batch size: 49, lr: 9.18e-03, grad_scale: 16.0 2024-09-23 09:55:51,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=232642.66666666666, ans=0.125 2024-09-23 09:56:00,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=232642.66666666666, ans=0.125 2024-09-23 09:56:23,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=232736.0, ans=0.125 2024-09-23 09:56:32,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=232736.0, ans=0.1 2024-09-23 09:57:10,784 INFO [train.py:1198] (0/4) Epoch 13, batch 3150, loss[loss=0.2086, ctc_loss=0.1426, cr_loss=0.3303, over 17293.00 frames. ], tot_loss[loss=0.2362, ctc_loss=0.1615, cr_loss=0.3732, over 3357328.78 frames. ], batch size: 46, lr: 9.18e-03, grad_scale: 16.0 2024-09-23 09:57:19,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=232876.0, ans=0.125 2024-09-23 09:57:26,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=232922.66666666666, ans=0.025 2024-09-23 09:57:30,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=232922.66666666666, ans=0.025 2024-09-23 09:57:33,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=232922.66666666666, ans=0.0 2024-09-23 09:57:58,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0 2024-09-23 09:58:19,650 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.308e+02 1.468e+02 1.678e+02 2.567e+02, threshold=2.937e+02, percent-clipped=0.0 2024-09-23 09:58:28,988 INFO [train.py:1198] (0/4) Epoch 13, batch 3200, loss[loss=0.212, ctc_loss=0.141, cr_loss=0.355, over 17307.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1607, cr_loss=0.3728, over 3361670.83 frames. 
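], batch size: 46, lr: 9.18e-03, grad_scale: 32.0

The scaling.py:214 lines dump ScheduledFloat values: module hyperparameters such as skip rates, balancer probabilities, and dropout that are functions of batch_count rather than constants, which is why the same parameter names keep reappearing with ans pinned to their late-training settings (skip rates at 0.0, dropout_p at 0.1, and so on). A minimal sketch of a batch-count-keyed piecewise-linear schedule; the breakpoints below are invented for illustration, and icefall's ScheduledFloat is more general than this:

    # A piecewise-linear schedule over batch_count, the kind of object
    # behind "ScheduledFloat: name=..., batch_count=..., ans=..." lines.
    # Breakpoints are invented for illustration.
    class PiecewiseSchedule:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # e.g. a skip rate that decays from 0.5 to 0.0 over the first
    # 4000 batches and stays there:
    skip_rate = PiecewiseSchedule((0.0, 0.5), (4000.0, 0.0))
    print(skip_rate.value(233156.0))  # -> 0.0, long past the last breakpoint
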
2024-09-23 09:58:51,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=233156.0, ans=0.2 2024-09-23 09:58:56,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=233156.0, ans=0.05 2024-09-23 09:59:26,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=233249.33333333334, ans=0.0 2024-09-23 09:59:39,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=233296.0, ans=0.025 2024-09-23 09:59:44,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=233296.0, ans=0.125 2024-09-23 09:59:52,148 INFO [train.py:1198] (0/4) Epoch 13, batch 3250, loss[loss=0.2373, ctc_loss=0.1641, cr_loss=0.366, over 16998.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.161, cr_loss=0.3731, over 3358512.39 frames. ], batch size: 51, lr: 9.17e-03, grad_scale: 32.0 2024-09-23 10:00:00,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=233342.66666666666, ans=0.125 2024-09-23 10:00:05,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=233342.66666666666, ans=0.125 2024-09-23 10:01:01,302 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.287e+02 1.416e+02 1.581e+02 2.137e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 10:01:03,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=233529.33333333334, ans=0.125 2024-09-23 10:01:10,548 INFO [train.py:1198] (0/4) Epoch 13, batch 3300, loss[loss=0.2395, ctc_loss=0.1632, cr_loss=0.3811, over 17140.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1608, cr_loss=0.3729, over 3356899.35 frames. ], batch size: 48, lr: 9.17e-03, grad_scale: 32.0 2024-09-23 10:01:21,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=233576.0, ans=0.125 2024-09-23 10:01:37,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=233622.66666666666, ans=0.125 2024-09-23 10:02:04,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=233716.0, ans=0.025 2024-09-23 10:02:21,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=233762.66666666666, ans=0.0 2024-09-23 10:02:25,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233762.66666666666, ans=0.1 2024-09-23 10:02:30,431 INFO [train.py:1198] (0/4) Epoch 13, batch 3350, loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3749, over 17333.00 frames. ], tot_loss[loss=0.2351, ctc_loss=0.1606, cr_loss=0.3728, over 3361840.96 frames.
], batch size: 48, lr: 9.16e-03, grad_scale: 32.0 2024-09-23 10:02:50,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=233856.0, ans=0.2 2024-09-23 10:03:18,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=233949.33333333334, ans=0.2 2024-09-23 10:03:38,963 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.272e+02 1.377e+02 1.520e+02 2.187e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 10:03:40,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=233996.0, ans=0.0 2024-09-23 10:03:47,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0 2024-09-23 10:03:48,298 INFO [train.py:1198] (0/4) Epoch 13, batch 3400, loss[loss=0.2953, ctc_loss=0.2085, cr_loss=0.4344, over 15985.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1605, cr_loss=0.3728, over 3366627.34 frames. ], batch size: 74, lr: 9.16e-03, grad_scale: 32.0 2024-09-23 10:03:50,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234042.66666666666, ans=0.1 2024-09-23 10:04:07,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=234089.33333333334, ans=0.09899494936611666 2024-09-23 10:04:11,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=234089.33333333334, ans=0.0 2024-09-23 10:04:17,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=234136.0, ans=0.125 2024-09-23 10:04:40,182 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:04:48,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.46 vs. limit=15.0 2024-09-23 10:05:06,443 INFO [train.py:1198] (0/4) Epoch 13, batch 3450, loss[loss=0.2121, ctc_loss=0.1422, cr_loss=0.3494, over 17172.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1607, cr_loss=0.3735, over 3365590.47 frames. ], batch size: 45, lr: 9.15e-03, grad_scale: 32.0 2024-09-23 10:05:51,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2024-09-23 10:06:11,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234462.66666666666, ans=0.0 2024-09-23 10:06:15,810 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.304e+02 1.412e+02 1.643e+02 2.368e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-23 10:06:25,230 INFO [train.py:1198] (0/4) Epoch 13, batch 3500, loss[loss=0.2269, ctc_loss=0.1558, cr_loss=0.3556, over 17364.00 frames. ], tot_loss[loss=0.2354, ctc_loss=0.1608, cr_loss=0.3731, over 3359915.51 frames. ], batch size: 48, lr: 9.15e-03, grad_scale: 32.0 2024-09-23 10:06:27,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.34 vs. 
limit=15.0 2024-09-23 10:07:43,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2024-09-23 10:07:45,895 INFO [train.py:1198] (0/4) Epoch 13, batch 3550, loss[loss=0.2882, ctc_loss=0.204, cr_loss=0.421, over 15145.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1609, cr_loss=0.373, over 3363970.89 frames. ], batch size: 89, lr: 9.14e-03, grad_scale: 32.0 2024-09-23 10:07:52,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234742.66666666666, ans=0.1 2024-09-23 10:07:53,019 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.82 vs. limit=10.0 2024-09-23 10:07:55,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=234742.66666666666, ans=0.95 2024-09-23 10:08:25,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=234836.0, ans=0.125 2024-09-23 10:09:00,083 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.323e+02 1.492e+02 1.691e+02 2.949e+02, threshold=2.984e+02, percent-clipped=2.0 2024-09-23 10:09:03,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=234929.33333333334, ans=0.1 2024-09-23 10:09:07,583 INFO [train.py:1198] (0/4) Epoch 13, batch 3600, loss[loss=0.2633, ctc_loss=0.1822, cr_loss=0.4056, over 17193.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1597, cr_loss=0.3712, over 3366808.78 frames. ], batch size: 55, lr: 9.14e-03, grad_scale: 32.0 2024-09-23 10:09:14,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=234976.0, ans=0.07 2024-09-23 10:09:15,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2024-09-23 10:09:18,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=234976.0, ans=0.125 2024-09-23 10:10:18,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=22.5 2024-09-23 10:10:25,895 INFO [train.py:1198] (0/4) Epoch 13, batch 3650, loss[loss=0.2779, ctc_loss=0.1855, cr_loss=0.462, over 17292.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1599, cr_loss=0.3721, over 3368629.10 frames. ], batch size: 49, lr: 9.14e-03, grad_scale: 32.0 2024-09-23 10:10:32,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=235209.33333333334, ans=0.025 2024-09-23 10:11:05,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.43 vs. 
limit=15.0 2024-09-23 10:11:09,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=235302.66666666666, ans=0.0 2024-09-23 10:11:37,917 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.270e+02 1.340e+02 1.472e+02 2.759e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-23 10:11:45,789 INFO [train.py:1198] (0/4) Epoch 13, batch 3700, loss[loss=0.2122, ctc_loss=0.1388, cr_loss=0.3668, over 15899.00 frames. ], tot_loss[loss=0.235, ctc_loss=0.1605, cr_loss=0.3726, over 3356828.92 frames. ], batch size: 35, lr: 9.13e-03, grad_scale: 32.0 2024-09-23 10:11:57,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2024-09-23 10:11:59,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=235442.66666666666, ans=0.125 2024-09-23 10:12:02,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=235489.33333333334, ans=0.025 2024-09-23 10:12:13,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=235489.33333333334, ans=0.1 2024-09-23 10:13:03,922 INFO [train.py:1198] (0/4) Epoch 13, batch 3750, loss[loss=0.2736, ctc_loss=0.1889, cr_loss=0.4235, over 16496.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1611, cr_loss=0.3733, over 3341772.44 frames. ], batch size: 66, lr: 9.13e-03, grad_scale: 32.0 2024-09-23 10:13:32,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=235722.66666666666, ans=0.125 2024-09-23 10:13:32,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=235722.66666666666, ans=0.2 2024-09-23 10:13:40,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=235769.33333333334, ans=0.125 2024-09-23 10:13:42,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=235769.33333333334, ans=0.125 2024-09-23 10:14:00,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=235816.0, ans=0.125 2024-09-23 10:14:12,975 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:14:14,235 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.350e+02 1.440e+02 1.635e+02 2.891e+02, threshold=2.880e+02, percent-clipped=1.0 2024-09-23 10:14:19,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=235862.66666666666, ans=0.015 2024-09-23 10:14:22,030 INFO [train.py:1198] (0/4) Epoch 13, batch 3800, loss[loss=0.2701, ctc_loss=0.1869, cr_loss=0.4159, over 16686.00 frames. ], tot_loss[loss=0.2372, ctc_loss=0.1624, cr_loss=0.3738, over 3311647.55 frames. 
], batch size: 61, lr: 9.12e-03, grad_scale: 32.0 2024-09-23 10:14:27,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=235909.33333333334, ans=0.0 2024-09-23 10:14:36,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=15.0 2024-09-23 10:15:31,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=12.0 2024-09-23 10:15:37,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=236096.0, ans=0.125 2024-09-23 10:15:40,300 INFO [train.py:1198] (0/4) Epoch 13, batch 3850, loss[loss=0.2996, ctc_loss=0.2241, cr_loss=0.3772, over 12277.00 frames. ], tot_loss[loss=0.2406, ctc_loss=0.1652, cr_loss=0.3768, over 3274985.77 frames. ], batch size: 123, lr: 9.12e-03, grad_scale: 16.0 2024-09-23 10:15:42,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=236142.66666666666, ans=0.125 2024-09-23 10:15:54,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=236189.33333333334, ans=0.1 2024-09-23 10:16:09,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=236236.0, ans=0.125 2024-09-23 10:16:35,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=236282.66666666666, ans=10.0 2024-09-23 10:16:50,286 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-13.pt 2024-09-23 10:17:39,608 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.439e+02 1.592e+02 1.784e+02 2.502e+02, threshold=3.185e+02, percent-clipped=0.0 2024-09-23 10:17:39,632 INFO [train.py:1198] (0/4) Epoch 14, batch 0, loss[loss=0.2612, ctc_loss=0.1813, cr_loss=0.3999, over 17001.00 frames. ], tot_loss[loss=0.2612, ctc_loss=0.1813, cr_loss=0.3999, over 17001.00 frames. ], batch size: 53, lr: 8.78e-03, grad_scale: 32.0 2024-09-23 10:17:39,633 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 10:17:55,047 INFO [train.py:1230] (0/4) Epoch 14, validation: loss=0.04435, ctc_loss=0.04435, cr_loss=7.317e-15, over 944034.00 frames. 2024-09-23 10:17:55,048 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 10:18:03,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2024-09-23 10:18:15,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.61 vs. 
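limit=10.0

The optim.py:487 warnings summarize the recent distribution of gradient norms: the five numbers are min, 25%, median, 75%, and max, and in these lines the reported threshold is consistently Clipping_scale times the reported median (for instance 2.0 * 1.592e+02 = 3.184e+02 against threshold=3.185e+02 just above). A sketch of that bookkeeping; the actual rule in icefall's optim.py differs in detail, so treat this as an illustration of the logged quantities only:

    # Track recent gradient norms, report their quartiles, and clip
    # whenever the current norm exceeds clipping_scale * median.
    import statistics
    from collections import deque

    class GradNormMonitor:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.clipped = 0
            self.seen = 0

        def step(self, grad_norm: float) -> float:
            """Return the factor (<= 1.0) to scale the gradient by."""
            self.norms.append(grad_norm)
            threshold = self.clipping_scale * statistics.median(self.norms)
            self.seen += 1
            if grad_norm > threshold:
                self.clipped += 1
                return threshold / grad_norm
            return 1.0

        def report(self) -> str:
            qs = statistics.quantiles(self.norms, n=4)   # needs >= 2 norms
            med = statistics.median(self.norms)
            pct = 100.0 * self.clipped / max(self.seen, 1)
            return (f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                    f"{min(self.norms):.3e} {qs[0]:.3e} {med:.3e} {qs[2]:.3e} "
                    f"{max(self.norms):.3e}, threshold={self.clipping_scale * med:.3e}, "
                    f"percent-clipped={pct}")
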
2024-09-23 10:18:17,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=236404.0, ans=0.2 2024-09-23 10:18:33,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=236450.66666666666, ans=0.0 2024-09-23 10:19:01,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=236497.33333333334, ans=0.125 2024-09-23 10:19:21,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=236590.66666666666, ans=0.0 2024-09-23 10:19:22,451 INFO [train.py:1198] (0/4) Epoch 14, batch 50, loss[loss=0.2623, ctc_loss=0.178, cr_loss=0.4219, over 17031.00 frames. ], tot_loss[loss=0.2401, ctc_loss=0.1639, cr_loss=0.3813, over 763691.05 frames. ], batch size: 52, lr: 8.78e-03, grad_scale: 32.0 2024-09-23 10:19:26,161 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 10:19:29,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=236590.66666666666, ans=0.125 2024-09-23 10:20:24,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=236777.33333333334, ans=0.125 2024-09-23 10:20:33,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236777.33333333334, ans=0.125 2024-09-23 10:20:42,383 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.241e+02 1.435e+02 1.718e+02 2.310e+02, threshold=2.871e+02, percent-clipped=0.0 2024-09-23 10:20:42,407 INFO [train.py:1198] (0/4) Epoch 14, batch 100, loss[loss=0.2538, ctc_loss=0.1794, cr_loss=0.3719, over 16507.00 frames. ], tot_loss[loss=0.2399, ctc_loss=0.164, cr_loss=0.3794, over 1331705.34 frames. ], batch size: 66, lr: 8.77e-03, grad_scale: 32.0 2024-09-23 10:21:22,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=236917.33333333334, ans=0.0 2024-09-23 10:21:23,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=236917.33333333334, ans=0.02 2024-09-23 10:21:24,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=236917.33333333334, ans=0.025 2024-09-23 10:21:56,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.51 vs. limit=10.0 2024-09-23 10:22:03,253 INFO [train.py:1198] (0/4) Epoch 14, batch 150, loss[loss=0.2444, ctc_loss=0.1662, cr_loss=0.3912, over 17100.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1598, cr_loss=0.3741, over 1786412.29 frames.
], batch size: 49, lr: 8.77e-03, grad_scale: 32.0 2024-09-23 10:22:03,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=237057.33333333334, ans=0.09899494936611666 2024-09-23 10:22:13,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237057.33333333334, ans=0.1 2024-09-23 10:22:14,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=237057.33333333334, ans=0.0 2024-09-23 10:22:38,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=237150.66666666666, ans=0.125 2024-09-23 10:22:43,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=237150.66666666666, ans=0.125 2024-09-23 10:23:25,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=237244.0, ans=0.2 2024-09-23 10:23:28,629 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.307e+02 1.446e+02 1.689e+02 2.559e+02, threshold=2.892e+02, percent-clipped=0.0 2024-09-23 10:23:28,654 INFO [train.py:1198] (0/4) Epoch 14, batch 200, loss[loss=0.2436, ctc_loss=0.1672, cr_loss=0.3824, over 16919.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1603, cr_loss=0.3748, over 2129654.99 frames. ], batch size: 58, lr: 8.76e-03, grad_scale: 32.0 2024-09-23 10:23:44,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=237290.66666666666, ans=0.125 2024-09-23 10:24:24,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=237430.66666666666, ans=0.2 2024-09-23 10:24:35,440 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=22.5 2024-09-23 10:24:49,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=237477.33333333334, ans=0.125 2024-09-23 10:24:53,918 INFO [train.py:1198] (0/4) Epoch 14, batch 250, loss[loss=0.2533, ctc_loss=0.1751, cr_loss=0.3909, over 17017.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1595, cr_loss=0.3733, over 2406900.11 frames. ], batch size: 52, lr: 8.76e-03, grad_scale: 32.0 2024-09-23 10:24:58,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=237524.0, ans=0.025 2024-09-23 10:25:23,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=237570.66666666666, ans=0.2 2024-09-23 10:25:40,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=237664.0, ans=0.0 2024-09-23 10:25:50,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.61 vs. limit=10.0 2024-09-23 10:25:51,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. 
limit=15.0 2024-09-23 10:25:52,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=237664.0, ans=0.025 2024-09-23 10:26:09,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2024-09-23 10:26:13,348 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.272e+02 1.438e+02 1.640e+02 2.630e+02, threshold=2.876e+02, percent-clipped=0.0 2024-09-23 10:26:13,373 INFO [train.py:1198] (0/4) Epoch 14, batch 300, loss[loss=0.2906, ctc_loss=0.2153, cr_loss=0.3768, over 11412.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1595, cr_loss=0.3737, over 2616884.22 frames. ], batch size: 123, lr: 8.76e-03, grad_scale: 32.0 2024-09-23 10:26:58,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2024-09-23 10:27:32,702 INFO [train.py:1198] (0/4) Epoch 14, batch 350, loss[loss=0.1894, ctc_loss=0.1256, cr_loss=0.3191, over 17079.00 frames. ], tot_loss[loss=0.2335, ctc_loss=0.1589, cr_loss=0.3729, over 2778159.64 frames. ], batch size: 43, lr: 8.75e-03, grad_scale: 32.0 2024-09-23 10:27:45,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=237990.66666666666, ans=0.125 2024-09-23 10:28:05,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=238037.33333333334, ans=0.2 2024-09-23 10:28:41,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=238130.66666666666, ans=0.0 2024-09-23 10:29:02,522 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.337e+02 1.482e+02 1.714e+02 2.531e+02, threshold=2.964e+02, percent-clipped=0.0 2024-09-23 10:29:02,546 INFO [train.py:1198] (0/4) Epoch 14, batch 400, loss[loss=0.2755, ctc_loss=0.1941, cr_loss=0.4071, over 17315.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1586, cr_loss=0.3714, over 2901663.35 frames. ], batch size: 49, lr: 8.75e-03, grad_scale: 32.0 2024-09-23 10:29:24,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=238270.66666666666, ans=0.09899494936611666 2024-09-23 10:29:30,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2024-09-23 10:29:34,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=238317.33333333334, ans=0.125 2024-09-23 10:30:21,635 INFO [train.py:1198] (0/4) Epoch 14, batch 450, loss[loss=0.2457, ctc_loss=0.1657, cr_loss=0.4, over 17150.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.159, cr_loss=0.3722, over 3005644.37 frames. 
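], batch size: 48, lr: 8.74e-03, grad_scale: 32.0

The grad_scale value in these summaries bounces between 16.0 and 32.0, the signature of dynamic loss scaling under mixed-precision training: the scale is halved when a step produces non-finite gradients and cautiously raised again after a run of clean steps. A sketch of that mechanism, with the halving and doubling policy an assumption about the general technique rather than a reading of icefall's optimizer:

    # Dynamic loss scaling of the kind the fluctuating "grad_scale"
    # values suggest: halve on overflow, double after a run of clean
    # steps. The starting scale and growth interval here are invented.
    class LossScaler:
        def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.clean_steps = 0

        def update(self, grads_finite: bool) -> None:
            if not grads_finite:
                self.scale /= 2.0      # back off immediately on inf/nan
                self.clean_steps = 0
            else:
                self.clean_steps += 1
                if self.clean_steps >= self.growth_interval:
                    self.scale *= 2.0  # try a larger scale again
                    self.clean_steps = 0
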
2024-09-23 10:30:28,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=238457.33333333334, ans=0.1 2024-09-23 10:30:55,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=238550.66666666666, ans=0.125 2024-09-23 10:30:55,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=238550.66666666666, ans=0.2 2024-09-23 10:31:27,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=238644.0, ans=0.025 2024-09-23 10:31:36,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238644.0, ans=0.1 2024-09-23 10:31:41,086 INFO [train.py:1198] (0/4) Epoch 14, batch 500, loss[loss=0.2139, ctc_loss=0.1416, cr_loss=0.3616, over 17003.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1603, cr_loss=0.3748, over 3075540.31 frames. ], batch size: 51, lr: 8.74e-03, grad_scale: 16.0 2024-09-23 10:31:41,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=12.0 2024-09-23 10:31:42,720 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.208e+02 1.324e+02 1.495e+02 2.086e+02, threshold=2.649e+02, percent-clipped=0.0 2024-09-23 10:31:47,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=238690.66666666666, ans=0.125 2024-09-23 10:31:58,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=238737.33333333334, ans=0.04949747468305833 2024-09-23 10:32:17,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=238784.0, ans=0.0 2024-09-23 10:32:19,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=238784.0, ans=0.015 2024-09-23 10:32:49,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.62 vs. limit=15.0 2024-09-23 10:32:51,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2024-09-23 10:33:06,646 INFO [train.py:1198] (0/4) Epoch 14, batch 550, loss[loss=0.2566, ctc_loss=0.1744, cr_loss=0.4107, over 17092.00 frames. ], tot_loss[loss=0.2353, ctc_loss=0.1603, cr_loss=0.3753, over 3148693.91 frames. ], batch size: 46, lr: 8.74e-03, grad_scale: 16.0 2024-09-23 10:33:20,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=238924.0, ans=0.125 2024-09-23 10:34:05,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239064.0, ans=0.1 2024-09-23 10:34:05,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=239064.0, ans=0.125 2024-09-23 10:34:07,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs.
2024-09-23 10:34:09,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=239064.0, ans=0.07
2024-09-23 10:34:16,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=239110.66666666666, ans=0.2
2024-09-23 10:34:22,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0
2024-09-23 10:34:31,898 INFO [train.py:1198] (0/4) Epoch 14, batch 600, loss[loss=0.2025, ctc_loss=0.1367, cr_loss=0.3289, over 17125.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1591, cr_loss=0.3738, over 3203750.99 frames. ], batch size: 40, lr: 8.73e-03, grad_scale: 16.0
2024-09-23 10:34:33,458 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.266e+02 1.344e+02 1.475e+02 2.652e+02, threshold=2.689e+02, percent-clipped=1.0
2024-09-23 10:34:40,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=239157.33333333334, ans=0.2
2024-09-23 10:35:00,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=239204.0, ans=0.125
2024-09-23 10:35:21,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=239297.33333333334, ans=0.0
2024-09-23 10:35:26,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0
2024-09-23 10:35:30,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=239297.33333333334, ans=0.125
2024-09-23 10:35:36,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.64 vs. limit=10.0
2024-09-23 10:35:51,348 INFO [train.py:1198] (0/4) Epoch 14, batch 650, loss[loss=0.2406, ctc_loss=0.1617, cr_loss=0.3949, over 17301.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1592, cr_loss=0.3738, over 3230030.27 frames. ], batch size: 49, lr: 8.73e-03, grad_scale: 16.0
2024-09-23 10:36:16,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=239437.33333333334, ans=0.0
2024-09-23 10:37:04,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0
2024-09-23 10:37:11,335 INFO [train.py:1198] (0/4) Epoch 14, batch 700, loss[loss=0.1913, ctc_loss=0.1276, cr_loss=0.3187, over 17169.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1583, cr_loss=0.3725, over 3271236.84 frames. ], batch size: 41, lr: 8.72e-03, grad_scale: 16.0
2024-09-23 10:37:13,000 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.260e+02 1.373e+02 1.552e+02 2.322e+02, threshold=2.747e+02, percent-clipped=0.0
2024-09-23 10:37:23,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0
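In the optim.py warnings, the reported threshold tracks the logged quartiles: with Clipping_scale=2.0 the threshold sits at roughly twice the median gradient norm (2.0 * 1.344e+02 ~= 2.689e+02 in the batch 600 entry above, and the same relation holds in the other warnings). A small sketch of computing such statistics over a window of recent gradient norms; the windowing and the exact estimator are assumptions, only the scale-times-median relationship is read off the log:

    import torch

    def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Quartiles (min, 25%, median, 75%, max) of recent gradient norms,
        # plus a clipping threshold of clipping_scale * median, matching the
        # logged numbers (2.0 * 1.344e+02 ~= 2.689e+02).
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped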
2024-09-23 10:37:24,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=239624.0, ans=0.125
2024-09-23 10:37:49,127 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-23 10:37:53,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239717.33333333334, ans=0.1
2024-09-23 10:38:08,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=239764.0, ans=0.2
2024-09-23 10:38:27,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=239810.66666666666, ans=0.1
2024-09-23 10:38:37,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.49 vs. limit=12.0
2024-09-23 10:38:39,756 INFO [train.py:1198] (0/4) Epoch 14, batch 750, loss[loss=0.252, ctc_loss=0.1677, cr_loss=0.4216, over 17068.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1588, cr_loss=0.3729, over 3294900.61 frames. ], batch size: 56, lr: 8.72e-03, grad_scale: 16.0
2024-09-23 10:39:19,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0
2024-09-23 10:39:23,996 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 10:39:45,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.87 vs. limit=10.0
2024-09-23 10:39:54,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=240044.0, ans=0.125
2024-09-23 10:39:56,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=240044.0, ans=0.125
2024-09-23 10:39:56,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0
2024-09-23 10:40:02,418 INFO [train.py:1198] (0/4) Epoch 14, batch 800, loss[loss=0.2129, ctc_loss=0.1446, cr_loss=0.3414, over 17024.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1591, cr_loss=0.3727, over 3308737.29 frames. ], batch size: 51, lr: 8.71e-03, grad_scale: 32.0
2024-09-23 10:40:03,952 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.283e+02 1.393e+02 1.518e+02 3.186e+02, threshold=2.786e+02, percent-clipped=2.0
2024-09-23 10:40:47,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=240184.0, ans=0.2
2024-09-23 10:41:22,388 INFO [train.py:1198] (0/4) Epoch 14, batch 850, loss[loss=0.2063, ctc_loss=0.1381, cr_loss=0.341, over 17086.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1592, cr_loss=0.3728, over 3326213.06 frames. ], batch size: 43, lr: 8.71e-03, grad_scale: 32.0
2024-09-23 10:41:49,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=240370.66666666666, ans=0.0
2024-09-23 10:41:59,286 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 10:42:05,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=240417.33333333334, ans=0.0
2024-09-23 10:42:44,015 INFO [train.py:1198] (0/4) Epoch 14, batch 900, loss[loss=0.2136, ctc_loss=0.143, cr_loss=0.3526, over 17002.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1581, cr_loss=0.3711, over 3338107.83 frames. ], batch size: 44, lr: 8.71e-03, grad_scale: 32.0
2024-09-23 10:42:48,314 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.265e+02 1.359e+02 1.500e+02 2.203e+02, threshold=2.718e+02, percent-clipped=0.0
2024-09-23 10:42:51,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=240557.33333333334, ans=0.1
2024-09-23 10:43:02,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=240604.0, ans=0.2
2024-09-23 10:43:15,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=240604.0, ans=0.125
2024-09-23 10:43:20,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=240650.66666666666, ans=0.0
2024-09-23 10:43:21,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.39 vs. limit=10.0
2024-09-23 10:43:40,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=240697.33333333334, ans=0.2
2024-09-23 10:43:54,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=240744.0, ans=0.125
2024-09-23 10:44:11,569 INFO [train.py:1198] (0/4) Epoch 14, batch 950, loss[loss=0.2184, ctc_loss=0.1441, cr_loss=0.3717, over 17156.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1574, cr_loss=0.3704, over 3338663.99 frames. ], batch size: 45, lr: 8.70e-03, grad_scale: 32.0
2024-09-23 10:45:01,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=240930.66666666666, ans=0.125
2024-09-23 10:45:09,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=240930.66666666666, ans=0.125
2024-09-23 10:45:16,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=240977.33333333334, ans=0.2
2024-09-23 10:45:25,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240977.33333333334, ans=0.1
2024-09-23 10:45:25,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=240977.33333333334, ans=0.125
2024-09-23 10:45:31,844 INFO [train.py:1198] (0/4) Epoch 14, batch 1000, loss[loss=0.1957, ctc_loss=0.1329, cr_loss=0.3139, over 17079.00 frames. ], tot_loss[loss=0.231, ctc_loss=0.1569, cr_loss=0.37, over 3351913.21 frames. ], batch size: 43, lr: 8.70e-03, grad_scale: 32.0
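The scaling.py "ScheduledFloat" entries print a hyperparameter's current value (`ans`) as a function of `batch_count`; the dropout and skip-rate values above decay as training progresses. A sketch of a batch-count-indexed schedule with piecewise-linear interpolation; the schedule endpoints below are hypothetical, only the batch_count-to-value lookup behaviour is implied by the log:

    import bisect

    class ScheduledValue:
        """Piecewise-linear schedule over batch_count (illustrative,
        not icefall's actual ScheduledFloat class)."""
        def __init__(self, points):
            # points: sorted (batch_count, value) pairs
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            i = bisect.bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical endpoints: start at 0.3, settle at 0.1 after 20k batches.
    dropout_p = ScheduledValue([(0.0, 0.3), (20000.0, 0.1)])
    print(dropout_p(238457.33))  # -> 0.1, past the final schedule point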
2024-09-23 10:45:33,324 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.317e+02 1.495e+02 1.766e+02 2.386e+02, threshold=2.990e+02, percent-clipped=0.0
2024-09-23 10:46:11,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=241117.33333333334, ans=0.125
2024-09-23 10:46:31,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=241164.0, ans=0.125
2024-09-23 10:46:39,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=241210.66666666666, ans=22.5
2024-09-23 10:46:42,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=241210.66666666666, ans=0.0
2024-09-23 10:46:52,003 INFO [train.py:1198] (0/4) Epoch 14, batch 1050, loss[loss=0.1947, ctc_loss=0.1287, cr_loss=0.3302, over 17181.00 frames. ], tot_loss[loss=0.2314, ctc_loss=0.1574, cr_loss=0.37, over 3352297.44 frames. ], batch size: 41, lr: 8.69e-03, grad_scale: 32.0
2024-09-23 10:47:16,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=241304.0, ans=0.125
2024-09-23 10:47:29,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=241350.66666666666, ans=0.0
2024-09-23 10:47:30,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0
2024-09-23 10:47:58,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=241397.33333333334, ans=0.125
2024-09-23 10:48:16,988 INFO [train.py:1198] (0/4) Epoch 14, batch 1100, loss[loss=0.1789, ctc_loss=0.1193, cr_loss=0.2982, over 17260.00 frames. ], tot_loss[loss=0.2306, ctc_loss=0.1569, cr_loss=0.3688, over 3352952.04 frames. ], batch size: 42, lr: 8.69e-03, grad_scale: 32.0
2024-09-23 10:48:18,620 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.315e+02 1.444e+02 1.614e+02 2.728e+02, threshold=2.888e+02, percent-clipped=0.0
2024-09-23 10:48:23,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=241490.66666666666, ans=0.125
2024-09-23 10:48:25,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=241490.66666666666, ans=0.125
2024-09-23 10:48:34,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=241537.33333333334, ans=0.2
2024-09-23 10:48:49,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=241537.33333333334, ans=0.125
2024-09-23 10:48:50,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=15.0
2024-09-23 10:49:02,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=241584.0, ans=0.125
2024-09-23 10:49:14,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=241630.66666666666, ans=0.1
2024-09-23 10:49:21,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0
2024-09-23 10:49:21,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0
2024-09-23 10:49:35,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=241677.33333333334, ans=0.125
2024-09-23 10:49:41,262 INFO [train.py:1198] (0/4) Epoch 14, batch 1150, loss[loss=0.2642, ctc_loss=0.1772, cr_loss=0.4349, over 17215.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.1564, cr_loss=0.3679, over 3351779.99 frames. ], batch size: 55, lr: 8.69e-03, grad_scale: 32.0
2024-09-23 10:49:46,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=241724.0, ans=0.125
2024-09-23 10:50:16,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=15.0
2024-09-23 10:50:22,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0
2024-09-23 10:51:01,212 INFO [train.py:1198] (0/4) Epoch 14, batch 1200, loss[loss=0.2378, ctc_loss=0.1645, cr_loss=0.3664, over 17157.00 frames. ], tot_loss[loss=0.2328, ctc_loss=0.1586, cr_loss=0.3709, over 3340132.19 frames. ], batch size: 48, lr: 8.68e-03, grad_scale: 32.0
2024-09-23 10:51:02,790 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.057e+02 1.307e+02 1.418e+02 1.626e+02 2.907e+02, threshold=2.837e+02, percent-clipped=1.0
2024-09-23 10:51:12,783 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 10:51:12,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=241957.33333333334, ans=0.125
2024-09-23 10:51:20,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=242004.0, ans=0.0
2024-09-23 10:51:35,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0
2024-09-23 10:51:38,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=242050.66666666666, ans=0.0
2024-09-23 10:51:47,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=242097.33333333334, ans=0.125
2024-09-23 10:52:20,962 INFO [train.py:1198] (0/4) Epoch 14, batch 1250, loss[loss=0.1883, ctc_loss=0.1268, cr_loss=0.3074, over 17039.00 frames. ], tot_loss[loss=0.2344, ctc_loss=0.1598, cr_loss=0.3731, over 3335824.52 frames. ], batch size: 39, lr: 8.68e-03, grad_scale: 32.0
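The Whitening entries compare a per-module "metric" against a "limit". One plausible reading, sketched below as an assumption rather than as the actual scaling.py code, is a measure of how far the channel covariance is from a scaled identity: the ratio d * tr(C @ C) / tr(C)**2 equals 1.0 when all covariance eigenvalues are equal (perfectly white features) and grows with the eigenvalue spread, so a metric above the limit flags activations that are far from white.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Illustrative "non-whiteness" measure:
        # d * trace(C @ C) / trace(C)**2 with C the channel covariance.
        # Equals 1.0 iff all eigenvalues of C are equal (white features).
        x = x - x.mean(dim=0, keepdim=True)
        c = (x.T @ x) / x.shape[0]
        d = c.shape[0]
        return d * torch.trace(c @ c) / torch.trace(c) ** 2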
2024-09-23 10:52:35,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242190.66666666666, ans=0.1
2024-09-23 10:52:37,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=242190.66666666666, ans=0.1
2024-09-23 10:52:53,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=242237.33333333334, ans=0.125
2024-09-23 10:53:00,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=242284.0, ans=0.0
2024-09-23 10:53:05,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=242284.0, ans=0.0
2024-09-23 10:53:45,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242377.33333333334, ans=0.1
2024-09-23 10:53:49,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=12.0
2024-09-23 10:53:49,820 INFO [train.py:1198] (0/4) Epoch 14, batch 1300, loss[loss=0.2373, ctc_loss=0.1602, cr_loss=0.3855, over 17352.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1587, cr_loss=0.3719, over 3342775.10 frames. ], batch size: 48, lr: 8.67e-03, grad_scale: 32.0
2024-09-23 10:53:51,343 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.264e+02 1.376e+02 1.514e+02 2.274e+02, threshold=2.753e+02, percent-clipped=0.0
2024-09-23 10:53:51,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=242424.0, ans=0.125
2024-09-23 10:53:56,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=242424.0, ans=0.0
2024-09-23 10:54:01,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=15.0
2024-09-23 10:54:14,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=242470.66666666666, ans=0.125
2024-09-23 10:54:19,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=242470.66666666666, ans=0.125
2024-09-23 10:54:33,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=242517.33333333334, ans=0.125
2024-09-23 10:54:59,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.29 vs. limit=22.5
2024-09-23 10:55:03,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=242610.66666666666, ans=0.125
2024-09-23 10:55:04,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=22.5
2024-09-23 10:55:10,059 INFO [train.py:1198] (0/4) Epoch 14, batch 1350, loss[loss=0.1996, ctc_loss=0.139, cr_loss=0.3033, over 17197.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1594, cr_loss=0.372, over 3341289.72 frames. ], batch size: 41, lr: 8.67e-03, grad_scale: 32.0
2024-09-23 10:55:11,948 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-52000.pt
2024-09-23 10:55:25,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=242657.33333333334, ans=0.125
2024-09-23 10:55:36,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=242704.0, ans=0.0
2024-09-23 10:55:50,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=242750.66666666666, ans=0.0
2024-09-23 10:56:15,296 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0
2024-09-23 10:56:32,048 INFO [train.py:1198] (0/4) Epoch 14, batch 1400, loss[loss=0.2563, ctc_loss=0.1778, cr_loss=0.3925, over 17196.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1592, cr_loss=0.3721, over 3345828.19 frames. ], batch size: 55, lr: 8.67e-03, grad_scale: 32.0
2024-09-23 10:56:33,632 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.305e+02 1.425e+02 1.607e+02 2.757e+02, threshold=2.850e+02, percent-clipped=1.0
2024-09-23 10:56:45,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=242890.66666666666, ans=0.2
2024-09-23 10:57:07,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0
2024-09-23 10:57:56,934 INFO [train.py:1198] (0/4) Epoch 14, batch 1450, loss[loss=0.1946, ctc_loss=0.1305, cr_loss=0.3205, over 15935.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1585, cr_loss=0.371, over 3352691.83 frames. ], batch size: 35, lr: 8.66e-03, grad_scale: 16.0
2024-09-23 10:57:57,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=243124.0, ans=0.0
2024-09-23 10:58:25,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.84 vs. limit=22.5
2024-09-23 10:58:28,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=243170.66666666666, ans=0.5
2024-09-23 10:58:43,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=243217.33333333334, ans=0.125
2024-09-23 10:58:46,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=243217.33333333334, ans=0.125
2024-09-23 10:59:01,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=243264.0, ans=0.125
2024-09-23 10:59:21,664 INFO [train.py:1198] (0/4) Epoch 14, batch 1500, loss[loss=0.247, ctc_loss=0.1682, cr_loss=0.3943, over 17011.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1585, cr_loss=0.3717, over 3352368.43 frames. ], batch size: 56, lr: 8.66e-03, grad_scale: 16.0
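The checkpoint.py entry above writes checkpoint-52000.pt, i.e. checkpoints are named by the global training-batch index and written at a fixed cadence. A sketch of batch-indexed checkpointing under that reading; the interval, the directory default, and the saved fields are assumptions for illustration:

    import torch

    def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                              save_every_n: int = 4000,
                              exp_dir: str = "zipformer/exp"):
        # Save a batch-indexed checkpoint, mirroring names like
        # checkpoint-52000.pt in the log (52000 is consistent with a fixed
        # interval). save_every_n and the saved fields are assumptions.
        if batch_idx_train % save_every_n != 0:
            return
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
        )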
2024-09-23 10:59:24,853 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.258e+02 1.373e+02 1.539e+02 2.095e+02, threshold=2.745e+02, percent-clipped=0.0
2024-09-23 10:59:39,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=243404.0, ans=0.2
2024-09-23 10:59:47,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=243404.0, ans=0.09899494936611666
2024-09-23 11:00:00,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0
2024-09-23 11:00:41,715 INFO [train.py:1198] (0/4) Epoch 14, batch 1550, loss[loss=0.2741, ctc_loss=0.1921, cr_loss=0.4099, over 17205.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.1583, cr_loss=0.3714, over 3355324.01 frames. ], batch size: 55, lr: 8.65e-03, grad_scale: 16.0
2024-09-23 11:00:47,656 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0
2024-09-23 11:01:01,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=243637.33333333334, ans=0.125
2024-09-23 11:01:01,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=243637.33333333334, ans=0.125
2024-09-23 11:01:12,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=243684.0, ans=0.125
2024-09-23 11:01:20,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=243684.0, ans=0.0
2024-09-23 11:01:27,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=243684.0, ans=0.0
2024-09-23 11:01:36,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=243730.66666666666, ans=0.0
2024-09-23 11:01:54,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=243777.33333333334, ans=0.0
2024-09-23 11:01:54,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0
2024-09-23 11:02:01,666 INFO [train.py:1198] (0/4) Epoch 14, batch 1600, loss[loss=0.2229, ctc_loss=0.151, cr_loss=0.3593, over 17001.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1588, cr_loss=0.3718, over 3345979.58 frames. ], batch size: 44, lr: 8.65e-03, grad_scale: 32.0
2024-09-23 11:02:04,729 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.281e+02 1.419e+02 1.549e+02 2.365e+02, threshold=2.838e+02, percent-clipped=0.0
2024-09-23 11:02:35,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=243870.66666666666, ans=0.0
2024-09-23 11:02:40,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0
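The lr column decays smoothly within the epoch (8.74e-03 at batch 500 down to 8.65e-03 by batch 1600 above). A sketch of an Eden-style schedule that decays in both batch and epoch count; the formula and the constants below are assumptions for illustration, not values read off this log:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Illustrative Eden-style decay: smooth power-law falloff in both
        # the batch index and the (fractional) epoch count. The exponents
        # and constants are hypothetical.
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)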
2024-09-23 11:03:02,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=243964.0, ans=0.2
2024-09-23 11:03:05,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=243964.0, ans=0.0
2024-09-23 11:03:05,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2024-09-23 11:03:16,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=244010.66666666666, ans=0.125
2024-09-23 11:03:23,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0
2024-09-23 11:03:30,842 INFO [train.py:1198] (0/4) Epoch 14, batch 1650, loss[loss=0.2244, ctc_loss=0.1531, cr_loss=0.3564, over 16963.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.1601, cr_loss=0.3728, over 3325408.43 frames. ], batch size: 56, lr: 8.64e-03, grad_scale: 32.0
2024-09-23 11:03:43,725 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:03:47,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0
2024-09-23 11:03:58,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=244104.0, ans=10.0
2024-09-23 11:04:04,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=244150.66666666666, ans=0.125
2024-09-23 11:04:14,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244150.66666666666, ans=0.1
2024-09-23 11:04:19,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=244197.33333333334, ans=0.125
2024-09-23 11:04:19,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=244197.33333333334, ans=0.125
2024-09-23 11:04:21,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0
2024-09-23 11:04:50,838 INFO [train.py:1198] (0/4) Epoch 14, batch 1700, loss[loss=0.2338, ctc_loss=0.1594, cr_loss=0.3719, over 17211.00 frames. ], tot_loss[loss=0.2355, ctc_loss=0.1607, cr_loss=0.3742, over 3328312.10 frames. ], batch size: 55, lr: 8.64e-03, grad_scale: 32.0
2024-09-23 11:04:54,009 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.254e+02 1.382e+02 1.612e+02 3.536e+02, threshold=2.764e+02, percent-clipped=2.0
2024-09-23 11:04:58,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.77 vs. limit=22.5
2024-09-23 11:05:33,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0
2024-09-23 11:05:46,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.79 vs. limit=12.0
2024-09-23 11:06:10,497 INFO [train.py:1198] (0/4) Epoch 14, batch 1750, loss[loss=0.2602, ctc_loss=0.1789, cr_loss=0.4062, over 17018.00 frames. ], tot_loss[loss=0.2357, ctc_loss=0.1609, cr_loss=0.3739, over 3323935.24 frames. ], batch size: 53, lr: 8.64e-03, grad_scale: 32.0
2024-09-23 11:06:20,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=244524.0, ans=0.0
2024-09-23 11:07:36,229 INFO [train.py:1198] (0/4) Epoch 14, batch 1800, loss[loss=0.2263, ctc_loss=0.1526, cr_loss=0.3684, over 17311.00 frames. ], tot_loss[loss=0.2352, ctc_loss=0.1605, cr_loss=0.3737, over 3334442.47 frames. ], batch size: 51, lr: 8.63e-03, grad_scale: 32.0
2024-09-23 11:07:39,496 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.282e+02 1.372e+02 1.529e+02 2.252e+02, threshold=2.745e+02, percent-clipped=0.0
2024-09-23 11:07:47,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=244757.33333333334, ans=0.0
2024-09-23 11:08:09,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=244850.66666666666, ans=0.0
2024-09-23 11:08:23,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=244850.66666666666, ans=0.125
2024-09-23 11:08:28,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0
2024-09-23 11:08:37,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244897.33333333334, ans=0.1
2024-09-23 11:08:57,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0
2024-09-23 11:09:00,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=22.5
2024-09-23 11:09:01,689 INFO [train.py:1198] (0/4) Epoch 14, batch 1850, loss[loss=0.2717, ctc_loss=0.1881, cr_loss=0.4179, over 15035.00 frames. ], tot_loss[loss=0.2366, ctc_loss=0.1615, cr_loss=0.3754, over 3342350.27 frames. ], batch size: 89, lr: 8.63e-03, grad_scale: 32.0
2024-09-23 11:09:01,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=244990.66666666666, ans=0.125
2024-09-23 11:09:21,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0
2024-09-23 11:09:29,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=245037.33333333334, ans=15.0
2024-09-23 11:09:30,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=245037.33333333334, ans=0.2
2024-09-23 11:09:36,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0
2024-09-23 11:09:40,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=245084.0, ans=0.125
2024-09-23 11:09:51,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245130.66666666666, ans=0.1
2024-09-23 11:10:04,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=245177.33333333334, ans=0.125
2024-09-23 11:10:04,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=245177.33333333334, ans=0.0
2024-09-23 11:10:21,848 INFO [train.py:1198] (0/4) Epoch 14, batch 1900, loss[loss=0.1691, ctc_loss=0.1102, cr_loss=0.2944, over 17180.00 frames. ], tot_loss[loss=0.2348, ctc_loss=0.1601, cr_loss=0.3734, over 3346119.52 frames. ], batch size: 41, lr: 8.62e-03, grad_scale: 32.0
2024-09-23 11:10:25,082 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.260e+02 1.374e+02 1.529e+02 3.130e+02, threshold=2.747e+02, percent-clipped=1.0
2024-09-23 11:10:36,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=245270.66666666666, ans=0.2
2024-09-23 11:10:40,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=245270.66666666666, ans=0.0
2024-09-23 11:10:48,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=245270.66666666666, ans=0.1
2024-09-23 11:11:13,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=245364.0, ans=0.2
2024-09-23 11:11:19,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=245364.0, ans=0.09899494936611666
2024-09-23 11:11:29,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=245410.66666666666, ans=0.125
2024-09-23 11:11:41,463 INFO [train.py:1198] (0/4) Epoch 14, batch 1950, loss[loss=0.2447, ctc_loss=0.1658, cr_loss=0.3944, over 16998.00 frames. ], tot_loss[loss=0.2349, ctc_loss=0.1603, cr_loss=0.3733, over 3348491.54 frames. ], batch size: 56, lr: 8.62e-03, grad_scale: 32.0
2024-09-23 11:11:51,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245457.33333333334, ans=0.1
2024-09-23 11:11:53,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=245457.33333333334, ans=0.025
2024-09-23 11:11:53,938 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0
2024-09-23 11:12:10,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0
2024-09-23 11:12:46,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=245597.33333333334, ans=0.125
2024-09-23 11:12:51,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=245644.0, ans=0.2
2024-09-23 11:12:52,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=245644.0, ans=0.0
2024-09-23 11:13:09,060 INFO [train.py:1198] (0/4) Epoch 14, batch 2000, loss[loss=0.2433, ctc_loss=0.1676, cr_loss=0.3785, over 17237.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.159, cr_loss=0.3711, over 3343355.05 frames. ], batch size: 55, lr: 8.62e-03, grad_scale: 32.0
2024-09-23 11:13:14,677 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.297e+02 1.399e+02 1.635e+02 2.518e+02, threshold=2.799e+02, percent-clipped=0.0
2024-09-23 11:13:37,524 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:13:50,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=245784.0, ans=0.125
2024-09-23 11:14:07,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=245830.66666666666, ans=0.125
2024-09-23 11:14:21,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=245877.33333333334, ans=0.2
2024-09-23 11:14:25,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=245877.33333333334, ans=0.0
2024-09-23 11:14:31,853 INFO [train.py:1198] (0/4) Epoch 14, batch 2050, loss[loss=0.2533, ctc_loss=0.1756, cr_loss=0.3883, over 16955.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1593, cr_loss=0.3725, over 3356246.80 frames. ], batch size: 58, lr: 8.61e-03, grad_scale: 32.0
2024-09-23 11:14:35,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=245924.0, ans=0.125
2024-09-23 11:14:42,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.87 vs. limit=15.0
2024-09-23 11:15:29,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=246064.0, ans=0.2
2024-09-23 11:15:33,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=246064.0, ans=0.09899494936611666
2024-09-23 11:15:33,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0
2024-09-23 11:15:52,135 INFO [train.py:1198] (0/4) Epoch 14, batch 2100, loss[loss=0.2495, ctc_loss=0.1691, cr_loss=0.4019, over 16589.00 frames. ], tot_loss[loss=0.2336, ctc_loss=0.1591, cr_loss=0.3727, over 3355095.21 frames. ], batch size: 66, lr: 8.61e-03, grad_scale: 32.0
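The grad_scale field in the train entries moves between values like 32.0 and 16.0, which is the behaviour of a dynamic loss scaler under fp16 mixed precision: the scale is cut back after steps whose gradients overflow and grown again after a run of good steps. A sketch using PyTorch's stock GradScaler; whether train.py wires it up exactly this way, and the growth/backoff settings below, are assumptions:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def training_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)            # forward pass in reduced precision
        scaler.scale(loss).backward()      # backward on the scaled loss
        scaler.step(optimizer)             # skips the step on inf/nan grads
        scaler.update()                    # halves or grows the scale
        return scaler.get_scale()          # the "grad_scale" figure in the log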
2024-09-23 11:15:55,403 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.001e+02 1.226e+02 1.315e+02 1.406e+02 3.033e+02, threshold=2.629e+02, percent-clipped=1.0
2024-09-23 11:16:00,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=246157.33333333334, ans=0.0
2024-09-23 11:16:00,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=246157.33333333334, ans=0.0
2024-09-23 11:16:39,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=22.5
2024-09-23 11:16:52,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=246297.33333333334, ans=0.125
2024-09-23 11:17:07,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=246344.0, ans=0.125
2024-09-23 11:17:15,230 INFO [train.py:1198] (0/4) Epoch 14, batch 2150, loss[loss=0.2467, ctc_loss=0.1686, cr_loss=0.3904, over 17317.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1583, cr_loss=0.372, over 3359527.88 frames. ], batch size: 51, lr: 8.60e-03, grad_scale: 32.0
2024-09-23 11:17:31,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=246390.66666666666, ans=0.125
2024-09-23 11:17:34,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=246437.33333333334, ans=0.125
2024-09-23 11:18:01,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=246484.0, ans=0.0
2024-09-23 11:18:43,400 INFO [train.py:1198] (0/4) Epoch 14, batch 2200, loss[loss=0.253, ctc_loss=0.1729, cr_loss=0.4005, over 17236.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1593, cr_loss=0.3733, over 3351407.09 frames. ], batch size: 44, lr: 8.60e-03, grad_scale: 32.0
2024-09-23 11:18:45,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=246624.0, ans=0.04949747468305833
2024-09-23 11:18:46,517 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.334e+02 1.425e+02 1.540e+02 2.133e+02, threshold=2.850e+02, percent-clipped=0.0
2024-09-23 11:18:57,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=246670.66666666666, ans=0.125
2024-09-23 11:18:59,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246670.66666666666, ans=0.1
2024-09-23 11:19:20,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=246717.33333333334, ans=0.025
2024-09-23 11:19:23,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=246717.33333333334, ans=0.125
2024-09-23 11:19:33,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=246764.0, ans=0.125
2024-09-23 11:19:41,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=246764.0, ans=0.025
2024-09-23 11:20:02,898 INFO [train.py:1198] (0/4) Epoch 14, batch 2250, loss[loss=0.2181, ctc_loss=0.1462, cr_loss=0.3594, over 17015.00 frames. ], tot_loss[loss=0.2334, ctc_loss=0.1588, cr_loss=0.3728, over 3361614.66 frames. ], batch size: 52, lr: 8.60e-03, grad_scale: 16.0
2024-09-23 11:20:17,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=246904.0, ans=0.0
2024-09-23 11:20:28,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=246904.0, ans=0.125
2024-09-23 11:20:36,932 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.37 vs. limit=10.0
2024-09-23 11:20:53,833 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:21:19,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=247044.0, ans=0.1
2024-09-23 11:21:22,505 INFO [train.py:1198] (0/4) Epoch 14, batch 2300, loss[loss=0.2035, ctc_loss=0.1359, cr_loss=0.3381, over 17268.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1594, cr_loss=0.3736, over 3363465.04 frames. ], batch size: 44, lr: 8.59e-03, grad_scale: 16.0
2024-09-23 11:21:25,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.95 vs. limit=15.0
2024-09-23 11:21:27,261 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.285e+02 1.398e+02 1.588e+02 2.479e+02, threshold=2.795e+02, percent-clipped=0.0
2024-09-23 11:21:36,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=247090.66666666666, ans=10.0
2024-09-23 11:21:45,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=247137.33333333334, ans=0.0
2024-09-23 11:21:55,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=247184.0, ans=0.125
2024-09-23 11:22:05,873 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:22:12,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.86 vs. limit=15.0
2024-09-23 11:22:21,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0
2024-09-23 11:22:50,510 INFO [train.py:1198] (0/4) Epoch 14, batch 2350, loss[loss=0.2734, ctc_loss=0.1903, cr_loss=0.4157, over 16090.00 frames. ], tot_loss[loss=0.2324, ctc_loss=0.1581, cr_loss=0.3714, over 3364081.39 frames. ], batch size: 74, lr: 8.59e-03, grad_scale: 16.0
2024-09-23 11:22:50,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=247324.0, ans=10.0
2024-09-23 11:23:25,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=247417.33333333334, ans=0.2
2024-09-23 11:23:52,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=247464.0, ans=0.125
2024-09-23 11:24:00,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=247510.66666666666, ans=0.2
2024-09-23 11:24:05,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=247510.66666666666, ans=0.0
2024-09-23 11:24:12,732 INFO [train.py:1198] (0/4) Epoch 14, batch 2400, loss[loss=0.1972, ctc_loss=0.1307, cr_loss=0.3325, over 16949.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1569, cr_loss=0.3698, over 3367780.09 frames. ], batch size: 42, lr: 8.58e-03, grad_scale: 32.0
2024-09-23 11:24:17,509 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.242e+02 1.312e+02 1.459e+02 2.054e+02, threshold=2.624e+02, percent-clipped=0.0
2024-09-23 11:24:29,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.49 vs. limit=22.5
2024-09-23 11:24:29,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0
2024-09-23 11:24:38,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=247604.0, ans=0.125
2024-09-23 11:24:50,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=247650.66666666666, ans=10.0
2024-09-23 11:24:50,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=247650.66666666666, ans=0.1
2024-09-23 11:25:00,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=247697.33333333334, ans=0.1
2024-09-23 11:25:08,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=247697.33333333334, ans=0.025
2024-09-23 11:25:29,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=247744.0, ans=0.125
2024-09-23 11:25:32,182 INFO [train.py:1198] (0/4) Epoch 14, batch 2450, loss[loss=0.2377, ctc_loss=0.1652, cr_loss=0.3624, over 14935.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1574, cr_loss=0.3707, over 3361886.18 frames. ], batch size: 89, lr: 8.58e-03, grad_scale: 32.0
2024-09-23 11:25:38,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=247790.66666666666, ans=0.0
2024-09-23 11:25:48,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=247837.33333333334, ans=0.125
2024-09-23 11:26:38,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0
2024-09-23 11:26:43,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=247977.33333333334, ans=0.5
2024-09-23 11:26:44,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=22.5
2024-09-23 11:26:51,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=247977.33333333334, ans=0.125
2024-09-23 11:26:53,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=248024.0, ans=0.1
2024-09-23 11:26:54,749 INFO [train.py:1198] (0/4) Epoch 14, batch 2500, loss[loss=0.3294, ctc_loss=0.2462, cr_loss=0.4156, over 11257.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1583, cr_loss=0.3717, over 3354962.83 frames. ], batch size: 123, lr: 8.58e-03, grad_scale: 32.0
2024-09-23 11:26:58,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=12.0
2024-09-23 11:26:59,531 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.311e+02 1.464e+02 1.674e+02 2.701e+02, threshold=2.928e+02, percent-clipped=1.0
2024-09-23 11:26:59,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=248024.0, ans=0.125
2024-09-23 11:28:04,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=248210.66666666666, ans=10.0
2024-09-23 11:28:21,883 INFO [train.py:1198] (0/4) Epoch 14, batch 2550, loss[loss=0.2299, ctc_loss=0.157, cr_loss=0.3645, over 17304.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1588, cr_loss=0.3719, over 3352774.42 frames. ], batch size: 49, lr: 8.57e-03, grad_scale: 32.0
2024-09-23 11:28:22,302 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-23 11:28:55,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=248350.66666666666, ans=0.125
2024-09-23 11:29:28,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0
2024-09-23 11:29:42,066 INFO [train.py:1198] (0/4) Epoch 14, batch 2600, loss[loss=0.2513, ctc_loss=0.1704, cr_loss=0.4048, over 17045.00 frames. ], tot_loss[loss=0.2337, ctc_loss=0.1591, cr_loss=0.373, over 3353896.58 frames. ], batch size: 52, lr: 8.57e-03, grad_scale: 32.0
2024-09-23 11:29:46,745 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.321e+02 1.442e+02 1.644e+02 2.368e+02, threshold=2.883e+02, percent-clipped=0.0
2024-09-23 11:29:54,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=248490.66666666666, ans=0.0
2024-09-23 11:30:01,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=22.5
2024-09-23 11:30:11,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=248537.33333333334, ans=0.2
2024-09-23 11:30:19,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0
2024-09-23 11:31:02,247 INFO [train.py:1198] (0/4) Epoch 14, batch 2650, loss[loss=0.209, ctc_loss=0.1397, cr_loss=0.3467, over 17172.00 frames. ], tot_loss[loss=0.233, ctc_loss=0.1586, cr_loss=0.372, over 3359110.37 frames. ], batch size: 41, lr: 8.56e-03, grad_scale: 32.0
2024-09-23 11:31:04,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=248724.0, ans=0.125
2024-09-23 11:32:20,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=248910.66666666666, ans=0.125
2024-09-23 11:32:20,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=248910.66666666666, ans=0.025
2024-09-23 11:32:26,707 INFO [train.py:1198] (0/4) Epoch 14, batch 2700, loss[loss=0.3117, ctc_loss=0.2239, cr_loss=0.439, over 11746.00 frames. ], tot_loss[loss=0.2346, ctc_loss=0.16, cr_loss=0.3732, over 3333441.53 frames. ], batch size: 124, lr: 8.56e-03, grad_scale: 32.0
2024-09-23 11:32:31,437 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.327e+02 1.447e+02 1.619e+02 2.182e+02, threshold=2.895e+02, percent-clipped=0.0
2024-09-23 11:32:38,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248957.33333333334, ans=0.1
2024-09-23 11:32:50,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=249004.0, ans=0.09899494936611666
2024-09-23 11:33:03,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=249050.66666666666, ans=0.125
2024-09-23 11:33:05,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=249050.66666666666, ans=0.09899494936611666
2024-09-23 11:33:51,511 INFO [train.py:1198] (0/4) Epoch 14, batch 2750, loss[loss=0.2404, ctc_loss=0.165, cr_loss=0.3771, over 17296.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1595, cr_loss=0.3726, over 3346436.09 frames. ], batch size: 51, lr: 8.56e-03, grad_scale: 32.0
2024-09-23 11:34:10,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=22.5
2024-09-23 11:34:33,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=249284.0, ans=0.125
2024-09-23 11:34:39,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=249330.66666666666, ans=0.0
2024-09-23 11:34:45,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0
2024-09-23 11:35:10,882 INFO [train.py:1198] (0/4) Epoch 14, batch 2800, loss[loss=0.2603, ctc_loss=0.18, cr_loss=0.4014, over 17003.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1595, cr_loss=0.374, over 3354524.33 frames. ], batch size: 56, lr: 8.55e-03, grad_scale: 32.0
2024-09-23 11:35:14,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0
2024-09-23 11:35:15,651 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.308e+02 1.382e+02 1.526e+02 2.267e+02, threshold=2.765e+02, percent-clipped=0.0
2024-09-23 11:35:25,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=249470.66666666666, ans=0.125
2024-09-23 11:35:54,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=249517.33333333334, ans=0.125
2024-09-23 11:35:57,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=249564.0, ans=0.125
2024-09-23 11:36:08,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0
2024-09-23 11:32:38,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248957.33333333334, ans=0.1 2024-09-23 11:32:50,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=249004.0, ans=0.09899494936611666 2024-09-23 11:33:03,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=249050.66666666666, ans=0.125 2024-09-23 11:33:05,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=249050.66666666666, ans=0.09899494936611666 2024-09-23 11:33:51,511 INFO [train.py:1198] (0/4) Epoch 14, batch 2750, loss[loss=0.2404, ctc_loss=0.165, cr_loss=0.3771, over 17296.00 frames. ], tot_loss[loss=0.234, ctc_loss=0.1595, cr_loss=0.3726, over 3346436.09 frames. ], batch size: 51, lr: 8.56e-03, grad_scale: 32.0 2024-09-23 11:34:10,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=22.5 2024-09-23 11:34:33,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=249284.0, ans=0.125 2024-09-23 11:34:39,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=249330.66666666666, ans=0.0 2024-09-23 11:34:45,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0 2024-09-23 11:35:10,882 INFO [train.py:1198] (0/4) Epoch 14, batch 2800, loss[loss=0.2603, ctc_loss=0.18, cr_loss=0.4014, over 17003.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1595, cr_loss=0.374, over 3354524.33 frames. ], batch size: 56, lr: 8.55e-03, grad_scale: 32.0 2024-09-23 11:35:14,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-09-23 11:35:15,651 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.308e+02 1.382e+02 1.526e+02 2.267e+02, threshold=2.765e+02, percent-clipped=0.0 2024-09-23 11:35:25,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=249470.66666666666, ans=0.125 2024-09-23 11:35:54,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=249517.33333333334, ans=0.125 2024-09-23 11:35:57,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=249564.0, ans=0.125 2024-09-23 11:36:08,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0 2024-09-23 11:36:13,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=249610.66666666666, ans=0.125 2024-09-23 11:36:20,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=249610.66666666666, ans=0.0 2024-09-23 11:36:26,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249610.66666666666, ans=0.1 2024-09-23 11:36:31,110 INFO [train.py:1198] (0/4) Epoch 14, batch 2850, loss[loss=0.2102, ctc_loss=0.1396, cr_loss=0.3526, over 17225.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1583, cr_loss=0.3727, over 3362359.08 frames. ], batch size: 47, lr: 8.55e-03, grad_scale: 16.0 2024-09-23 11:36:34,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=249657.33333333334, ans=0.025 2024-09-23 11:36:37,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249657.33333333334, ans=0.1 2024-09-23 11:36:51,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2024-09-23 11:37:21,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=249750.66666666666, ans=0.0 2024-09-23 11:37:52,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=249844.0, ans=0.125 2024-09-23 11:38:01,840 INFO [train.py:1198] (0/4) Epoch 14, batch 2900, loss[loss=0.3082, ctc_loss=0.2198, cr_loss=0.4416, over 11943.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1574, cr_loss=0.371, over 3355175.43 frames. ], batch size: 124, lr: 8.55e-03, grad_scale: 16.0 2024-09-23 11:38:06,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249890.66666666666, ans=0.1 2024-09-23 11:38:08,318 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.270e+02 1.418e+02 1.645e+02 2.792e+02, threshold=2.835e+02, percent-clipped=1.0 2024-09-23 11:38:15,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=249890.66666666666, ans=0.2 2024-09-23 11:38:32,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=249984.0, ans=0.04949747468305833 2024-09-23 11:38:43,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=249984.0, ans=0.125 2024-09-23 11:38:49,707 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:39:10,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=250077.33333333334, ans=0.0 2024-09-23 11:39:21,420 INFO [train.py:1198] (0/4) Epoch 14, batch 2950, loss[loss=0.2408, ctc_loss=0.1613, cr_loss=0.3978, over 17335.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.157, cr_loss=0.3695, over 3355347.28 frames. ], batch size: 48, lr: 8.54e-03, grad_scale: 16.0
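Note grad_scale dropping from 32.0 to 16.0 at batch 2850 above: with fp16 training, the AMP loss scale is halved whenever a step produces inf/nan gradients and is regrown only after a run of clean steps. A hedged sketch of that dynamic using PyTorch's stock GradScaler; the trainer that produced this log may manage its scale differently, and the constructor values here are illustrative, not this run's.

    import torch

    scaler = torch.cuda.amp.GradScaler(
        growth_factor=2.0,     # scale doubles after `growth_interval` clean steps
        backoff_factor=0.5,    # scale halves on an inf/nan step, e.g. 32.0 -> 16.0
        growth_interval=2000,  # illustrative default-style value
    )
    # Typical step: scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()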
2024-09-23 11:39:25,567 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=22.5 2024-09-23 11:39:26,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=250124.0, ans=0.125 2024-09-23 11:39:30,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2024-09-23 11:39:50,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=250170.66666666666, ans=0.125 2024-09-23 11:40:14,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=250264.0, ans=0.125 2024-09-23 11:40:20,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=250264.0, ans=0.125 2024-09-23 11:40:20,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250264.0, ans=0.125 2024-09-23 11:40:23,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=250310.66666666666, ans=0.2 2024-09-23 11:40:29,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=250310.66666666666, ans=0.125 2024-09-23 11:40:32,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=250310.66666666666, ans=0.09899494936611666 2024-09-23 11:40:39,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2024-09-23 11:40:40,236 INFO [train.py:1198] (0/4) Epoch 14, batch 3000, loss[loss=0.2564, ctc_loss=0.1771, cr_loss=0.3961, over 17033.00 frames. ], tot_loss[loss=0.2313, ctc_loss=0.1573, cr_loss=0.3702, over 3358733.26 frames. ], batch size: 52, lr: 8.54e-03, grad_scale: 16.0 2024-09-23 11:40:40,237 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 11:40:55,477 INFO [train.py:1230] (0/4) Epoch 14, validation: loss=0.04331, ctc_loss=0.04331, cr_loss=7.532e-15, over 944034.00 frames. 2024-09-23 11:40:55,478 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
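In the validation record above, cr_loss=7.532e-15 is numerically zero, while training batches report cr_loss around 0.37. That is consistent with consistency regularization comparing the CTC posteriors of two differently time-masked copies of each training batch; validation runs a single unaugmented pass, so there is no second view to compare. A sketch of one common CR formulation, symmetric KL between the two views' log-posteriors with detached targets; this is an assumed formulation for illustration, not necessarily the one implemented in this codebase.

    import torch
    import torch.nn.functional as F

    def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
        # log_probs_*: (T, N, C) log-posteriors from the two augmented views.
        kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(), reduction="batchmean", log_target=True)
        kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(), reduction="batchmean", log_target=True)
        return 0.5 * (kl_ab + kl_ba)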
2024-09-23 11:40:58,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=250357.33333333334, ans=0.2 2024-09-23 11:41:00,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=250357.33333333334, ans=0.09899494936611666 2024-09-23 11:41:01,569 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.302e+02 1.389e+02 1.457e+02 1.974e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-23 11:41:07,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2024-09-23 11:41:50,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. limit=6.0 2024-09-23 11:41:57,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=250544.0, ans=0.1 2024-09-23 11:42:00,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=250544.0, ans=0.125 2024-09-23 11:42:13,882 INFO [train.py:1198] (0/4) Epoch 14, batch 3050, loss[loss=0.1843, ctc_loss=0.1214, cr_loss=0.3146, over 17046.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1578, cr_loss=0.3717, over 3356437.55 frames. ], batch size: 39, lr: 8.53e-03, grad_scale: 16.0 2024-09-23 11:42:20,559 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:42:50,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250684.0, ans=0.1 2024-09-23 11:42:50,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=10.0 2024-09-23 11:42:52,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2024-09-23 11:43:09,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=250730.66666666666, ans=0.125 2024-09-23 11:43:28,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=250777.33333333334, ans=0.0 2024-09-23 11:43:34,615 INFO [train.py:1198] (0/4) Epoch 14, batch 3100, loss[loss=0.1982, ctc_loss=0.1313, cr_loss=0.3345, over 17248.00 frames. ], tot_loss[loss=0.232, ctc_loss=0.1577, cr_loss=0.3712, over 3349323.38 frames. ], batch size: 44, lr: 8.53e-03, grad_scale: 16.0 2024-09-23 11:43:36,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=250824.0, ans=0.125 2024-09-23 11:43:40,946 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.263e+02 1.328e+02 1.443e+02 2.080e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-23 11:43:53,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-09-23 11:44:25,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=8.0 2024-09-23 11:44:43,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=251010.66666666666, ans=0.0 2024-09-23 11:44:46,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=251010.66666666666, ans=0.125 2024-09-23 11:44:55,828 INFO [train.py:1198] (0/4) Epoch 14, batch 3150, loss[loss=0.2362, ctc_loss=0.1648, cr_loss=0.3571, over 17118.00 frames. ], tot_loss[loss=0.2327, ctc_loss=0.1581, cr_loss=0.3726, over 3342242.05 frames.
], batch size: 49, lr: 8.53e-03, grad_scale: 16.0 2024-09-23 11:45:04,279 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:45:57,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-09-23 11:46:08,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251244.0, ans=0.1 2024-09-23 11:46:12,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=251244.0, ans=0.125 2024-09-23 11:46:18,649 INFO [train.py:1198] (0/4) Epoch 14, batch 3200, loss[loss=0.2531, ctc_loss=0.1747, cr_loss=0.3917, over 17172.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1576, cr_loss=0.3709, over 3344703.75 frames. ], batch size: 45, lr: 8.52e-03, grad_scale: 32.0 2024-09-23 11:46:18,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251290.66666666666, ans=0.1 2024-09-23 11:46:24,743 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.267e+02 1.361e+02 1.514e+02 1.918e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-23 11:46:28,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.98 vs. limit=15.0 2024-09-23 11:46:56,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=251384.0, ans=0.125 2024-09-23 11:47:17,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=251430.66666666666, ans=0.1 2024-09-23 11:47:36,195 INFO [train.py:1198] (0/4) Epoch 14, batch 3250, loss[loss=0.2228, ctc_loss=0.1524, cr_loss=0.3518, over 17017.00 frames. ], tot_loss[loss=0.2331, ctc_loss=0.1586, cr_loss=0.3726, over 3352959.26 frames. 
], batch size: 51, lr: 8.52e-03, grad_scale: 32.0 2024-09-23 11:47:38,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=251524.0, ans=0.125 2024-09-23 11:47:46,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=251524.0, ans=0.125 2024-09-23 11:47:55,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251570.66666666666, ans=0.1 2024-09-23 11:48:10,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=251617.33333333334, ans=0.0 2024-09-23 11:48:12,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=251617.33333333334, ans=0.125 2024-09-23 11:48:26,280 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:48:31,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=251664.0, ans=0.125 2024-09-23 11:48:52,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=251757.33333333334, ans=0.0 2024-09-23 11:48:54,097 INFO [train.py:1198] (0/4) Epoch 14, batch 3300, loss[loss=0.2684, ctc_loss=0.1837, cr_loss=0.4233, over 17006.00 frames. ], tot_loss[loss=0.2341, ctc_loss=0.1592, cr_loss=0.3741, over 3355541.43 frames. ], batch size: 51, lr: 8.51e-03, grad_scale: 32.0 2024-09-23 11:48:59,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=251757.33333333334, ans=0.125 2024-09-23 11:49:00,432 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.303e+02 1.410e+02 1.606e+02 3.318e+02, threshold=2.819e+02, percent-clipped=1.0 2024-09-23 11:49:25,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=251850.66666666666, ans=0.125 2024-09-23 11:49:26,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=12.0 2024-09-23 11:49:57,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=251944.0, ans=0.125 2024-09-23 11:50:07,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=251944.0, ans=0.125 2024-09-23 11:50:12,288 INFO [train.py:1198] (0/4) Epoch 14, batch 3350, loss[loss=0.279, ctc_loss=0.1896, cr_loss=0.4471, over 17045.00 frames. ], tot_loss[loss=0.2339, ctc_loss=0.1591, cr_loss=0.3738, over 3353471.95 frames. 
], batch size: 52, lr: 8.51e-03, grad_scale: 16.0 2024-09-23 11:50:26,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=252037.33333333334, ans=0.1 2024-09-23 11:50:32,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=252037.33333333334, ans=0.125 2024-09-23 11:50:37,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=252037.33333333334, ans=0.125 2024-09-23 11:51:02,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=252130.66666666666, ans=0.0 2024-09-23 11:51:29,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=252224.0, ans=0.125 2024-09-23 11:51:30,494 INFO [train.py:1198] (0/4) Epoch 14, batch 3400, loss[loss=0.2709, ctc_loss=0.1928, cr_loss=0.3909, over 15204.00 frames. ], tot_loss[loss=0.2338, ctc_loss=0.1593, cr_loss=0.3727, over 3353560.61 frames. ], batch size: 89, lr: 8.51e-03, grad_scale: 16.0 2024-09-23 11:51:38,161 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.275e+02 1.402e+02 1.543e+02 4.509e+02, threshold=2.804e+02, percent-clipped=1.0 2024-09-23 11:51:40,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=252224.0, ans=0.125 2024-09-23 11:51:43,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=252224.0, ans=0.125 2024-09-23 11:51:54,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=252270.66666666666, ans=0.1 2024-09-23 11:52:05,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2024-09-23 11:52:23,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252364.0, ans=0.1 2024-09-23 11:52:24,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=252364.0, ans=0.015 2024-09-23 11:52:35,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=252410.66666666666, ans=0.09899494936611666 2024-09-23 11:52:37,372 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 11:52:48,095 INFO [train.py:1198] (0/4) Epoch 14, batch 3450, loss[loss=0.2368, ctc_loss=0.157, cr_loss=0.3993, over 17308.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1584, cr_loss=0.3711, over 3354823.00 frames. 
], batch size: 49, lr: 8.50e-03, grad_scale: 16.0 2024-09-23 11:52:54,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=252457.33333333334, ans=0.09899494936611666 2024-09-23 11:52:54,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=252457.33333333334, ans=0.0 2024-09-23 11:52:57,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=252457.33333333334, ans=0.125 2024-09-23 11:53:04,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=252504.0, ans=0.09899494936611666 2024-09-23 11:53:15,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=252504.0, ans=0.1 2024-09-23 11:53:18,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=252550.66666666666, ans=0.125 2024-09-23 11:53:50,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=252597.33333333334, ans=0.125 2024-09-23 11:54:04,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=252644.0, ans=0.07 2024-09-23 11:54:08,275 INFO [train.py:1198] (0/4) Epoch 14, batch 3500, loss[loss=0.1913, ctc_loss=0.1291, cr_loss=0.3112, over 16379.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1581, cr_loss=0.3704, over 3356034.61 frames. ], batch size: 36, lr: 8.50e-03, grad_scale: 16.0 2024-09-23 11:54:18,114 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.252e+02 1.352e+02 1.458e+02 2.935e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-23 11:54:22,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=252690.66666666666, ans=0.125 2024-09-23 11:54:39,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=252784.0, ans=0.0 2024-09-23 11:54:47,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=252784.0, ans=0.125 2024-09-23 11:55:09,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=252830.66666666666, ans=0.2 2024-09-23 11:55:09,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=252830.66666666666, ans=0.0 2024-09-23 11:55:31,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=252924.0, ans=0.0 2024-09-23 11:55:32,295 INFO [train.py:1198] (0/4) Epoch 14, batch 3550, loss[loss=0.2017, ctc_loss=0.1339, cr_loss=0.3391, over 17219.00 frames. ], tot_loss[loss=0.2318, ctc_loss=0.1577, cr_loss=0.3702, over 3360873.66 frames. ], batch size: 47, lr: 8.49e-03, grad_scale: 16.0 2024-09-23 11:55:48,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=252970.66666666666, ans=0.125 2024-09-23 11:56:07,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.20 vs. 
limit=12.0 2024-09-23 11:56:18,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=253064.0, ans=0.0 2024-09-23 11:56:24,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=253064.0, ans=0.0 2024-09-23 11:56:35,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-09-23 11:56:51,150 INFO [train.py:1198] (0/4) Epoch 14, batch 3600, loss[loss=0.235, ctc_loss=0.1581, cr_loss=0.3847, over 17297.00 frames. ], tot_loss[loss=0.2317, ctc_loss=0.1578, cr_loss=0.3699, over 3355715.80 frames. ], batch size: 49, lr: 8.49e-03, grad_scale: 32.0 2024-09-23 11:56:56,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=253157.33333333334, ans=0.125 2024-09-23 11:56:58,886 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.239e+02 1.349e+02 1.491e+02 2.999e+02, threshold=2.699e+02, percent-clipped=1.0 2024-09-23 11:56:59,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=253157.33333333334, ans=0.125 2024-09-23 11:57:08,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=253204.0, ans=0.0 2024-09-23 11:57:09,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=253204.0, ans=0.2 2024-09-23 11:57:10,636 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=22.5 2024-09-23 11:57:13,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=15.0 2024-09-23 11:57:22,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=253250.66666666666, ans=0.0 2024-09-23 11:57:39,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253297.33333333334, ans=0.1 2024-09-23 11:57:54,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2024-09-23 11:58:00,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0 2024-09-23 11:58:08,983 INFO [train.py:1198] (0/4) Epoch 14, batch 3650, loss[loss=0.2155, ctc_loss=0.1464, cr_loss=0.3455, over 17015.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1578, cr_loss=0.3708, over 3356192.49 frames. 
], batch size: 44, lr: 8.49e-03, grad_scale: 32.0 2024-09-23 11:58:12,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=253390.66666666666, ans=0.125 2024-09-23 11:58:21,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=253390.66666666666, ans=0.035 2024-09-23 11:59:03,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=253530.66666666666, ans=0.125 2024-09-23 11:59:17,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=253577.33333333334, ans=0.2 2024-09-23 11:59:28,445 INFO [train.py:1198] (0/4) Epoch 14, batch 3700, loss[loss=0.2018, ctc_loss=0.1338, cr_loss=0.3402, over 17285.00 frames. ], tot_loss[loss=0.2326, ctc_loss=0.1582, cr_loss=0.3721, over 3357484.36 frames. ], batch size: 46, lr: 8.48e-03, grad_scale: 32.0 2024-09-23 11:59:29,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2024-09-23 11:59:36,288 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.294e+02 1.387e+02 1.607e+02 1.987e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-23 11:59:42,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=253670.66666666666, ans=0.125 2024-09-23 12:00:46,836 INFO [train.py:1198] (0/4) Epoch 14, batch 3750, loss[loss=0.1978, ctc_loss=0.1299, cr_loss=0.3396, over 17183.00 frames. ], tot_loss[loss=0.2332, ctc_loss=0.1587, cr_loss=0.3725, over 3346754.11 frames. ], batch size: 41, lr: 8.48e-03, grad_scale: 32.0 2024-09-23 12:02:03,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=254090.66666666666, ans=0.125 2024-09-23 12:02:05,050 INFO [train.py:1198] (0/4) Epoch 14, batch 3800, loss[loss=0.3224, ctc_loss=0.2413, cr_loss=0.4055, over 11663.00 frames. ], tot_loss[loss=0.2343, ctc_loss=0.1597, cr_loss=0.373, over 3319919.64 frames. ], batch size: 123, lr: 8.48e-03, grad_scale: 32.0 2024-09-23 12:02:13,078 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.380e+02 1.551e+02 1.777e+02 3.575e+02, threshold=3.102e+02, percent-clipped=2.0 2024-09-23 12:02:45,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=254184.0, ans=0.125 2024-09-23 12:02:54,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=254230.66666666666, ans=0.125 2024-09-23 12:03:14,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.77 vs. limit=22.5 2024-09-23 12:03:24,332 INFO [train.py:1198] (0/4) Epoch 14, batch 3850, loss[loss=0.227, ctc_loss=0.155, cr_loss=0.36, over 16878.00 frames. ], tot_loss[loss=0.2342, ctc_loss=0.1598, cr_loss=0.3721, over 3297656.69 frames. 
], batch size: 58, lr: 8.47e-03, grad_scale: 32.0 2024-09-23 12:03:40,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=254370.66666666666, ans=0.2 2024-09-23 12:03:50,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=254370.66666666666, ans=0.07 2024-09-23 12:04:05,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254417.33333333334, ans=0.1 2024-09-23 12:04:08,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=254417.33333333334, ans=0.125 2024-09-23 12:04:28,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=254510.66666666666, ans=0.0 2024-09-23 12:04:35,354 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-14.pt
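The record above marks the end of epoch 14: a full checkpoint is written before epoch 15 begins (note the learning rate stepping down from 8.47e-03 to 8.18e-03 across the boundary). A minimal sketch of what an epoch-end checkpoint like epoch-14.pt typically bundles; the function and field names here are illustrative assumptions, not a description of checkpoint.py.

    import torch

    def save_epoch_checkpoint(path, model, optimizer, scheduler, scaler, epoch):
        # Persist everything needed to resume training from the epoch boundary.
        torch.save(
            {
                "epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "grad_scaler": scaler.state_dict(),
            },
            path,
        )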
2024-09-23 12:05:28,604 INFO [train.py:1198] (0/4) Epoch 15, batch 0, loss[loss=0.2545, ctc_loss=0.1761, cr_loss=0.3918, over 16566.00 frames. ], tot_loss[loss=0.2545, ctc_loss=0.1761, cr_loss=0.3918, over 16566.00 frames. ], batch size: 66, lr: 8.18e-03, grad_scale: 32.0 2024-09-23 12:05:28,605 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 12:05:46,333 INFO [train.py:1230] (0/4) Epoch 15, validation: loss=0.0431, ctc_loss=0.0431, cr_loss=7.486e-15, over 944034.00 frames. 2024-09-23 12:05:46,334 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 12:05:46,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=254538.66666666666, ans=0.2 2024-09-23 12:05:49,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=254538.66666666666, ans=0.125 2024-09-23 12:05:57,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=254538.66666666666, ans=0.125 2024-09-23 12:06:00,796 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.387e+02 1.561e+02 1.706e+02 2.670e+02, threshold=3.121e+02, percent-clipped=0.0 2024-09-23 12:06:34,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=254678.66666666666, ans=0.025 2024-09-23 12:06:40,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=254678.66666666666, ans=0.0 2024-09-23 12:06:48,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=254725.33333333334, ans=0.04949747468305833 2024-09-23 12:06:58,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=254725.33333333334, ans=0.125 2024-09-23 12:07:05,544 INFO [train.py:1198] (0/4) Epoch 15, batch 50, loss[loss=0.2143, ctc_loss=0.144, cr_loss=0.3518, over 17213.00 frames. ], tot_loss[loss=0.2323, ctc_loss=0.1573, cr_loss=0.3747, over 764799.87 frames. ], batch size: 47, lr: 8.18e-03, grad_scale: 32.0 2024-09-23 12:07:19,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=254772.0, ans=0.0 2024-09-23 12:08:01,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=254912.0, ans=0.125 2024-09-23 12:08:09,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=254912.0, ans=0.1 2024-09-23 12:08:28,618 INFO [train.py:1198] (0/4) Epoch 15, batch 100, loss[loss=0.2214, ctc_loss=0.1473, cr_loss=0.3702, over 17155.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1552, cr_loss=0.3691, over 1328779.23 frames. ], batch size: 48, lr: 8.17e-03, grad_scale: 32.0 2024-09-23 12:08:41,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=255005.33333333334, ans=0.125 2024-09-23 12:08:42,830 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.954e+01 1.228e+02 1.306e+02 1.476e+02 1.867e+02, threshold=2.613e+02, percent-clipped=0.0 2024-09-23 12:09:18,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=255145.33333333334, ans=0.125 2024-09-23 12:09:20,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.27 vs. limit=22.5 2024-09-23 12:09:33,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=255192.0, ans=0.035 2024-09-23 12:09:37,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.84 vs. limit=10.0 2024-09-23 12:09:44,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=255192.0, ans=0.0 2024-09-23 12:09:47,785 INFO [train.py:1198] (0/4) Epoch 15, batch 150, loss[loss=0.2459, ctc_loss=0.17, cr_loss=0.3792, over 17363.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1552, cr_loss=0.3692, over 1786384.32 frames. ], batch size: 48, lr: 8.17e-03, grad_scale: 32.0 2024-09-23 12:09:50,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=255238.66666666666, ans=0.1 2024-09-23 12:09:54,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=255238.66666666666, ans=0.035 2024-09-23 12:09:54,704 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 12:10:00,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=255238.66666666666, ans=0.125 2024-09-23 12:10:16,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=255285.33333333334, ans=0.2 2024-09-23 12:10:39,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=255332.0, ans=0.125 2024-09-23 12:11:14,513 INFO [train.py:1198] (0/4) Epoch 15, batch 200, loss[loss=0.2386, ctc_loss=0.1605, cr_loss=0.3907, over 17018.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1552, cr_loss=0.3698, over 2138831.04 frames.
], batch size: 51, lr: 8.16e-03, grad_scale: 32.0 2024-09-23 12:11:28,889 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.243e+02 1.308e+02 1.422e+02 1.839e+02, threshold=2.616e+02, percent-clipped=0.0 2024-09-23 12:11:48,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=255565.33333333334, ans=0.0 2024-09-23 12:11:54,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=255565.33333333334, ans=0.025 2024-09-23 12:12:07,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=255612.0, ans=0.125 2024-09-23 12:12:13,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=255612.0, ans=0.125 2024-09-23 12:12:25,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=15.0 2024-09-23 12:12:33,974 INFO [train.py:1198] (0/4) Epoch 15, batch 250, loss[loss=0.2746, ctc_loss=0.1854, cr_loss=0.4462, over 16473.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1564, cr_loss=0.3715, over 2417704.51 frames. ], batch size: 66, lr: 8.16e-03, grad_scale: 32.0 2024-09-23 12:12:40,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=255705.33333333334, ans=0.05 2024-09-23 12:12:42,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=255705.33333333334, ans=0.04949747468305833 2024-09-23 12:12:53,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=255752.0, ans=0.125 2024-09-23 12:13:09,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.52 vs. limit=22.5 2024-09-23 12:13:23,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=255845.33333333334, ans=0.125 2024-09-23 12:13:25,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=255845.33333333334, ans=0.125 2024-09-23 12:13:56,726 INFO [train.py:1198] (0/4) Epoch 15, batch 300, loss[loss=0.2374, ctc_loss=0.1633, cr_loss=0.3708, over 17078.00 frames. ], tot_loss[loss=0.2304, ctc_loss=0.1562, cr_loss=0.3705, over 2638554.97 frames. ], batch size: 46, lr: 8.16e-03, grad_scale: 32.0 2024-09-23 12:14:01,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=255938.66666666666, ans=0.1 2024-09-23 12:14:01,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2024-09-23 12:14:10,831 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.341e+02 1.470e+02 1.683e+02 2.993e+02, threshold=2.941e+02, percent-clipped=1.0 2024-09-23 12:14:18,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.71 vs. 
limit=12.0 2024-09-23 12:14:19,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-09-23 12:14:24,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=255985.33333333334, ans=15.0 2024-09-23 12:14:28,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=256032.0, ans=0.125 2024-09-23 12:14:33,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=256032.0, ans=0.035 2024-09-23 12:14:36,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=256032.0, ans=0.125 2024-09-23 12:14:39,734 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-23 12:14:53,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-09-23 12:14:59,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=256078.66666666666, ans=0.125 2024-09-23 12:15:21,581 INFO [train.py:1198] (0/4) Epoch 15, batch 350, loss[loss=0.2562, ctc_loss=0.1736, cr_loss=0.4134, over 16995.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1555, cr_loss=0.3682, over 2802306.50 frames. ], batch size: 56, lr: 8.15e-03, grad_scale: 32.0 2024-09-23 12:15:48,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=256218.66666666666, ans=0.2 2024-09-23 12:15:59,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=256265.33333333334, ans=0.2 2024-09-23 12:16:17,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0 2024-09-23 12:16:44,050 INFO [train.py:1198] (0/4) Epoch 15, batch 400, loss[loss=0.2186, ctc_loss=0.1488, cr_loss=0.349, over 17298.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1557, cr_loss=0.3681, over 2921943.76 frames. ], batch size: 49, lr: 8.15e-03, grad_scale: 32.0 2024-09-23 12:16:55,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=256405.33333333334, ans=0.0 2024-09-23 12:16:58,142 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.238e+02 1.377e+02 1.544e+02 2.269e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-23 12:17:44,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=256545.33333333334, ans=0.125 2024-09-23 12:17:49,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256592.0, ans=0.1 2024-09-23 12:17:58,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=256592.0, ans=0.125 2024-09-23 12:18:06,462 INFO [train.py:1198] (0/4) Epoch 15, batch 450, loss[loss=0.2043, ctc_loss=0.1324, cr_loss=0.3591, over 17096.00 frames. 
], tot_loss[loss=0.2293, ctc_loss=0.1556, cr_loss=0.3688, over 3025685.38 frames. ], batch size: 40, lr: 8.15e-03, grad_scale: 32.0 2024-09-23 12:18:08,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=256638.66666666666, ans=0.125 2024-09-23 12:18:17,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.35 vs. limit=15.0 2024-09-23 12:18:18,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=256638.66666666666, ans=0.0 2024-09-23 12:18:24,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256685.33333333334, ans=0.1 2024-09-23 12:18:26,683 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2024-09-23 12:18:29,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=256685.33333333334, ans=0.125 2024-09-23 12:18:45,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=256732.0, ans=0.125 2024-09-23 12:18:45,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=256732.0, ans=0.0 2024-09-23 12:18:47,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=256732.0, ans=0.125 2024-09-23 12:19:06,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2024-09-23 12:19:27,187 INFO [train.py:1198] (0/4) Epoch 15, batch 500, loss[loss=0.2964, ctc_loss=0.209, cr_loss=0.4372, over 15172.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.156, cr_loss=0.3694, over 3103222.69 frames. ], batch size: 89, lr: 8.14e-03, grad_scale: 32.0 2024-09-23 12:19:41,901 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.305e+02 1.439e+02 1.681e+02 2.242e+02, threshold=2.879e+02, percent-clipped=0.0 2024-09-23 12:19:51,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=256918.66666666666, ans=0.09899494936611666 2024-09-23 12:19:57,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256918.66666666666, ans=0.1 2024-09-23 12:20:14,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=256965.33333333334, ans=0.125 2024-09-23 12:20:16,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=256965.33333333334, ans=0.125 2024-09-23 12:20:24,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-09-23 12:20:28,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. 
limit=6.0 2024-09-23 12:20:33,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2024-09-23 12:20:55,118 INFO [train.py:1198] (0/4) Epoch 15, batch 550, loss[loss=0.2178, ctc_loss=0.1442, cr_loss=0.3677, over 17066.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1559, cr_loss=0.3693, over 3149848.27 frames. ], batch size: 46, lr: 8.14e-03, grad_scale: 32.0 2024-09-23 12:21:08,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-09-23 12:21:14,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=22.5 2024-09-23 12:21:17,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=257152.0, ans=0.125 2024-09-23 12:21:54,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257245.33333333334, ans=0.1 2024-09-23 12:22:15,290 INFO [train.py:1198] (0/4) Epoch 15, batch 600, loss[loss=0.2111, ctc_loss=0.1412, cr_loss=0.3492, over 17267.00 frames. ], tot_loss[loss=0.2302, ctc_loss=0.1561, cr_loss=0.3702, over 3202730.69 frames. ], batch size: 42, lr: 8.14e-03, grad_scale: 32.0 2024-09-23 12:22:26,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=257338.66666666666, ans=0.125 2024-09-23 12:22:29,756 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.274e+02 1.386e+02 1.572e+02 2.356e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-23 12:22:37,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257385.33333333334, ans=0.1 2024-09-23 12:22:50,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=257432.0, ans=0.2 2024-09-23 12:23:05,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=257478.66666666666, ans=0.025 2024-09-23 12:23:18,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=257478.66666666666, ans=0.125 2024-09-23 12:23:22,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=257525.33333333334, ans=0.125 2024-09-23 12:23:26,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=257525.33333333334, ans=0.2 2024-09-23 12:23:29,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=12.0 2024-09-23 12:23:33,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=257525.33333333334, ans=0.0 2024-09-23 12:23:38,342 INFO [train.py:1198] (0/4) Epoch 15, batch 650, loss[loss=0.2569, ctc_loss=0.1732, cr_loss=0.4188, over 17215.00 frames. ], tot_loss[loss=0.2308, ctc_loss=0.1566, cr_loss=0.3711, over 3245229.52 frames. 
], batch size: 55, lr: 8.13e-03, grad_scale: 32.0 2024-09-23 12:23:47,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2024-09-23 12:23:55,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-09-23 12:24:02,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=257618.66666666666, ans=0.125 2024-09-23 12:24:17,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2024-09-23 12:25:01,760 INFO [train.py:1198] (0/4) Epoch 15, batch 700, loss[loss=0.2116, ctc_loss=0.1424, cr_loss=0.3464, over 17159.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1568, cr_loss=0.3716, over 3277881.77 frames. ], batch size: 48, lr: 8.13e-03, grad_scale: 32.0 2024-09-23 12:25:01,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=257805.33333333334, ans=10.0 2024-09-23 12:25:18,901 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.271e+02 1.381e+02 1.537e+02 2.206e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-23 12:25:19,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=257852.0, ans=0.0 2024-09-23 12:25:24,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2024-09-23 12:25:48,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=257898.66666666666, ans=0.0 2024-09-23 12:26:01,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=257945.33333333334, ans=0.0 2024-09-23 12:26:07,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=257945.33333333334, ans=0.0 2024-09-23 12:26:18,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=257992.0, ans=0.025 2024-09-23 12:26:26,613 INFO [train.py:1198] (0/4) Epoch 15, batch 750, loss[loss=0.2234, ctc_loss=0.1502, cr_loss=0.3662, over 17299.00 frames. ], tot_loss[loss=0.2316, ctc_loss=0.1572, cr_loss=0.372, over 3284120.63 frames. ], batch size: 46, lr: 8.12e-03, grad_scale: 32.0 2024-09-23 12:26:38,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=258038.66666666666, ans=0.0 2024-09-23 12:26:56,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0 2024-09-23 12:27:00,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=258132.0, ans=0.125 2024-09-23 12:27:24,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.08 vs. 
limit=15.0 2024-09-23 12:27:36,571 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 12:27:48,955 INFO [train.py:1198] (0/4) Epoch 15, batch 800, loss[loss=0.2435, ctc_loss=0.1662, cr_loss=0.3864, over 16399.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1564, cr_loss=0.3713, over 3305154.06 frames. ], batch size: 66, lr: 8.12e-03, grad_scale: 32.0 2024-09-23 12:28:03,182 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.255e+02 1.392e+02 1.544e+02 3.619e+02, threshold=2.784e+02, percent-clipped=1.0 2024-09-23 12:28:05,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=258318.66666666666, ans=0.025 2024-09-23 12:28:14,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=258318.66666666666, ans=0.125 2024-09-23 12:28:24,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=258365.33333333334, ans=0.0 2024-09-23 12:28:33,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=258365.33333333334, ans=0.0 2024-09-23 12:29:03,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=258458.66666666666, ans=0.125 2024-09-23 12:29:08,257 INFO [train.py:1198] (0/4) Epoch 15, batch 850, loss[loss=0.2204, ctc_loss=0.149, cr_loss=0.3573, over 17227.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1566, cr_loss=0.3715, over 3314102.10 frames. ], batch size: 50, lr: 8.12e-03, grad_scale: 32.0 2024-09-23 12:29:10,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.79 vs. limit=10.0 2024-09-23 12:29:48,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=258598.66666666666, ans=0.0 2024-09-23 12:30:01,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=258645.33333333334, ans=0.125 2024-09-23 12:30:06,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=258645.33333333334, ans=0.0 2024-09-23 12:30:36,046 INFO [train.py:1198] (0/4) Epoch 15, batch 900, loss[loss=0.2163, ctc_loss=0.1433, cr_loss=0.365, over 17097.00 frames. ], tot_loss[loss=0.2325, ctc_loss=0.158, cr_loss=0.3727, over 3314811.52 frames. ], batch size: 49, lr: 8.11e-03, grad_scale: 32.0 2024-09-23 12:30:41,110 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 12:30:50,309 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.250e+02 1.335e+02 1.491e+02 2.252e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-23 12:31:06,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=258832.0, ans=0.125 2024-09-23 12:31:15,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.81 vs. 
limit=15.0 2024-09-23 12:31:45,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=258925.33333333334, ans=0.5 2024-09-23 12:31:56,135 INFO [train.py:1198] (0/4) Epoch 15, batch 950, loss[loss=0.2165, ctc_loss=0.1437, cr_loss=0.364, over 17266.00 frames. ], tot_loss[loss=0.2311, ctc_loss=0.1569, cr_loss=0.3713, over 3326575.61 frames. ], batch size: 44, lr: 8.11e-03, grad_scale: 16.0 2024-09-23 12:32:06,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=258972.0, ans=0.125 2024-09-23 12:32:07,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=258972.0, ans=0.0 2024-09-23 12:32:08,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.08 vs. limit=10.0 2024-09-23 12:32:34,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-09-23 12:32:37,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=259065.33333333334, ans=0.125 2024-09-23 12:32:54,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=259112.0, ans=0.0 2024-09-23 12:33:08,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=259158.66666666666, ans=0.0 2024-09-23 12:33:17,946 INFO [train.py:1198] (0/4) Epoch 15, batch 1000, loss[loss=0.1751, ctc_loss=0.1148, cr_loss=0.3015, over 17181.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1564, cr_loss=0.3712, over 3339027.60 frames. ], batch size: 41, lr: 8.11e-03, grad_scale: 16.0 2024-09-23 12:33:33,773 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.224e+02 1.330e+02 1.426e+02 2.141e+02, threshold=2.660e+02, percent-clipped=0.0 2024-09-23 12:34:09,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=259345.33333333334, ans=0.125 2024-09-23 12:34:40,677 INFO [train.py:1198] (0/4) Epoch 15, batch 1050, loss[loss=0.2651, ctc_loss=0.1832, cr_loss=0.4094, over 16993.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1557, cr_loss=0.3698, over 3351941.13 frames. ], batch size: 53, lr: 8.10e-03, grad_scale: 16.0 2024-09-23 12:34:54,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=259438.66666666666, ans=0.2 2024-09-23 12:35:21,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2024-09-23 12:35:58,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=259625.33333333334, ans=0.0 2024-09-23 12:36:05,845 INFO [train.py:1198] (0/4) Epoch 15, batch 1100, loss[loss=0.21, ctc_loss=0.1399, cr_loss=0.3506, over 16960.00 frames. ], tot_loss[loss=0.2293, ctc_loss=0.1556, cr_loss=0.3686, over 3351312.44 frames. 
], batch size: 42, lr: 8.10e-03, grad_scale: 16.0 2024-09-23 12:36:21,571 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.305e+02 1.420e+02 1.545e+02 2.157e+02, threshold=2.840e+02, percent-clipped=0.0 2024-09-23 12:36:41,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=259765.33333333334, ans=0.0 2024-09-23 12:36:56,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=259812.0, ans=22.5 2024-09-23 12:37:21,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=259858.66666666666, ans=15.0 2024-09-23 12:37:28,268 INFO [train.py:1198] (0/4) Epoch 15, batch 1150, loss[loss=0.2316, ctc_loss=0.1562, cr_loss=0.3771, over 17225.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.1553, cr_loss=0.3683, over 3358219.15 frames. ], batch size: 50, lr: 8.10e-03, grad_scale: 16.0 2024-09-23 12:37:28,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=259905.33333333334, ans=0.0 2024-09-23 12:37:44,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=259952.0, ans=0.125 2024-09-23 12:37:51,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=259952.0, ans=0.125 2024-09-23 12:38:47,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=260138.66666666666, ans=0.125 2024-09-23 12:38:48,575 INFO [train.py:1198] (0/4) Epoch 15, batch 1200, loss[loss=0.2574, ctc_loss=0.1781, cr_loss=0.3966, over 16094.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.1544, cr_loss=0.3671, over 3368120.70 frames. ], batch size: 74, lr: 8.09e-03, grad_scale: 32.0 2024-09-23 12:38:53,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=260138.66666666666, ans=0.025 2024-09-23 12:39:03,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=260185.33333333334, ans=0.125 2024-09-23 12:39:04,650 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.045e+02 1.247e+02 1.362e+02 1.504e+02 2.311e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-23 12:39:11,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2024-09-23 12:39:30,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=260232.0, ans=0.125 2024-09-23 12:39:42,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=260278.66666666666, ans=0.125 2024-09-23 12:40:00,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=260325.33333333334, ans=0.125 2024-09-23 12:40:13,699 INFO [train.py:1198] (0/4) Epoch 15, batch 1250, loss[loss=0.2594, ctc_loss=0.1785, cr_loss=0.4042, over 16733.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1551, cr_loss=0.3685, over 3362204.87 frames. 
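Each optim.py WARNING above prints five increasing order statistics of recent gradient norms (what look like min, the three quartiles, and max) together with the active clipping threshold, and the threshold is always 2.0 times the middle statistic (e.g. 2.840e+02 = 2.0 x 1.420e+02 just above), matching Clipping_scale=2.0. So clipping is adaptive, keyed to the median gradient norm rather than a fixed constant, and percent-clipped is the share of recent steps whose norm exceeded it. A minimal sketch of median-based clipping; the window length and warm-up behaviour are assumptions, not icefall's exact implementation:

import collections
import torch

class MedianGradClipper:
    # Clip at clipping_scale * median of recently seen gradient norms.
    # Sketch inferred from the 'threshold = 2 x median' pattern in the
    # log; the window size is an assumption.
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)
        self.num_steps = 0
        self.num_clipped = 0

    def __call__(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        total = torch.stack(
            [p.grad.detach().norm(2) for p in params]
        ).norm(2).item()
        self.norms.append(total)
        self.num_steps += 1
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if total > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.detach().mul_(threshold / total)
        return total, threshold

The quartile line is then just a summary of self.norms at report time, and percent-clipped is 100 * num_clipped / num_steps over the reporting interval.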
], batch size: 61, lr: 8.09e-03, grad_scale: 32.0 2024-09-23 12:40:34,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-09-23 12:41:18,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=260558.66666666666, ans=0.0 2024-09-23 12:41:34,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=260605.33333333334, ans=0.2 2024-09-23 12:41:35,960 INFO [train.py:1198] (0/4) Epoch 15, batch 1300, loss[loss=0.2425, ctc_loss=0.1639, cr_loss=0.3928, over 15960.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1555, cr_loss=0.3693, over 3357868.20 frames. ], batch size: 74, lr: 8.09e-03, grad_scale: 16.0 2024-09-23 12:41:53,388 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.270e+02 1.373e+02 1.516e+02 2.157e+02, threshold=2.746e+02, percent-clipped=0.0 2024-09-23 12:42:06,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=260698.66666666666, ans=0.125 2024-09-23 12:42:29,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0 2024-09-23 12:42:46,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=260792.0, ans=0.0 2024-09-23 12:42:58,449 INFO [train.py:1198] (0/4) Epoch 15, batch 1350, loss[loss=0.2586, ctc_loss=0.1742, cr_loss=0.4223, over 17023.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.155, cr_loss=0.3681, over 3359344.32 frames. ], batch size: 44, lr: 8.08e-03, grad_scale: 16.0 2024-09-23 12:43:08,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=260838.66666666666, ans=0.125 2024-09-23 12:43:09,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=260838.66666666666, ans=0.125 2024-09-23 12:43:11,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=260838.66666666666, ans=0.0 2024-09-23 12:43:17,987 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 12:43:32,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=260932.0, ans=0.125 2024-09-23 12:43:42,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5 2024-09-23 12:43:53,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2024-09-23 12:44:01,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=261025.33333333334, ans=0.125 2024-09-23 12:44:18,619 INFO [train.py:1198] (0/4) Epoch 15, batch 1400, loss[loss=0.2432, ctc_loss=0.1658, cr_loss=0.3871, over 17118.00 frames. ], tot_loss[loss=0.2297, ctc_loss=0.1558, cr_loss=0.3695, over 3361270.07 frames. 
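The ScheduledFloat records track hyperparameters (skip rates, balancer probabilities, bypass scales, whitening limits) that are functions of the global batch count rather than constants; 'ans' is the value in effect at that batch_count, and by this point in training most skip rates read 0.0, i.e. the layer-skip regularization has annealed away. A minimal piecewise-linear re-implementation of the idea (the real class lives in scaling.py and its API may differ):

class ScheduledFloat:
    # Float-valued hyperparameter following a piecewise-linear
    # schedule over batch_count; a sketch of scaling.py's idea.
    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)
        self.batch_count = 0.0

    def __float__(self):
        b, pts = self.batch_count, self.points
        if b <= pts[0][0]:
            return float(pts[0][1])
        if b >= pts[-1][0]:
            return float(pts[-1][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= b <= x1:
                return float(y0 + (y1 - y0) * (b - x0) / (x1 - x0))

# An assumed example: a skip rate decaying from 0.5 to 0.0 over the
# first 20k batches; late in training it reads 0.0, like 'ans=0.0'.
ff2_skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
ff2_skip_rate.batch_count = 260558.67
print(float(ff2_skip_rate))  # 0.0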
], batch size: 49, lr: 8.08e-03, grad_scale: 16.0 2024-09-23 12:44:28,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=261072.0, ans=0.0 2024-09-23 12:44:36,421 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.225e+02 1.306e+02 1.392e+02 2.119e+02, threshold=2.612e+02, percent-clipped=0.0 2024-09-23 12:45:13,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=261212.0, ans=0.0 2024-09-23 12:45:15,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=261212.0, ans=0.125 2024-09-23 12:45:22,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=261212.0, ans=0.125 2024-09-23 12:45:40,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=261258.66666666666, ans=0.2 2024-09-23 12:45:46,012 INFO [train.py:1198] (0/4) Epoch 15, batch 1450, loss[loss=0.2492, ctc_loss=0.1693, cr_loss=0.3995, over 17011.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1565, cr_loss=0.3712, over 3356728.58 frames. ], batch size: 53, lr: 8.07e-03, grad_scale: 16.0 2024-09-23 12:45:54,436 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-56000.pt 2024-09-23 12:46:19,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2024-09-23 12:46:23,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=261398.66666666666, ans=0.125 2024-09-23 12:46:28,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=261398.66666666666, ans=0.05 2024-09-23 12:46:28,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=261398.66666666666, ans=0.025 2024-09-23 12:46:38,160 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 12:46:46,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2024-09-23 12:46:50,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=261492.0, ans=0.07 2024-09-23 12:47:06,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=261538.66666666666, ans=0.0 2024-09-23 12:47:08,081 INFO [train.py:1198] (0/4) Epoch 15, batch 1500, loss[loss=0.2766, ctc_loss=0.1938, cr_loss=0.4143, over 14952.00 frames. ], tot_loss[loss=0.2298, ctc_loss=0.1558, cr_loss=0.3701, over 3364390.97 frames. 
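The checkpoint saved above is indexed by the global batch counter (checkpoint-56000.pt) rather than by epoch, so a mid-epoch save like this one (epoch 15, batch 1450) can be resumed exactly. A sketch of batch-indexed checkpointing; the save interval and the exact fields saved are assumptions, not copied from icefall's checkpoint.py:

import torch

SAVE_EVERY_N = 4000  # assumed fixed interval; 56000 is a multiple of it

def maybe_save_checkpoint(exp_dir, model, optimizer, scheduler, scaler,
                          batch_idx_train):
    # Batch-indexed, mid-epoch-resumable checkpoint (field names are
    # illustrative).
    if batch_idx_train == 0 or batch_idx_train % SAVE_EVERY_N:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "grad_scaler": scaler.state_dict(),  # preserves grad_scale
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )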
], batch size: 89, lr: 8.07e-03, grad_scale: 16.0 2024-09-23 12:47:14,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=261538.66666666666, ans=0.125 2024-09-23 12:47:25,760 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.263e+02 1.374e+02 1.561e+02 5.695e+02, threshold=2.748e+02, percent-clipped=2.0 2024-09-23 12:47:31,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=261585.33333333334, ans=0.125 2024-09-23 12:47:39,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=261585.33333333334, ans=0.125 2024-09-23 12:47:55,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=261632.0, ans=0.125 2024-09-23 12:48:01,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=261678.66666666666, ans=0.07 2024-09-23 12:48:01,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=261678.66666666666, ans=0.07 2024-09-23 12:48:16,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=261725.33333333334, ans=0.0 2024-09-23 12:48:30,897 INFO [train.py:1198] (0/4) Epoch 15, batch 1550, loss[loss=0.2552, ctc_loss=0.1713, cr_loss=0.4194, over 16928.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.155, cr_loss=0.3691, over 3369914.38 frames. ], batch size: 58, lr: 8.07e-03, grad_scale: 16.0 2024-09-23 12:48:31,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261772.0, ans=0.1 2024-09-23 12:48:41,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.66 vs. limit=12.0 2024-09-23 12:49:00,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0 2024-09-23 12:49:15,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=261865.33333333334, ans=0.125 2024-09-23 12:49:17,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=261912.0, ans=0.125 2024-09-23 12:49:53,461 INFO [train.py:1198] (0/4) Epoch 15, batch 1600, loss[loss=0.2208, ctc_loss=0.1453, cr_loss=0.3775, over 17090.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.155, cr_loss=0.3688, over 3360986.71 frames. 
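The Whitening records compare a per-module statistic of the feature covariance against a scheduled limit (metric=5.66 vs. limit=12.0 and metric=6.27 vs. limit=15.0 just above); the convention appears to be that 1.0 means perfectly 'white' activations, with larger values indicating a more anisotropic covariance, and the module only pushes back when the limit is exceeded. One statistic with exactly that behaviour is d * trace(C^2) / trace(C)^2 for a d-dimensional covariance C; a sketch of that reading, not a verbatim copy of scaling.py:

import torch

def whitening_metric(x):
    # d * trace(C^2) / trace(C)^2 for the feature covariance C.
    # Equals 1.0 iff C is a multiple of the identity; grows with the
    # spread of C's eigenvalues. A plausible reading of the logged
    # metric, not necessarily scaling.py's exact formula.
    x = x.reshape(-1, x.shape[-1]).float()
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.t() @ x / x.shape[0]
    d = cov.shape[0]
    return (d * (cov * cov).sum() / cov.trace() ** 2).item()

torch.manual_seed(0)
white = torch.randn(4000, 384)
scaled = white * torch.linspace(0.1, 3.0, 384)
print(whitening_metric(white))   # ~1.1: close to the white ideal 1.0
print(whitening_metric(scaled))  # ~1.8: channels with unequal power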
], batch size: 49, lr: 8.06e-03, grad_scale: 32.0 2024-09-23 12:50:00,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=262005.33333333334, ans=0.125 2024-09-23 12:50:13,468 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.270e+02 1.410e+02 1.627e+02 2.274e+02, threshold=2.820e+02, percent-clipped=0.0 2024-09-23 12:50:15,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=262052.0, ans=0.2 2024-09-23 12:50:28,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=262098.66666666666, ans=0.2 2024-09-23 12:50:32,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=262098.66666666666, ans=0.125 2024-09-23 12:51:17,821 INFO [train.py:1198] (0/4) Epoch 15, batch 1650, loss[loss=0.2927, ctc_loss=0.2026, cr_loss=0.4507, over 14800.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.156, cr_loss=0.37, over 3357504.88 frames. ], batch size: 89, lr: 8.06e-03, grad_scale: 32.0 2024-09-23 12:52:28,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=262425.3333333333, ans=0.0 2024-09-23 12:52:32,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=262425.3333333333, ans=0.125 2024-09-23 12:52:38,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=22.5 2024-09-23 12:52:39,798 INFO [train.py:1198] (0/4) Epoch 15, batch 1700, loss[loss=0.242, ctc_loss=0.1606, cr_loss=0.407, over 17306.00 frames. ], tot_loss[loss=0.23, ctc_loss=0.156, cr_loss=0.3702, over 3368004.43 frames. ], batch size: 49, lr: 8.06e-03, grad_scale: 32.0 2024-09-23 12:52:52,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=262472.0, ans=0.125 2024-09-23 12:52:54,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=262518.6666666667, ans=0.07 2024-09-23 12:52:57,187 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.272e+02 1.393e+02 1.539e+02 2.504e+02, threshold=2.785e+02, percent-clipped=0.0 2024-09-23 12:53:15,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=262565.3333333333, ans=0.125 2024-09-23 12:53:21,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=262565.3333333333, ans=0.125 2024-09-23 12:53:48,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2024-09-23 12:53:51,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2024-09-23 12:53:58,955 INFO [train.py:1198] (0/4) Epoch 15, batch 1750, loss[loss=0.2127, ctc_loss=0.1412, cr_loss=0.3574, over 17291.00 frames. ], tot_loss[loss=0.2295, ctc_loss=0.1554, cr_loss=0.3703, over 3378915.49 frames. 
], batch size: 46, lr: 8.05e-03, grad_scale: 32.0 2024-09-23 12:54:11,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=262705.3333333333, ans=0.125 2024-09-23 12:54:13,371 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 12:54:15,210 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 12:54:21,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=262752.0, ans=0.0 2024-09-23 12:54:44,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262798.6666666667, ans=0.1 2024-09-23 12:54:54,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-09-23 12:54:59,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=262845.3333333333, ans=0.125 2024-09-23 12:55:15,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=262892.0, ans=0.2 2024-09-23 12:55:21,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=262892.0, ans=0.125 2024-09-23 12:55:26,238 INFO [train.py:1198] (0/4) Epoch 15, batch 1800, loss[loss=0.2502, ctc_loss=0.1687, cr_loss=0.4075, over 17295.00 frames. ], tot_loss[loss=0.2296, ctc_loss=0.1556, cr_loss=0.37, over 3372121.07 frames. ], batch size: 46, lr: 8.05e-03, grad_scale: 32.0 2024-09-23 12:55:43,904 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.283e+02 1.353e+02 1.483e+02 2.243e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-23 12:55:58,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=263032.0, ans=0.125 2024-09-23 12:56:26,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2024-09-23 12:56:35,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=263125.3333333333, ans=0.09899494936611666 2024-09-23 12:56:46,112 INFO [train.py:1198] (0/4) Epoch 15, batch 1850, loss[loss=0.2344, ctc_loss=0.157, cr_loss=0.3873, over 17093.00 frames. ], tot_loss[loss=0.2309, ctc_loss=0.1566, cr_loss=0.3717, over 3363735.23 frames. 
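The WithLoss records accumulate an auxiliary penalty attached to the attention-weight tensors; throughout this stretch loss-sum=0.000e+00, so the penalty is dormant. As a generic sketch of the pattern (the actual mechanics in scaling.py may differ), a wrapper can act as the identity on the forward pass while tallying a penalty for periodic logging:

import torch

class WithAuxLoss(torch.nn.Module):
    # Identity on the forward pass, but accumulates a penalty on the
    # wrapped activations for later logging. A generic sketch of the
    # pattern the 'WithLoss ... loss-sum=' records suggest; the real
    # module may differ.
    def __init__(self, name, penalty_fn):
        super().__init__()
        self.name = name
        self.penalty_fn = penalty_fn
        self.loss_sum = 0.0

    def forward(self, x):
        with torch.no_grad():
            self.loss_sum += float(self.penalty_fn(x))
        return x

# e.g. penalise attention weights only when they become too peaked:
attn_probe = WithAuxLoss(
    "self_attn_weights",
    lambda w: torch.clamp(w.max() - 0.99, min=0.0),
)
w = torch.softmax(torch.randn(4, 100), dim=-1)
_ = attn_probe(w)
print(f"name={attn_probe.name}, loss-sum={attn_probe.loss_sum:.3e}")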
], batch size: 49, lr: 8.05e-03, grad_scale: 32.0 2024-09-23 12:56:49,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=263172.0, ans=0.0 2024-09-23 12:56:59,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=263172.0, ans=0.0 2024-09-23 12:57:04,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=263218.6666666667, ans=0.025 2024-09-23 12:57:10,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263218.6666666667, ans=0.1 2024-09-23 12:57:22,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=263265.3333333333, ans=0.0 2024-09-23 12:57:31,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=263265.3333333333, ans=0.125 2024-09-23 12:58:08,529 INFO [train.py:1198] (0/4) Epoch 15, batch 1900, loss[loss=0.2582, ctc_loss=0.1757, cr_loss=0.4125, over 17018.00 frames. ], tot_loss[loss=0.2322, ctc_loss=0.1575, cr_loss=0.3734, over 3359514.35 frames. ], batch size: 52, lr: 8.04e-03, grad_scale: 32.0 2024-09-23 12:58:10,887 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2024-09-23 12:58:16,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=263405.3333333333, ans=0.125 2024-09-23 12:58:17,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=22.5 2024-09-23 12:58:26,134 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.271e+02 1.385e+02 1.549e+02 2.232e+02, threshold=2.769e+02, percent-clipped=0.0 2024-09-23 12:58:36,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=263452.0, ans=0.125 2024-09-23 12:58:40,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=263498.6666666667, ans=0.1 2024-09-23 12:59:12,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263592.0, ans=0.1 2024-09-23 12:59:27,839 INFO [train.py:1198] (0/4) Epoch 15, batch 1950, loss[loss=0.2097, ctc_loss=0.1388, cr_loss=0.3542, over 17103.00 frames. ], tot_loss[loss=0.2319, ctc_loss=0.1573, cr_loss=0.373, over 3359378.85 frames. 
], batch size: 40, lr: 8.04e-03, grad_scale: 16.0 2024-09-23 12:59:48,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=263685.3333333333, ans=0.125 2024-09-23 13:00:01,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=263685.3333333333, ans=0.2 2024-09-23 13:00:17,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=263732.0, ans=0.2 2024-09-23 13:00:18,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=263732.0, ans=0.0 2024-09-23 13:00:34,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=263778.6666666667, ans=0.025 2024-09-23 13:00:41,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=263825.3333333333, ans=0.025 2024-09-23 13:00:54,762 INFO [train.py:1198] (0/4) Epoch 15, batch 2000, loss[loss=0.2311, ctc_loss=0.1536, cr_loss=0.3875, over 17343.00 frames. ], tot_loss[loss=0.2321, ctc_loss=0.1575, cr_loss=0.3729, over 3355705.83 frames. ], batch size: 48, lr: 8.04e-03, grad_scale: 32.0 2024-09-23 13:01:14,002 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.286e+02 1.410e+02 1.590e+02 2.174e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-23 13:01:23,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=263918.6666666667, ans=0.125 2024-09-23 13:01:44,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=264012.0, ans=0.0 2024-09-23 13:01:57,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2024-09-23 13:02:04,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=264058.6666666667, ans=0.125 2024-09-23 13:02:16,688 INFO [train.py:1198] (0/4) Epoch 15, batch 2050, loss[loss=0.2508, ctc_loss=0.1728, cr_loss=0.3902, over 16990.00 frames. ], tot_loss[loss=0.2307, ctc_loss=0.1564, cr_loss=0.3712, over 3353159.55 frames. ], batch size: 53, lr: 8.03e-03, grad_scale: 32.0 2024-09-23 13:02:18,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=264105.3333333333, ans=0.0 2024-09-23 13:02:38,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=264152.0, ans=0.0 2024-09-23 13:02:47,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=264198.6666666667, ans=0.09899494936611666 2024-09-23 13:03:14,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=264245.3333333333, ans=0.0 2024-09-23 13:03:23,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=264292.0, ans=0.025 2024-09-23 13:03:36,266 INFO [train.py:1198] (0/4) Epoch 15, batch 2100, loss[loss=0.2445, ctc_loss=0.1671, cr_loss=0.3872, over 16185.00 frames. 
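grad_scale in these records is the mixed-precision loss-scaling factor: it sits on powers of two and moves between 16.0 and 32.0 across this stretch (back to 32.0 at batch 2000 after 16.0 at batch 1950) because the scaler halves it whenever scaled fp16 gradients overflow and raises it again after a long enough run of clean steps. The standard torch.cuda.amp pattern, with the growth/backoff knobs shown as assumed defaults rather than values taken from this run:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0 ** 15,   # drops quickly to a sustainable power of two
    growth_factor=2.0,      # doubles after growth_interval clean steps
    backoff_factor=0.5,     # halves on inf/nan gradients
    growth_interval=2000,   # PyTorch default; assumed here
)

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # fp16 compute where safe
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)            # skipped if gradients had inf/nan
    scaler.update()                   # halve on overflow, grow when clean
    return loss.detach(), scaler.get_scale()  # the logged grad_scale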
], tot_loss[loss=0.2288, ctc_loss=0.1549, cr_loss=0.3694, over 3361060.44 frames. ], batch size: 74, lr: 8.03e-03, grad_scale: 32.0 2024-09-23 13:03:41,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2024-09-23 13:03:55,211 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.282e+02 1.387e+02 1.573e+02 2.128e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-23 13:03:56,058 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.06 vs. limit=6.0 2024-09-23 13:04:11,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=12.0 2024-09-23 13:04:25,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=264478.6666666667, ans=0.07 2024-09-23 13:04:26,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=22.5 2024-09-23 13:04:44,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=264525.3333333333, ans=0.125 2024-09-23 13:04:57,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.02 vs. limit=10.0 2024-09-23 13:05:03,757 INFO [train.py:1198] (0/4) Epoch 15, batch 2150, loss[loss=0.2256, ctc_loss=0.1533, cr_loss=0.3618, over 17003.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1535, cr_loss=0.3671, over 3366613.34 frames. ], batch size: 53, lr: 8.03e-03, grad_scale: 32.0 2024-09-23 13:05:17,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.37 vs. 
limit=12.0 2024-09-23 13:05:18,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=264618.6666666667, ans=0.09899494936611666 2024-09-23 13:05:36,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=264665.3333333333, ans=0.125 2024-09-23 13:05:42,555 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 13:05:53,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=264712.0, ans=0.025 2024-09-23 13:06:04,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=264712.0, ans=0.09899494936611666 2024-09-23 13:06:11,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264758.6666666667, ans=0.1 2024-09-23 13:06:21,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=264758.6666666667, ans=0.0 2024-09-23 13:06:22,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=264805.3333333333, ans=0.125 2024-09-23 13:06:23,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-09-23 13:06:23,715 INFO [train.py:1198] (0/4) Epoch 15, batch 2200, loss[loss=0.2594, ctc_loss=0.1747, cr_loss=0.4237, over 17048.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1532, cr_loss=0.3669, over 3376166.98 frames. ], batch size: 52, lr: 8.02e-03, grad_scale: 32.0 2024-09-23 13:06:32,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=264805.3333333333, ans=0.025 2024-09-23 13:06:42,813 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.291e+02 1.443e+02 1.638e+02 2.419e+02, threshold=2.885e+02, percent-clipped=0.0 2024-09-23 13:06:53,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2024-09-23 13:07:16,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.68 vs. limit=22.5 2024-09-23 13:07:46,503 INFO [train.py:1198] (0/4) Epoch 15, batch 2250, loss[loss=0.2567, ctc_loss=0.1829, cr_loss=0.3688, over 11713.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1543, cr_loss=0.368, over 3374786.84 frames. 
], batch size: 123, lr: 8.02e-03, grad_scale: 32.0 2024-09-23 13:07:58,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=265038.6666666667, ans=0.05 2024-09-23 13:08:28,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=265132.0, ans=0.125 2024-09-23 13:08:47,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=265178.6666666667, ans=0.5 2024-09-23 13:08:56,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=265225.3333333333, ans=0.0 2024-09-23 13:09:06,709 INFO [train.py:1198] (0/4) Epoch 15, batch 2300, loss[loss=0.2041, ctc_loss=0.1388, cr_loss=0.3261, over 17288.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.153, cr_loss=0.3662, over 3374693.11 frames. ], batch size: 49, lr: 8.02e-03, grad_scale: 16.0 2024-09-23 13:09:20,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.76 vs. limit=15.0 2024-09-23 13:09:27,205 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.234e+02 1.309e+02 1.490e+02 3.155e+02, threshold=2.619e+02, percent-clipped=1.0 2024-09-23 13:09:55,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=265365.3333333333, ans=0.2 2024-09-23 13:10:21,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265458.6666666667, ans=0.1 2024-09-23 13:10:33,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=265505.3333333333, ans=0.125 2024-09-23 13:10:34,412 INFO [train.py:1198] (0/4) Epoch 15, batch 2350, loss[loss=0.2234, ctc_loss=0.1468, cr_loss=0.3832, over 17223.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1539, cr_loss=0.3676, over 3362872.09 frames. ], batch size: 50, lr: 8.01e-03, grad_scale: 16.0 2024-09-23 13:10:39,753 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2024-09-23 13:10:49,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=265552.0, ans=0.125 2024-09-23 13:11:12,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=265598.6666666667, ans=0.125 2024-09-23 13:11:21,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=265645.3333333333, ans=0.125 2024-09-23 13:11:30,165 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 13:11:43,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2024-09-23 13:11:53,692 INFO [train.py:1198] (0/4) Epoch 15, batch 2400, loss[loss=0.251, ctc_loss=0.1762, cr_loss=0.374, over 17179.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.1543, cr_loss=0.3683, over 3368136.37 frames. 
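Batch size varies widely in these records (123 utterances over 11713 frames just above, versus ~40 utterances for ~16400 frames elsewhere) while the per-batch frame totals stay bounded: the sampler packs utterances by duration, so short utterances form large batches and long ones small batches under a fixed budget. Assuming the usual 4x subsampling and 10 ms frame shift, 16399 output frames is about 656 s of audio, suggesting a per-batch duration budget in the mid-to-high 600s of seconds. A greedy sketch of duration bucketing (the real sampler is lhotse's DynamicBucketingSampler, which also shuffles within buckets):

import random

def duration_buckets(utterances, max_frames=17000):
    # Greedy packing by length (sketch): sort utterances by duration,
    # then fill each batch until the frame budget would overflow, so
    # batch size varies inversely with utterance length.
    utts = sorted(utterances, key=len)
    batch, total = [], 0
    for u in utts:
        if batch and total + len(u) > max_frames:
            yield batch
            batch, total = [], 0
        batch.append(u)
        total += len(u)
    if batch:
        yield batch

# e.g. 500 fake utterances between 100 and 1700 frames long:
random.seed(0)
utts = [[0] * random.randint(100, 1700) for _ in range(500)]
sizes = [len(b) for b in duration_buckets(utts)]
print(min(sizes), max(sizes))  # small batches of long utterances, large of short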
], batch size: 55, lr: 8.01e-03, grad_scale: 16.0 2024-09-23 13:11:58,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=265738.6666666667, ans=0.125 2024-09-23 13:12:18,954 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.338e+02 1.450e+02 1.574e+02 3.453e+02, threshold=2.900e+02, percent-clipped=1.0 2024-09-23 13:12:24,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265785.3333333333, ans=0.1 2024-09-23 13:12:28,479 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-09-23 13:12:36,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=265832.0, ans=0.125 2024-09-23 13:13:02,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=265925.3333333333, ans=0.05 2024-09-23 13:13:16,577 INFO [train.py:1198] (0/4) Epoch 15, batch 2450, loss[loss=0.1964, ctc_loss=0.1285, cr_loss=0.3391, over 17277.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.153, cr_loss=0.3663, over 3364479.80 frames. ], batch size: 42, lr: 8.00e-03, grad_scale: 16.0 2024-09-23 13:13:34,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=266018.6666666667, ans=0.1 2024-09-23 13:14:24,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-09-23 13:14:38,833 INFO [train.py:1198] (0/4) Epoch 15, batch 2500, loss[loss=0.2487, ctc_loss=0.171, cr_loss=0.3888, over 17011.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1535, cr_loss=0.3672, over 3354927.32 frames. ], batch size: 53, lr: 8.00e-03, grad_scale: 16.0 2024-09-23 13:14:41,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0 2024-09-23 13:14:56,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266205.3333333333, ans=0.1 2024-09-23 13:15:01,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=266252.0, ans=0.0 2024-09-23 13:15:04,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=266252.0, ans=0.125 2024-09-23 13:15:05,875 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.240e+02 1.331e+02 1.479e+02 2.623e+02, threshold=2.663e+02, percent-clipped=0.0 2024-09-23 13:15:30,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=266345.3333333333, ans=0.5 2024-09-23 13:15:39,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=266345.3333333333, ans=0.025 2024-09-23 13:16:03,518 INFO [train.py:1198] (0/4) Epoch 15, batch 2550, loss[loss=0.2296, ctc_loss=0.1531, cr_loss=0.3826, over 17309.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1536, cr_loss=0.3671, over 3355875.47 frames. 
], batch size: 51, lr: 8.00e-03, grad_scale: 16.0 2024-09-23 13:16:17,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=266485.3333333333, ans=0.0 2024-09-23 13:16:26,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=12.0 2024-09-23 13:16:29,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=266485.3333333333, ans=0.07 2024-09-23 13:17:08,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=266625.3333333333, ans=0.125 2024-09-23 13:17:25,064 INFO [train.py:1198] (0/4) Epoch 15, batch 2600, loss[loss=0.2307, ctc_loss=0.1548, cr_loss=0.3798, over 16741.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1533, cr_loss=0.3662, over 3356041.17 frames. ], batch size: 61, lr: 7.99e-03, grad_scale: 16.0 2024-09-23 13:17:44,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=266718.6666666667, ans=0.125 2024-09-23 13:17:47,325 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.291e+02 1.396e+02 1.519e+02 2.239e+02, threshold=2.791e+02, percent-clipped=0.0 2024-09-23 13:17:52,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=266718.6666666667, ans=0.125 2024-09-23 13:18:16,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=266812.0, ans=0.125 2024-09-23 13:18:30,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=266858.6666666667, ans=0.0 2024-09-23 13:18:37,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=266858.6666666667, ans=0.0 2024-09-23 13:18:44,567 INFO [train.py:1198] (0/4) Epoch 15, batch 2650, loss[loss=0.2404, ctc_loss=0.1629, cr_loss=0.3875, over 16868.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1548, cr_loss=0.368, over 3340854.07 frames. ], batch size: 58, lr: 7.99e-03, grad_scale: 16.0 2024-09-23 13:18:54,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=266905.3333333333, ans=0.0 2024-09-23 13:19:26,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=266998.6666666667, ans=0.0 2024-09-23 13:19:44,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=267045.3333333333, ans=0.125 2024-09-23 13:19:57,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=267092.0, ans=0.125 2024-09-23 13:20:04,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=267092.0, ans=0.125 2024-09-23 13:20:12,066 INFO [train.py:1198] (0/4) Epoch 15, batch 2700, loss[loss=0.2137, ctc_loss=0.143, cr_loss=0.3534, over 17159.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1537, cr_loss=0.3673, over 3348132.18 frames. 
], batch size: 45, lr: 7.99e-03, grad_scale: 16.0 2024-09-23 13:20:17,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2024-09-23 13:20:25,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=267138.6666666667, ans=15.0 2024-09-23 13:20:29,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2024-09-23 13:20:33,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267185.3333333333, ans=0.1 2024-09-23 13:20:33,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=267185.3333333333, ans=0.125 2024-09-23 13:20:34,473 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.269e+02 1.346e+02 1.479e+02 2.065e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-23 13:20:44,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=267232.0, ans=0.09899494936611666 2024-09-23 13:20:49,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2024-09-23 13:20:57,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=267232.0, ans=0.0 2024-09-23 13:21:24,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=267325.3333333333, ans=0.0 2024-09-23 13:21:31,862 INFO [train.py:1198] (0/4) Epoch 15, batch 2750, loss[loss=0.1921, ctc_loss=0.1298, cr_loss=0.3111, over 17262.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1549, cr_loss=0.3686, over 3345774.66 frames. ], batch size: 42, lr: 7.98e-03, grad_scale: 16.0 2024-09-23 13:21:35,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=267372.0, ans=0.0 2024-09-23 13:21:41,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=267372.0, ans=0.125 2024-09-23 13:21:57,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=267418.6666666667, ans=0.0 2024-09-23 13:22:21,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=12.0 2024-09-23 13:22:25,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=267512.0, ans=0.2 2024-09-23 13:22:27,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=267512.0, ans=0.1 2024-09-23 13:22:53,942 INFO [train.py:1198] (0/4) Epoch 15, batch 2800, loss[loss=0.2198, ctc_loss=0.1456, cr_loss=0.3709, over 17046.00 frames. ], tot_loss[loss=0.2285, ctc_loss=0.1548, cr_loss=0.3684, over 3346329.47 frames. 
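Each record pairs the current batch's loss ('over ~16000-17000 frames') with tot_loss 'over ~3.3e6 frames', and that second frame count hovers around 3.3-3.4M for the whole stretch instead of growing with the epoch: tot_loss behaves like a frame-weighted running average with exponential forgetting, not a cumulative epoch mean. A sketch; the decay constant is an assumption, chosen so that ~17k-frame batches give a steady-state window near 3.4M frames:

class RunningLoss:
    # Frame-weighted running average with exponential forgetting.
    # decay=0.995 with ~17k frames/batch settles near
    # 17000 / (1 - 0.995) = 3.4e6 frames, matching the roughly
    # constant 'over N frames' in the logged tot_loss.
    def __init__(self, decay=0.995):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # batch_loss: mean loss over this batch's batch_frames frames
        self.weighted_loss = (
            self.decay * self.weighted_loss + batch_loss * batch_frames
        )
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self):
        return self.weighted_loss / max(self.frames, 1.0)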
], batch size: 39, lr: 7.98e-03, grad_scale: 32.0 2024-09-23 13:23:17,756 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.278e+02 1.476e+02 1.758e+02 2.429e+02, threshold=2.952e+02, percent-clipped=0.0 2024-09-23 13:23:59,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=267792.0, ans=0.125 2024-09-23 13:24:13,375 INFO [train.py:1198] (0/4) Epoch 15, batch 2850, loss[loss=0.2626, ctc_loss=0.1754, cr_loss=0.4363, over 16958.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1546, cr_loss=0.3686, over 3358858.35 frames. ], batch size: 58, lr: 7.98e-03, grad_scale: 16.0 2024-09-23 13:24:31,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=267885.3333333333, ans=0.125 2024-09-23 13:24:31,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2024-09-23 13:25:03,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=267932.0, ans=0.1 2024-09-23 13:25:06,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=267932.0, ans=0.125 2024-09-23 13:25:24,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268025.3333333333, ans=0.1 2024-09-23 13:25:41,409 INFO [train.py:1198] (0/4) Epoch 15, batch 2900, loss[loss=0.2048, ctc_loss=0.1385, cr_loss=0.3315, over 17189.00 frames. ], tot_loss[loss=0.2286, ctc_loss=0.1548, cr_loss=0.3689, over 3360217.14 frames. ], batch size: 41, lr: 7.97e-03, grad_scale: 16.0 2024-09-23 13:25:54,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=268072.0, ans=0.125 2024-09-23 13:26:05,969 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.264e+02 1.372e+02 1.552e+02 2.806e+02, threshold=2.744e+02, percent-clipped=0.0 2024-09-23 13:26:07,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=268118.6666666667, ans=0.125 2024-09-23 13:26:14,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=268165.3333333333, ans=0.0 2024-09-23 13:26:18,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2024-09-23 13:26:35,173 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 13:26:51,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=268258.6666666667, ans=22.5 2024-09-23 13:27:03,975 INFO [train.py:1198] (0/4) Epoch 15, batch 2950, loss[loss=0.2283, ctc_loss=0.1515, cr_loss=0.3838, over 17187.00 frames. ], tot_loss[loss=0.2291, ctc_loss=0.1552, cr_loss=0.3697, over 3360135.36 frames. ], batch size: 47, lr: 7.97e-03, grad_scale: 16.0 2024-09-23 13:27:04,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. 
limit=15.0 2024-09-23 13:27:13,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=268305.3333333333, ans=0.125 2024-09-23 13:27:21,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=268352.0, ans=0.0 2024-09-23 13:27:23,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=268352.0, ans=0.07 2024-09-23 13:28:15,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=268492.0, ans=0.035 2024-09-23 13:28:23,347 INFO [train.py:1198] (0/4) Epoch 15, batch 3000, loss[loss=0.2432, ctc_loss=0.1613, cr_loss=0.4095, over 17010.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1553, cr_loss=0.3703, over 3358772.15 frames. ], batch size: 53, lr: 7.97e-03, grad_scale: 16.0 2024-09-23 13:28:23,348 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 13:28:38,972 INFO [train.py:1230] (0/4) Epoch 15, validation: loss=0.04166, ctc_loss=0.04166, cr_loss=7.464e-15, over 944034.00 frames. 2024-09-23 13:28:38,973 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 13:29:02,498 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.290e+02 1.376e+02 1.476e+02 2.234e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-23 13:29:16,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268632.0, ans=0.1 2024-09-23 13:29:20,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=22.5 2024-09-23 13:29:27,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=268678.6666666667, ans=0.0 2024-09-23 13:29:38,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=268678.6666666667, ans=0.125 2024-09-23 13:29:56,912 INFO [train.py:1198] (0/4) Epoch 15, batch 3050, loss[loss=0.2004, ctc_loss=0.1369, cr_loss=0.3175, over 17179.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1553, cr_loss=0.3702, over 3359935.70 frames. ], batch size: 41, lr: 7.96e-03, grad_scale: 16.0 2024-09-23 13:29:58,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=268772.0, ans=0.1 2024-09-23 13:30:09,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=268772.0, ans=0.0 2024-09-23 13:30:22,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=268818.6666666667, ans=0.1 2024-09-23 13:30:22,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=268818.6666666667, ans=0.0 2024-09-23 13:30:46,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=268865.3333333333, ans=0.0 2024-09-23 13:31:04,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.49 vs. 
limit=10.0 2024-09-23 13:31:12,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=268958.6666666667, ans=0.2 2024-09-23 13:31:22,606 INFO [train.py:1198] (0/4) Epoch 15, batch 3100, loss[loss=0.2604, ctc_loss=0.1747, cr_loss=0.4285, over 15958.00 frames. ], tot_loss[loss=0.2294, ctc_loss=0.1555, cr_loss=0.3694, over 3348519.11 frames. ], batch size: 74, lr: 7.96e-03, grad_scale: 16.0 2024-09-23 13:31:30,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=269005.3333333333, ans=0.0 2024-09-23 13:31:46,079 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.323e+02 1.397e+02 1.534e+02 2.167e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-23 13:32:02,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269098.6666666667, ans=0.1 2024-09-23 13:32:08,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=269145.3333333333, ans=0.125 2024-09-23 13:32:41,150 INFO [train.py:1198] (0/4) Epoch 15, batch 3150, loss[loss=0.2334, ctc_loss=0.1555, cr_loss=0.3892, over 16972.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1545, cr_loss=0.3693, over 3351088.54 frames. ], batch size: 53, lr: 7.96e-03, grad_scale: 16.0 2024-09-23 13:32:43,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=269238.6666666667, ans=0.125 2024-09-23 13:32:58,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269285.3333333333, ans=0.1 2024-09-23 13:33:12,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=269332.0, ans=0.025 2024-09-23 13:33:12,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=269332.0, ans=0.0 2024-09-23 13:33:36,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-09-23 13:33:48,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=269425.3333333333, ans=0.125 2024-09-23 13:33:51,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=269425.3333333333, ans=0.025 2024-09-23 13:33:59,102 INFO [train.py:1198] (0/4) Epoch 15, batch 3200, loss[loss=0.2308, ctc_loss=0.1552, cr_loss=0.3779, over 17363.00 frames. ], tot_loss[loss=0.2284, ctc_loss=0.1546, cr_loss=0.3691, over 3356042.78 frames. ], batch size: 48, lr: 7.95e-03, grad_scale: 32.0 2024-09-23 13:34:12,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. 
limit=15.0 2024-09-23 13:34:18,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=269518.6666666667, ans=0.2 2024-09-23 13:34:19,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=269518.6666666667, ans=0.95 2024-09-23 13:34:22,376 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.268e+02 1.359e+02 1.495e+02 2.696e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-23 13:34:32,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2024-09-23 13:34:39,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=269565.3333333333, ans=0.0 2024-09-23 13:34:41,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-09-23 13:34:49,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=269612.0, ans=0.0 2024-09-23 13:34:51,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=22.5 2024-09-23 13:35:08,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=269658.6666666667, ans=0.125 2024-09-23 13:35:14,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0 2024-09-23 13:35:17,026 INFO [train.py:1198] (0/4) Epoch 15, batch 3250, loss[loss=0.2448, ctc_loss=0.1682, cr_loss=0.3826, over 17117.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1534, cr_loss=0.3673, over 3354997.84 frames. ], batch size: 49, lr: 7.95e-03, grad_scale: 32.0 2024-09-23 13:35:22,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=269705.3333333333, ans=10.0 2024-09-23 13:35:27,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=269705.3333333333, ans=0.0 2024-09-23 13:35:40,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=269752.0, ans=15.0 2024-09-23 13:35:57,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269798.6666666667, ans=0.1 2024-09-23 13:36:05,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=269845.3333333333, ans=15.0 2024-09-23 13:36:10,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-23 13:36:35,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=269892.0, ans=0.1 2024-09-23 13:36:37,978 INFO [train.py:1198] (0/4) Epoch 15, batch 3300, loss[loss=0.2075, ctc_loss=0.1368, cr_loss=0.3533, over 17028.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1536, cr_loss=0.3679, over 3363832.25 frames. 
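The validation pass at batch 3000 above is the telling record: cr_loss collapses to 7.464e-15 (numerical zero), so the validation loss equals the CTC loss alone. That is what consistency regularization should do at eval time: the CR term compares the model's outputs on two differently-masked views of each utterance, and with masking disabled the two views coincide. A sketch of such a validation pass; the batch fields and model interface are assumptions, not icefall's actual signatures:

import torch

def compute_validation_loss(model, valid_loader, device):
    # Dev pass with no gradients and no augmentation (sketch).
    model.eval()
    tot_loss = tot_frames = 0.0
    with torch.no_grad():
        for batch in valid_loader:
            feats = batch["inputs"].to(device)        # field name assumed
            ctc_loss, cr_loss, frames = model(feats)  # interface assumed
            # identical views at eval time => cr_loss ~ 0, so the
            # total reduces to the CTC term, as in the log
            tot_loss += (ctc_loss + 0.2 * cr_loss).item() * frames
            tot_frames += frames
    model.train()
    return tot_loss / tot_frames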
], batch size: 44, lr: 7.95e-03, grad_scale: 32.0 2024-09-23 13:36:47,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=269938.6666666667, ans=0.0 2024-09-23 13:36:49,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=269938.6666666667, ans=0.125 2024-09-23 13:36:56,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2024-09-23 13:36:58,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=269985.3333333333, ans=0.125 2024-09-23 13:37:01,532 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.277e+02 1.362e+02 1.536e+02 4.994e+02, threshold=2.723e+02, percent-clipped=1.0 2024-09-23 13:37:06,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=269985.3333333333, ans=0.125 2024-09-23 13:37:14,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=270032.0, ans=0.125 2024-09-23 13:37:31,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=270078.6666666667, ans=0.125 2024-09-23 13:37:33,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2024-09-23 13:37:47,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=270125.3333333333, ans=0.0 2024-09-23 13:37:48,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=270125.3333333333, ans=0.09899494936611666 2024-09-23 13:37:49,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=270125.3333333333, ans=0.0 2024-09-23 13:37:55,867 INFO [train.py:1198] (0/4) Epoch 15, batch 3350, loss[loss=0.2778, ctc_loss=0.1915, cr_loss=0.4315, over 17226.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1541, cr_loss=0.3683, over 3362770.69 frames. ], batch size: 55, lr: 7.94e-03, grad_scale: 32.0 2024-09-23 13:38:00,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=270172.0, ans=0.1 2024-09-23 13:38:30,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270265.3333333333, ans=0.1 2024-09-23 13:38:33,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=270265.3333333333, ans=0.025 2024-09-23 13:38:39,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=270265.3333333333, ans=0.025 2024-09-23 13:39:04,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=270358.6666666667, ans=0.09899494936611666 2024-09-23 13:39:14,019 INFO [train.py:1198] (0/4) Epoch 15, batch 3400, loss[loss=0.2647, ctc_loss=0.1834, cr_loss=0.4066, over 16998.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1544, cr_loss=0.3692, over 3371159.42 frames. 
], batch size: 53, lr: 7.94e-03, grad_scale: 32.0 2024-09-23 13:39:15,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=270405.3333333333, ans=0.0 2024-09-23 13:39:17,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=270405.3333333333, ans=0.125 2024-09-23 13:39:23,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=270405.3333333333, ans=0.125 2024-09-23 13:39:37,360 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.273e+02 1.371e+02 1.521e+02 2.083e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-23 13:39:59,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=270545.3333333333, ans=0.125 2024-09-23 13:40:26,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=270592.0, ans=0.125 2024-09-23 13:40:32,269 INFO [train.py:1198] (0/4) Epoch 15, batch 3450, loss[loss=0.2834, ctc_loss=0.1933, cr_loss=0.4505, over 16983.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1538, cr_loss=0.3683, over 3374284.38 frames. ], batch size: 56, lr: 7.94e-03, grad_scale: 32.0 2024-09-23 13:41:36,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=270778.6666666667, ans=0.025 2024-09-23 13:41:38,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=270778.6666666667, ans=0.125 2024-09-23 13:41:39,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=270825.3333333333, ans=0.2 2024-09-23 13:41:52,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=270825.3333333333, ans=0.125 2024-09-23 13:41:52,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=270825.3333333333, ans=0.125 2024-09-23 13:41:56,575 INFO [train.py:1198] (0/4) Epoch 15, batch 3500, loss[loss=0.2138, ctc_loss=0.147, cr_loss=0.3337, over 16922.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1546, cr_loss=0.3682, over 3366234.86 frames. 
], batch size: 58, lr: 7.93e-03, grad_scale: 32.0 2024-09-23 13:42:00,157 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 13:42:19,966 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.268e+02 1.406e+02 1.575e+02 2.426e+02, threshold=2.811e+02, percent-clipped=0.0 2024-09-23 13:42:26,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=270965.3333333333, ans=0.125 2024-09-23 13:42:41,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=270965.3333333333, ans=0.0 2024-09-23 13:42:49,170 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 13:43:04,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=271058.6666666667, ans=0.0 2024-09-23 13:43:06,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=271058.6666666667, ans=0.125 2024-09-23 13:43:15,426 INFO [train.py:1198] (0/4) Epoch 15, batch 3550, loss[loss=0.2065, ctc_loss=0.1351, cr_loss=0.3567, over 17031.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1546, cr_loss=0.3684, over 3368115.99 frames. ], batch size: 39, lr: 7.93e-03, grad_scale: 32.0 2024-09-23 13:43:23,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271105.3333333333, ans=0.1 2024-09-23 13:43:28,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=271105.3333333333, ans=0.125 2024-09-23 13:43:45,352 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 13:43:56,405 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 13:44:29,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=271292.0, ans=0.125 2024-09-23 13:44:33,808 INFO [train.py:1198] (0/4) Epoch 15, batch 3600, loss[loss=0.2503, ctc_loss=0.1715, cr_loss=0.3941, over 17244.00 frames. ], tot_loss[loss=0.2288, ctc_loss=0.1549, cr_loss=0.3694, over 3371899.20 frames. ], batch size: 50, lr: 7.93e-03, grad_scale: 32.0 2024-09-23 13:44:40,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=271338.6666666667, ans=0.0 2024-09-23 13:44:41,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0 2024-09-23 13:44:48,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=271385.3333333333, ans=0.95 2024-09-23 13:44:56,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.64 vs. 
limit=15.0 2024-09-23 13:44:57,246 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.288e+02 1.401e+02 1.561e+02 2.194e+02, threshold=2.802e+02, percent-clipped=0.0 2024-09-23 13:44:59,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271385.3333333333, ans=0.1 2024-09-23 13:45:18,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.49 vs. limit=10.0 2024-09-23 13:45:51,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-09-23 13:45:53,865 INFO [train.py:1198] (0/4) Epoch 15, batch 3650, loss[loss=0.2601, ctc_loss=0.1757, cr_loss=0.4222, over 17207.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.154, cr_loss=0.3689, over 3366652.55 frames. ], batch size: 55, lr: 7.92e-03, grad_scale: 16.0 2024-09-23 13:46:17,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=271618.6666666667, ans=0.125 2024-09-23 13:46:29,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=271665.3333333333, ans=0.2 2024-09-23 13:46:34,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=271665.3333333333, ans=0.125 2024-09-23 13:46:47,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=271712.0, ans=6.0 2024-09-23 13:46:51,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=271712.0, ans=0.07 2024-09-23 13:47:12,694 INFO [train.py:1198] (0/4) Epoch 15, batch 3700, loss[loss=0.2023, ctc_loss=0.1327, cr_loss=0.3479, over 16963.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1541, cr_loss=0.3695, over 3369800.03 frames. ], batch size: 42, lr: 7.92e-03, grad_scale: 16.0 2024-09-23 13:47:33,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=271852.0, ans=0.0 2024-09-23 13:47:37,539 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.268e+02 1.357e+02 1.522e+02 2.745e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-23 13:47:57,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=22.5 2024-09-23 13:48:31,496 INFO [train.py:1198] (0/4) Epoch 15, batch 3750, loss[loss=0.2645, ctc_loss=0.186, cr_loss=0.3926, over 15033.00 frames. ], tot_loss[loss=0.2282, ctc_loss=0.1543, cr_loss=0.3691, over 3351971.21 frames. 
], batch size: 89, lr: 7.92e-03, grad_scale: 16.0 2024-09-23 13:48:33,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=272038.6666666667, ans=0.0 2024-09-23 13:48:45,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=272085.3333333333, ans=0.1 2024-09-23 13:49:30,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=272178.6666666667, ans=0.0 2024-09-23 13:49:50,618 INFO [train.py:1198] (0/4) Epoch 15, batch 3800, loss[loss=0.2511, ctc_loss=0.1698, cr_loss=0.4064, over 17322.00 frames. ], tot_loss[loss=0.227, ctc_loss=0.1534, cr_loss=0.3678, over 3353177.46 frames. ], batch size: 51, lr: 7.91e-03, grad_scale: 16.0 2024-09-23 13:49:52,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=272272.0, ans=0.04949747468305833 2024-09-23 13:49:52,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=272272.0, ans=0.125 2024-09-23 13:49:53,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=272272.0, ans=0.125 2024-09-23 13:50:16,159 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.263e+02 1.390e+02 1.534e+02 1.887e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-23 13:50:30,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=272365.3333333333, ans=0.1 2024-09-23 13:50:47,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=272412.0, ans=0.025 2024-09-23 13:51:01,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=272458.6666666667, ans=15.0 2024-09-23 13:51:11,676 INFO [train.py:1198] (0/4) Epoch 15, batch 3850, loss[loss=0.2587, ctc_loss=0.1844, cr_loss=0.3715, over 11650.00 frames. ], tot_loss[loss=0.228, ctc_loss=0.1546, cr_loss=0.3672, over 3306787.03 frames. ], batch size: 123, lr: 7.91e-03, grad_scale: 16.0 2024-09-23 13:52:03,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272645.3333333333, ans=0.0 2024-09-23 13:52:21,305 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-15.pt 2024-09-23 13:53:14,135 INFO [train.py:1198] (0/4) Epoch 16, batch 0, loss[loss=0.2525, ctc_loss=0.1719, cr_loss=0.403, over 17036.00 frames. ], tot_loss[loss=0.2525, ctc_loss=0.1719, cr_loss=0.403, over 17036.00 frames. ], batch size: 56, lr: 7.65e-03, grad_scale: 32.0 2024-09-23 13:53:14,136 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 13:53:30,202 INFO [train.py:1230] (0/4) Epoch 16, validation: loss=0.04222, ctc_loss=0.04222, cr_loss=7.738e-15, over 944034.00 frames. 
2024-09-23 13:53:30,203 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 13:53:36,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=272720.0, ans=0.09899494936611666 2024-09-23 13:53:39,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=272720.0, ans=0.0 2024-09-23 13:53:40,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=272720.0, ans=0.125 2024-09-23 13:54:02,002 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.456e+02 1.611e+02 1.770e+02 2.340e+02, threshold=3.223e+02, percent-clipped=0.0 2024-09-23 13:54:39,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=272906.6666666667, ans=0.125 2024-09-23 13:54:42,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272906.6666666667, ans=0.125 2024-09-23 13:54:50,286 INFO [train.py:1198] (0/4) Epoch 16, batch 50, loss[loss=0.238, ctc_loss=0.1627, cr_loss=0.3767, over 17284.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.152, cr_loss=0.3608, over 750706.90 frames. ], batch size: 51, lr: 7.65e-03, grad_scale: 32.0 2024-09-23 13:55:14,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=273000.0, ans=0.0 2024-09-23 13:55:33,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=273046.6666666667, ans=0.125 2024-09-23 13:56:09,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=12.0 2024-09-23 13:56:09,766 INFO [train.py:1198] (0/4) Epoch 16, batch 100, loss[loss=0.2206, ctc_loss=0.1522, cr_loss=0.3416, over 17172.00 frames. ], tot_loss[loss=0.2245, ctc_loss=0.1518, cr_loss=0.3634, over 1332191.10 frames. 
], batch size: 45, lr: 7.65e-03, grad_scale: 32.0 2024-09-23 13:56:11,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=273186.6666666667, ans=0.125 2024-09-23 13:56:31,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=273233.3333333333, ans=0.125 2024-09-23 13:56:51,568 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.239e+02 1.320e+02 1.510e+02 1.777e+02, threshold=2.639e+02, percent-clipped=0.0 2024-09-23 13:57:17,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=273326.6666666667, ans=0.125 2024-09-23 13:57:18,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=273326.6666666667, ans=0.125 2024-09-23 13:57:29,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=273373.3333333333, ans=0.125 2024-09-23 13:57:36,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=273373.3333333333, ans=0.0 2024-09-23 13:57:38,812 INFO [train.py:1198] (0/4) Epoch 16, batch 150, loss[loss=0.2009, ctc_loss=0.1344, cr_loss=0.3327, over 17291.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1516, cr_loss=0.3653, over 1788498.53 frames. ], batch size: 46, lr: 7.64e-03, grad_scale: 32.0 2024-09-23 13:58:19,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=273513.3333333333, ans=0.0 2024-09-23 13:58:49,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=273606.6666666667, ans=0.125 2024-09-23 13:58:55,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-09-23 13:58:59,126 INFO [train.py:1198] (0/4) Epoch 16, batch 200, loss[loss=0.2131, ctc_loss=0.142, cr_loss=0.3556, over 17138.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1513, cr_loss=0.3658, over 2142263.45 frames. ], batch size: 45, lr: 7.64e-03, grad_scale: 32.0 2024-09-23 13:58:59,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=273653.3333333333, ans=0.0 2024-09-23 13:59:13,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=15.0 2024-09-23 13:59:15,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=273700.0, ans=0.0 2024-09-23 13:59:31,013 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.259e+02 1.374e+02 1.473e+02 2.348e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-23 13:59:31,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=273746.6666666667, ans=0.125 2024-09-23 13:59:36,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=273746.6666666667, ans=0.0 2024-09-23 13:59:55,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=273793.3333333333, ans=0.0 2024-09-23 14:00:07,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.38 vs. limit=15.0 2024-09-23 14:00:12,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=273840.0, ans=0.125 2024-09-23 14:00:18,771 INFO [train.py:1198] (0/4) Epoch 16, batch 250, loss[loss=0.2284, ctc_loss=0.1534, cr_loss=0.3754, over 16863.00 frames. ], tot_loss[loss=0.2246, ctc_loss=0.1515, cr_loss=0.3653, over 2412906.59 frames. ], batch size: 58, lr: 7.64e-03, grad_scale: 32.0 2024-09-23 14:00:22,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=273886.6666666667, ans=0.125 2024-09-23 14:00:22,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=273886.6666666667, ans=0.125 2024-09-23 14:00:31,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=273886.6666666667, ans=0.1 2024-09-23 14:00:46,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=273933.3333333333, ans=0.0 2024-09-23 14:01:46,160 INFO [train.py:1198] (0/4) Epoch 16, batch 300, loss[loss=0.2279, ctc_loss=0.1527, cr_loss=0.3758, over 17298.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1519, cr_loss=0.3658, over 2628622.54 frames. ], batch size: 51, lr: 7.63e-03, grad_scale: 32.0 2024-09-23 14:01:52,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.94 vs. limit=10.0 2024-09-23 14:01:57,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2024-09-23 14:02:22,714 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.271e+02 1.353e+02 1.525e+02 2.781e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-23 14:02:23,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=61.31 vs. 
limit=15.0 2024-09-23 14:02:58,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=274306.6666666667, ans=0.125 2024-09-23 14:03:01,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=274306.6666666667, ans=0.125 2024-09-23 14:03:08,655 INFO [train.py:1198] (0/4) Epoch 16, batch 350, loss[loss=0.2475, ctc_loss=0.1674, cr_loss=0.4004, over 17014.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.153, cr_loss=0.3677, over 2785518.40 frames. ], batch size: 51, lr: 7.63e-03, grad_scale: 16.0 2024-09-23 14:03:12,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-09-23 14:03:14,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=22.5 2024-09-23 14:03:17,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=22.5 2024-09-23 14:03:35,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=274400.0, ans=0.0 2024-09-23 14:03:55,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-23 14:04:03,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=274493.3333333333, ans=0.125 2024-09-23 14:04:07,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=274493.3333333333, ans=0.125 2024-09-23 14:04:28,535 INFO [train.py:1198] (0/4) Epoch 16, batch 400, loss[loss=0.1773, ctc_loss=0.1158, cr_loss=0.3074, over 16343.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1528, cr_loss=0.3669, over 2914834.27 frames. ], batch size: 36, lr: 7.63e-03, grad_scale: 32.0 2024-09-23 14:04:44,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=274633.3333333333, ans=0.0 2024-09-23 14:04:59,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=274680.0, ans=0.125 2024-09-23 14:05:01,857 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.271e+02 1.344e+02 1.500e+02 2.841e+02, threshold=2.689e+02, percent-clipped=1.0 2024-09-23 14:05:42,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=274773.3333333333, ans=0.025 2024-09-23 14:05:45,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2024-09-23 14:05:47,970 INFO [train.py:1198] (0/4) Epoch 16, batch 450, loss[loss=0.1993, ctc_loss=0.1346, cr_loss=0.3234, over 17014.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1524, cr_loss=0.3668, over 3020414.83 frames. 
], batch size: 44, lr: 7.62e-03, grad_scale: 32.0 2024-09-23 14:06:15,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=274866.6666666667, ans=0.0 2024-09-23 14:06:33,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-09-23 14:06:59,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=275006.6666666667, ans=0.125 2024-09-23 14:07:05,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=15.0 2024-09-23 14:07:11,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2024-09-23 14:07:12,877 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:07:12,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=275006.6666666667, ans=0.0 2024-09-23 14:07:15,704 INFO [train.py:1198] (0/4) Epoch 16, batch 500, loss[loss=0.2836, ctc_loss=0.2088, cr_loss=0.3739, over 11804.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1534, cr_loss=0.3687, over 3092909.99 frames. ], batch size: 123, lr: 7.62e-03, grad_scale: 16.0 2024-09-23 14:07:31,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0 2024-09-23 14:07:33,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=275100.0, ans=0.125 2024-09-23 14:07:51,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=275146.6666666667, ans=0.035 2024-09-23 14:07:52,532 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.254e+02 1.336e+02 1.469e+02 8.615e+02, threshold=2.672e+02, percent-clipped=1.0 2024-09-23 14:08:13,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=275193.3333333333, ans=0.125 2024-09-23 14:08:28,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=275240.0, ans=0.2 2024-09-23 14:08:36,361 INFO [train.py:1198] (0/4) Epoch 16, batch 550, loss[loss=0.275, ctc_loss=0.1906, cr_loss=0.4222, over 17037.00 frames. ], tot_loss[loss=0.2279, ctc_loss=0.154, cr_loss=0.3693, over 3147899.72 frames. ], batch size: 52, lr: 7.62e-03, grad_scale: 8.0 2024-09-23 14:08:38,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=275286.6666666667, ans=0.125 2024-09-23 14:08:39,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=275286.6666666667, ans=0.0 2024-09-23 14:08:43,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. 
limit=15.0 2024-09-23 14:08:47,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=275286.6666666667, ans=0.125 2024-09-23 14:08:54,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=275333.3333333333, ans=0.125 2024-09-23 14:09:06,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=275380.0, ans=0.1 2024-09-23 14:09:19,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=275380.0, ans=0.025 2024-09-23 14:09:56,356 INFO [train.py:1198] (0/4) Epoch 16, batch 600, loss[loss=0.2583, ctc_loss=0.1785, cr_loss=0.3992, over 16917.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1532, cr_loss=0.3678, over 3196939.87 frames. ], batch size: 58, lr: 7.61e-03, grad_scale: 8.0 2024-09-23 14:10:10,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=275566.6666666667, ans=0.0 2024-09-23 14:10:32,682 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.280e+02 1.400e+02 1.549e+02 2.106e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-23 14:11:21,419 INFO [train.py:1198] (0/4) Epoch 16, batch 650, loss[loss=0.2309, ctc_loss=0.1551, cr_loss=0.3792, over 17016.00 frames. ], tot_loss[loss=0.2269, ctc_loss=0.1531, cr_loss=0.3689, over 3238417.17 frames. ], batch size: 44, lr: 7.61e-03, grad_scale: 8.0 2024-09-23 14:11:26,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=275753.3333333333, ans=0.0 2024-09-23 14:11:31,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=275753.3333333333, ans=0.025 2024-09-23 14:11:51,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=275800.0, ans=0.125 2024-09-23 14:12:29,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=275940.0, ans=0.125 2024-09-23 14:12:43,639 INFO [train.py:1198] (0/4) Epoch 16, batch 700, loss[loss=0.2379, ctc_loss=0.1609, cr_loss=0.385, over 17232.00 frames. ], tot_loss[loss=0.2283, ctc_loss=0.1543, cr_loss=0.3698, over 3260048.98 frames. ], batch size: 50, lr: 7.61e-03, grad_scale: 8.0 2024-09-23 14:12:44,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=275986.6666666667, ans=0.125 2024-09-23 14:12:57,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.22 vs. 
limit=22.5 2024-09-23 14:12:58,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=276033.3333333333, ans=0.125 2024-09-23 14:13:20,907 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.263e+02 1.367e+02 1.500e+02 2.228e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-23 14:13:54,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=276173.3333333333, ans=0.125 2024-09-23 14:13:56,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=276173.3333333333, ans=0.125 2024-09-23 14:14:03,949 INFO [train.py:1198] (0/4) Epoch 16, batch 750, loss[loss=0.2153, ctc_loss=0.1422, cr_loss=0.3657, over 17291.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1542, cr_loss=0.3694, over 3279787.53 frames. ], batch size: 49, lr: 7.61e-03, grad_scale: 8.0 2024-09-23 14:14:17,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2024-09-23 14:14:23,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=276266.6666666667, ans=0.125 2024-09-23 14:15:19,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=276406.6666666667, ans=0.125 2024-09-23 14:15:23,832 INFO [train.py:1198] (0/4) Epoch 16, batch 800, loss[loss=0.2199, ctc_loss=0.1465, cr_loss=0.3669, over 16995.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1537, cr_loss=0.3685, over 3305344.77 frames. ], batch size: 56, lr: 7.60e-03, grad_scale: 16.0 2024-09-23 14:15:47,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=276500.0, ans=0.2 2024-09-23 14:15:57,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=276546.6666666667, ans=0.2 2024-09-23 14:16:00,467 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.250e+02 1.343e+02 1.464e+02 3.040e+02, threshold=2.687e+02, percent-clipped=1.0 2024-09-23 14:16:20,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=276593.3333333333, ans=0.125 2024-09-23 14:16:22,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=276593.3333333333, ans=0.125 2024-09-23 14:16:48,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=276640.0, ans=0.0 2024-09-23 14:16:51,110 INFO [train.py:1198] (0/4) Epoch 16, batch 850, loss[loss=0.2459, ctc_loss=0.1649, cr_loss=0.4052, over 16537.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1535, cr_loss=0.368, over 3321685.75 frames. ], batch size: 66, lr: 7.60e-03, grad_scale: 16.0 2024-09-23 14:16:56,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=276686.6666666667, ans=0.0 2024-09-23 14:16:59,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. 
limit=15.0 2024-09-23 14:17:09,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=22.5 2024-09-23 14:17:34,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=276780.0, ans=0.125 2024-09-23 14:17:42,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=276826.6666666667, ans=0.0 2024-09-23 14:18:02,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=276873.3333333333, ans=0.125 2024-09-23 14:18:11,441 INFO [train.py:1198] (0/4) Epoch 16, batch 900, loss[loss=0.2187, ctc_loss=0.1451, cr_loss=0.3678, over 17079.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1534, cr_loss=0.3687, over 3340025.11 frames. ], batch size: 49, lr: 7.60e-03, grad_scale: 16.0 2024-09-23 14:18:30,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=276966.6666666667, ans=0.125 2024-09-23 14:18:35,654 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:18:43,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=277013.3333333333, ans=0.125 2024-09-23 14:18:47,947 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.278e+02 1.412e+02 1.648e+02 4.971e+02, threshold=2.824e+02, percent-clipped=1.0 2024-09-23 14:19:05,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=277060.0, ans=10.0 2024-09-23 14:19:11,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=277060.0, ans=0.0 2024-09-23 14:19:25,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=12.0 2024-09-23 14:19:29,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=277153.3333333333, ans=0.2 2024-09-23 14:19:30,871 INFO [train.py:1198] (0/4) Epoch 16, batch 950, loss[loss=0.2537, ctc_loss=0.1772, cr_loss=0.3826, over 17037.00 frames. ], tot_loss[loss=0.2275, ctc_loss=0.1537, cr_loss=0.3691, over 3347255.61 frames. ], batch size: 52, lr: 7.59e-03, grad_scale: 16.0 2024-09-23 14:19:48,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=277200.0, ans=0.0 2024-09-23 14:20:12,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=277246.6666666667, ans=0.2 2024-09-23 14:20:31,587 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:20:50,274 INFO [train.py:1198] (0/4) Epoch 16, batch 1000, loss[loss=0.2072, ctc_loss=0.1378, cr_loss=0.3471, over 17153.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.153, cr_loss=0.3671, over 3344695.50 frames. 
], batch size: 48, lr: 7.59e-03, grad_scale: 16.0 2024-09-23 14:20:53,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=277386.6666666667, ans=0.125 2024-09-23 14:21:14,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=277433.3333333333, ans=0.125 2024-09-23 14:21:17,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=277433.3333333333, ans=0.125 2024-09-23 14:21:25,044 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.66 vs. limit=5.0 2024-09-23 14:21:30,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=277480.0, ans=0.125 2024-09-23 14:21:37,360 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.289e+02 1.371e+02 1.510e+02 2.522e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-23 14:21:37,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=277480.0, ans=0.125 2024-09-23 14:21:40,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=277480.0, ans=0.0 2024-09-23 14:21:47,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=277526.6666666667, ans=0.0 2024-09-23 14:21:56,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=277526.6666666667, ans=0.2 2024-09-23 14:21:59,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=277526.6666666667, ans=6.0 2024-09-23 14:22:01,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=277526.6666666667, ans=0.125 2024-09-23 14:22:03,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277573.3333333333, ans=0.0 2024-09-23 14:22:04,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=22.5 2024-09-23 14:22:20,156 INFO [train.py:1198] (0/4) Epoch 16, batch 1050, loss[loss=0.1919, ctc_loss=0.128, cr_loss=0.3192, over 16575.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1526, cr_loss=0.3657, over 3357292.96 frames. 
], batch size: 37, lr: 7.59e-03, grad_scale: 16.0 2024-09-23 14:22:33,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=277620.0, ans=0.025 2024-09-23 14:22:53,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277713.3333333333, ans=0.1 2024-09-23 14:22:57,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=277713.3333333333, ans=0.1 2024-09-23 14:23:14,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=277760.0, ans=0.0 2024-09-23 14:23:39,993 INFO [train.py:1198] (0/4) Epoch 16, batch 1100, loss[loss=0.1782, ctc_loss=0.1194, cr_loss=0.2936, over 17049.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1518, cr_loss=0.3642, over 3351787.18 frames. ], batch size: 39, lr: 7.58e-03, grad_scale: 16.0 2024-09-23 14:23:48,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.38 vs. limit=6.0 2024-09-23 14:23:57,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=277900.0, ans=0.125 2024-09-23 14:24:02,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=277900.0, ans=0.125 2024-09-23 14:24:05,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=277900.0, ans=0.125 2024-09-23 14:24:08,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=277900.0, ans=10.0 2024-09-23 14:24:11,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277946.6666666667, ans=0.1 2024-09-23 14:24:16,322 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.265e+02 1.345e+02 1.521e+02 2.827e+02, threshold=2.691e+02, percent-clipped=1.0 2024-09-23 14:24:22,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=277946.6666666667, ans=0.1 2024-09-23 14:24:28,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0 2024-09-23 14:24:39,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-09-23 14:24:47,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2024-09-23 14:24:59,429 INFO [train.py:1198] (0/4) Epoch 16, batch 1150, loss[loss=0.2199, ctc_loss=0.1457, cr_loss=0.3712, over 17042.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1526, cr_loss=0.3669, over 3358229.68 frames. 
], batch size: 44, lr: 7.58e-03, grad_scale: 16.0 2024-09-23 14:25:10,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=278086.6666666667, ans=0.125 2024-09-23 14:25:16,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2024-09-23 14:25:38,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=278180.0, ans=0.2 2024-09-23 14:26:08,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=278226.6666666667, ans=0.0 2024-09-23 14:26:13,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=278273.3333333333, ans=0.125 2024-09-23 14:26:24,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2024-09-23 14:26:27,405 INFO [train.py:1198] (0/4) Epoch 16, batch 1200, loss[loss=0.2446, ctc_loss=0.1639, cr_loss=0.4034, over 16522.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1531, cr_loss=0.3678, over 3365516.71 frames. ], batch size: 66, lr: 7.58e-03, grad_scale: 16.0 2024-09-23 14:26:27,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=278320.0, ans=0.025 2024-09-23 14:26:29,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278320.0, ans=0.1 2024-09-23 14:26:29,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=278320.0, ans=0.125 2024-09-23 14:26:29,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0 2024-09-23 14:26:44,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=278366.6666666667, ans=0.0 2024-09-23 14:27:00,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=278413.3333333333, ans=0.0 2024-09-23 14:27:00,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=278413.3333333333, ans=0.0 2024-09-23 14:27:08,390 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.263e+02 1.375e+02 1.495e+02 2.178e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 14:27:49,957 INFO [train.py:1198] (0/4) Epoch 16, batch 1250, loss[loss=0.2099, ctc_loss=0.1428, cr_loss=0.3358, over 16940.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1528, cr_loss=0.3673, over 3361491.36 frames. ], batch size: 42, lr: 7.57e-03, grad_scale: 16.0 2024-09-23 14:28:14,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=278600.0, ans=0.05 2024-09-23 14:28:35,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.26 vs. 
limit=22.5 2024-09-23 14:28:38,381 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:28:44,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=278693.3333333333, ans=0.0 2024-09-23 14:29:10,196 INFO [train.py:1198] (0/4) Epoch 16, batch 1300, loss[loss=0.1887, ctc_loss=0.123, cr_loss=0.3289, over 17296.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1523, cr_loss=0.3669, over 3366408.65 frames. ], batch size: 42, lr: 7.57e-03, grad_scale: 16.0 2024-09-23 14:29:35,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=278833.3333333333, ans=0.5 2024-09-23 14:29:48,136 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.264e+02 1.359e+02 1.551e+02 2.304e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-23 14:30:06,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=278926.6666666667, ans=0.125 2024-09-23 14:30:17,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=278973.3333333333, ans=0.025 2024-09-23 14:30:29,689 INFO [train.py:1198] (0/4) Epoch 16, batch 1350, loss[loss=0.2152, ctc_loss=0.1431, cr_loss=0.3602, over 17150.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1518, cr_loss=0.3664, over 3373611.77 frames. ], batch size: 48, lr: 7.57e-03, grad_scale: 16.0 2024-09-23 14:31:43,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.97 vs. limit=10.0 2024-09-23 14:31:58,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=279253.3333333333, ans=0.125 2024-09-23 14:32:00,030 INFO [train.py:1198] (0/4) Epoch 16, batch 1400, loss[loss=0.2155, ctc_loss=0.1437, cr_loss=0.3592, over 17073.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1523, cr_loss=0.3671, over 3358018.35 frames. ], batch size: 49, lr: 7.56e-03, grad_scale: 16.0 2024-09-23 14:32:38,353 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.268e+02 1.363e+02 1.512e+02 2.184e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-23 14:32:54,898 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:32:58,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=279393.3333333333, ans=0.1 2024-09-23 14:33:07,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=279440.0, ans=0.1 2024-09-23 14:33:20,076 INFO [train.py:1198] (0/4) Epoch 16, batch 1450, loss[loss=0.2268, ctc_loss=0.1541, cr_loss=0.3636, over 16919.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.153, cr_loss=0.3685, over 3354922.66 frames. 
], batch size: 58, lr: 7.56e-03, grad_scale: 16.0 2024-09-23 14:33:28,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279486.6666666667, ans=0.1 2024-09-23 14:33:49,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=279533.3333333333, ans=0.125 2024-09-23 14:34:05,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=279580.0, ans=0.2 2024-09-23 14:34:39,815 INFO [train.py:1198] (0/4) Epoch 16, batch 1500, loss[loss=0.2101, ctc_loss=0.1407, cr_loss=0.347, over 17256.00 frames. ], tot_loss[loss=0.2272, ctc_loss=0.1535, cr_loss=0.3686, over 3357617.08 frames. ], batch size: 42, lr: 7.56e-03, grad_scale: 16.0 2024-09-23 14:34:41,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=279720.0, ans=0.125 2024-09-23 14:34:54,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=279766.6666666667, ans=0.125 2024-09-23 14:34:59,307 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:35:16,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=279813.3333333333, ans=0.2 2024-09-23 14:35:18,097 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.287e+02 1.379e+02 1.521e+02 2.599e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-23 14:35:19,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279813.3333333333, ans=0.1 2024-09-23 14:35:37,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=279860.0, ans=0.0 2024-09-23 14:35:43,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=279906.6666666667, ans=0.0 2024-09-23 14:35:43,836 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:36:06,619 INFO [train.py:1198] (0/4) Epoch 16, batch 1550, loss[loss=0.2447, ctc_loss=0.1635, cr_loss=0.4063, over 17078.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1534, cr_loss=0.3683, over 3358796.68 frames. ], batch size: 46, lr: 7.56e-03, grad_scale: 16.0 2024-09-23 14:36:21,360 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-60000.pt 2024-09-23 14:37:15,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=280140.0, ans=0.125 2024-09-23 14:37:21,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=280140.0, ans=0.0 2024-09-23 14:37:28,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=280140.0, ans=0.0 2024-09-23 14:37:30,855 INFO [train.py:1198] (0/4) Epoch 16, batch 1600, loss[loss=0.2003, ctc_loss=0.1317, cr_loss=0.3431, over 17081.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1525, cr_loss=0.3663, over 3355654.92 frames. 
], batch size: 39, lr: 7.55e-03, grad_scale: 32.0 2024-09-23 14:37:57,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=22.5 2024-09-23 14:37:58,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=280233.3333333333, ans=0.0 2024-09-23 14:38:10,324 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.239e+02 1.329e+02 1.436e+02 2.606e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-23 14:38:10,795 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:38:13,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=280280.0, ans=0.1 2024-09-23 14:38:50,312 INFO [train.py:1198] (0/4) Epoch 16, batch 1650, loss[loss=0.2273, ctc_loss=0.1555, cr_loss=0.3593, over 16762.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1527, cr_loss=0.3666, over 3359636.50 frames. ], batch size: 61, lr: 7.55e-03, grad_scale: 16.0 2024-09-23 14:38:50,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=280420.0, ans=0.125 2024-09-23 14:38:55,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=280420.0, ans=0.025 2024-09-23 14:38:58,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=280420.0, ans=0.2 2024-09-23 14:39:34,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=280513.3333333333, ans=0.95 2024-09-23 14:39:35,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280513.3333333333, ans=0.125 2024-09-23 14:39:55,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=280606.6666666667, ans=0.0 2024-09-23 14:39:59,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=280606.6666666667, ans=0.2 2024-09-23 14:40:09,865 INFO [train.py:1198] (0/4) Epoch 16, batch 1700, loss[loss=0.193, ctc_loss=0.1275, cr_loss=0.3275, over 17049.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1518, cr_loss=0.3647, over 3357412.02 frames. ], batch size: 39, lr: 7.55e-03, grad_scale: 16.0 2024-09-23 14:40:29,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=280700.0, ans=0.125 2024-09-23 14:40:35,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280700.0, ans=0.125 2024-09-23 14:40:43,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.76 vs. 
limit=15.0 2024-09-23 14:40:52,361 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.241e+02 1.314e+02 1.415e+02 2.347e+02, threshold=2.628e+02, percent-clipped=0.0 2024-09-23 14:41:00,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=280746.6666666667, ans=0.125 2024-09-23 14:41:04,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=280793.3333333333, ans=0.125 2024-09-23 14:41:09,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=280793.3333333333, ans=0.0 2024-09-23 14:41:12,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2024-09-23 14:41:19,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2024-09-23 14:41:23,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=280840.0, ans=0.125 2024-09-23 14:41:35,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=280840.0, ans=0.125 2024-09-23 14:41:40,568 INFO [train.py:1198] (0/4) Epoch 16, batch 1750, loss[loss=0.2311, ctc_loss=0.1566, cr_loss=0.3725, over 16769.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1511, cr_loss=0.3649, over 3369673.26 frames. ], batch size: 61, lr: 7.54e-03, grad_scale: 16.0 2024-09-23 14:41:55,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=280933.3333333333, ans=0.2 2024-09-23 14:42:01,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=280933.3333333333, ans=0.125 2024-09-23 14:42:06,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=280933.3333333333, ans=0.2 2024-09-23 14:42:30,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=281026.6666666667, ans=0.125 2024-09-23 14:42:45,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=12.0 2024-09-23 14:42:51,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=281073.3333333333, ans=0.125 2024-09-23 14:42:54,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=281073.3333333333, ans=0.0 2024-09-23 14:43:00,738 INFO [train.py:1198] (0/4) Epoch 16, batch 1800, loss[loss=0.2257, ctc_loss=0.1524, cr_loss=0.3666, over 17303.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1521, cr_loss=0.3669, over 3360681.65 frames. ], batch size: 46, lr: 7.54e-03, grad_scale: 16.0 2024-09-23 14:43:40,372 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.265e+02 1.365e+02 1.521e+02 2.085e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-23 14:43:52,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.37 vs. 
limit=22.5 2024-09-23 14:44:07,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=12.0 2024-09-23 14:44:20,024 INFO [train.py:1198] (0/4) Epoch 16, batch 1850, loss[loss=0.2159, ctc_loss=0.1435, cr_loss=0.362, over 17306.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1518, cr_loss=0.3667, over 3361270.53 frames. ], batch size: 51, lr: 7.54e-03, grad_scale: 16.0 2024-09-23 14:44:50,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=281446.6666666667, ans=0.0 2024-09-23 14:44:55,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=281446.6666666667, ans=0.125 2024-09-23 14:45:11,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2024-09-23 14:45:19,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=281493.3333333333, ans=0.125 2024-09-23 14:45:40,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2024-09-23 14:45:40,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=281586.6666666667, ans=0.125 2024-09-23 14:45:42,171 INFO [train.py:1198] (0/4) Epoch 16, batch 1900, loss[loss=0.2139, ctc_loss=0.1435, cr_loss=0.3523, over 17201.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.151, cr_loss=0.3651, over 3361961.48 frames. ], batch size: 47, lr: 7.53e-03, grad_scale: 16.0 2024-09-23 14:45:43,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=281586.6666666667, ans=0.0 2024-09-23 14:45:47,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=281586.6666666667, ans=0.125 2024-09-23 14:45:51,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=281586.6666666667, ans=0.0 2024-09-23 14:46:27,054 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.270e+02 1.346e+02 1.420e+02 2.030e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-23 14:46:54,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=281773.3333333333, ans=0.125 2024-09-23 14:47:09,368 INFO [train.py:1198] (0/4) Epoch 16, batch 1950, loss[loss=0.2305, ctc_loss=0.1562, cr_loss=0.3712, over 17357.00 frames. ], tot_loss[loss=0.2247, ctc_loss=0.1515, cr_loss=0.3661, over 3370687.33 frames. ], batch size: 48, lr: 7.53e-03, grad_scale: 16.0 2024-09-23 14:47:50,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=281913.3333333333, ans=0.125 2024-09-23 14:48:03,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.64 vs. 
limit=22.5 2024-09-23 14:48:26,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=282006.6666666667, ans=0.1 2024-09-23 14:48:29,656 INFO [train.py:1198] (0/4) Epoch 16, batch 2000, loss[loss=0.2327, ctc_loss=0.1578, cr_loss=0.3743, over 17014.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1519, cr_loss=0.3666, over 3371324.57 frames. ], batch size: 52, lr: 7.53e-03, grad_scale: 32.0 2024-09-23 14:48:31,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=282053.3333333333, ans=0.1 2024-09-23 14:48:41,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-23 14:48:48,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0 2024-09-23 14:48:50,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=282100.0, ans=0.0 2024-09-23 14:48:55,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=282100.0, ans=0.0 2024-09-23 14:48:58,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=282100.0, ans=0.125 2024-09-23 14:49:09,155 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.302e+02 1.409e+02 1.610e+02 3.619e+02, threshold=2.818e+02, percent-clipped=1.0 2024-09-23 14:49:18,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.98 vs. limit=15.0 2024-09-23 14:49:24,082 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 14:49:30,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=282193.3333333333, ans=0.125 2024-09-23 14:49:49,171 INFO [train.py:1198] (0/4) Epoch 16, batch 2050, loss[loss=0.2469, ctc_loss=0.1684, cr_loss=0.3927, over 16738.00 frames. ], tot_loss[loss=0.2262, ctc_loss=0.1526, cr_loss=0.3677, over 3360725.54 frames. ], batch size: 61, lr: 7.52e-03, grad_scale: 32.0 2024-09-23 14:50:05,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=282333.3333333333, ans=0.04949747468305833 2024-09-23 14:50:33,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=282380.0, ans=0.0 2024-09-23 14:50:43,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.39 vs. limit=22.5 2024-09-23 14:51:04,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=282473.3333333333, ans=0.125 2024-09-23 14:51:16,378 INFO [train.py:1198] (0/4) Epoch 16, batch 2100, loss[loss=0.2593, ctc_loss=0.1783, cr_loss=0.4052, over 16610.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1524, cr_loss=0.3675, over 3361876.32 frames. 
], batch size: 66, lr: 7.52e-03, grad_scale: 32.0 2024-09-23 14:51:26,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=282520.0, ans=0.125 2024-09-23 14:51:58,360 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.277e+02 1.347e+02 1.477e+02 3.259e+02, threshold=2.693e+02, percent-clipped=1.0 2024-09-23 14:52:38,060 INFO [train.py:1198] (0/4) Epoch 16, batch 2150, loss[loss=0.2238, ctc_loss=0.1477, cr_loss=0.3807, over 17306.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1526, cr_loss=0.3678, over 3354364.25 frames. ], batch size: 46, lr: 7.52e-03, grad_scale: 32.0 2024-09-23 14:52:59,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=282800.0, ans=0.0 2024-09-23 14:53:42,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=282940.0, ans=0.125 2024-09-23 14:53:52,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=282940.0, ans=0.0 2024-09-23 14:53:58,104 INFO [train.py:1198] (0/4) Epoch 16, batch 2200, loss[loss=0.21, ctc_loss=0.1403, cr_loss=0.3485, over 17175.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1521, cr_loss=0.3667, over 3350511.24 frames. ], batch size: 45, lr: 7.52e-03, grad_scale: 32.0 2024-09-23 14:54:31,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0 2024-09-23 14:54:38,170 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.245e+02 1.360e+02 1.489e+02 2.276e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-23 14:55:14,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=283173.3333333333, ans=0.0 2024-09-23 14:55:18,499 INFO [train.py:1198] (0/4) Epoch 16, batch 2250, loss[loss=0.2079, ctc_loss=0.1408, cr_loss=0.3354, over 17263.00 frames. ], tot_loss[loss=0.2264, ctc_loss=0.1527, cr_loss=0.3682, over 3350523.31 frames. ], batch size: 44, lr: 7.51e-03, grad_scale: 32.0 2024-09-23 14:55:31,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=12.0 2024-09-23 14:56:02,084 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0 2024-09-23 14:56:35,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=283406.6666666667, ans=0.0 2024-09-23 14:56:47,893 INFO [train.py:1198] (0/4) Epoch 16, batch 2300, loss[loss=0.2314, ctc_loss=0.1564, cr_loss=0.3746, over 17225.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1521, cr_loss=0.3668, over 3351646.62 frames. ], batch size: 47, lr: 7.51e-03, grad_scale: 32.0 2024-09-23 14:57:10,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=283500.0, ans=0.025 2024-09-23 14:57:15,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.39 vs. 
limit=15.0 2024-09-23 14:57:27,772 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.282e+02 1.381e+02 1.559e+02 2.607e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-23 14:58:07,689 INFO [train.py:1198] (0/4) Epoch 16, batch 2350, loss[loss=0.2494, ctc_loss=0.171, cr_loss=0.3919, over 16370.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1524, cr_loss=0.3667, over 3351492.13 frames. ], batch size: 66, lr: 7.51e-03, grad_scale: 32.0 2024-09-23 14:58:30,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283733.3333333333, ans=0.1 2024-09-23 14:58:40,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=283780.0, ans=0.0 2024-09-23 14:59:20,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=283873.3333333333, ans=0.2 2024-09-23 14:59:23,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0 2024-09-23 14:59:27,617 INFO [train.py:1198] (0/4) Epoch 16, batch 2400, loss[loss=0.2557, ctc_loss=0.1764, cr_loss=0.3968, over 16438.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1526, cr_loss=0.3673, over 3345653.75 frames. ], batch size: 66, lr: 7.50e-03, grad_scale: 32.0 2024-09-23 14:59:32,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283920.0, ans=0.1 2024-09-23 14:59:36,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=283920.0, ans=0.09899494936611666 2024-09-23 15:00:07,592 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.274e+02 1.411e+02 1.575e+02 2.045e+02, threshold=2.822e+02, percent-clipped=0.0 2024-09-23 15:00:30,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=284106.6666666667, ans=0.125 2024-09-23 15:00:34,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=284106.6666666667, ans=0.0 2024-09-23 15:00:39,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=284106.6666666667, ans=0.2 2024-09-23 15:00:54,750 INFO [train.py:1198] (0/4) Epoch 16, batch 2450, loss[loss=0.2024, ctc_loss=0.1376, cr_loss=0.324, over 17078.00 frames. ], tot_loss[loss=0.2251, ctc_loss=0.1518, cr_loss=0.3661, over 3356286.87 frames. ], batch size: 46, lr: 7.50e-03, grad_scale: 32.0 2024-09-23 15:00:57,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. 
limit=15.0 2024-09-23 15:01:09,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=284200.0, ans=0.0 2024-09-23 15:01:14,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=284200.0, ans=0.0 2024-09-23 15:01:20,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284200.0, ans=0.1 2024-09-23 15:01:21,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=284200.0, ans=0.125 2024-09-23 15:01:44,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.19 vs. limit=22.5 2024-09-23 15:02:17,036 INFO [train.py:1198] (0/4) Epoch 16, batch 2500, loss[loss=0.2479, ctc_loss=0.1669, cr_loss=0.4051, over 17297.00 frames. ], tot_loss[loss=0.2265, ctc_loss=0.1531, cr_loss=0.367, over 3343083.74 frames. ], batch size: 51, lr: 7.50e-03, grad_scale: 32.0 2024-09-23 15:02:17,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=284386.6666666667, ans=0.2 2024-09-23 15:02:47,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=284480.0, ans=0.125 2024-09-23 15:02:56,745 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.256e+02 1.352e+02 1.487e+02 2.472e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-23 15:03:08,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=284526.6666666667, ans=0.2 2024-09-23 15:03:36,421 INFO [train.py:1198] (0/4) Epoch 16, batch 2550, loss[loss=0.2115, ctc_loss=0.1416, cr_loss=0.3494, over 16276.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1525, cr_loss=0.3656, over 3344206.96 frames. ], batch size: 36, lr: 7.49e-03, grad_scale: 32.0 2024-09-23 15:04:48,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=12.0 2024-09-23 15:04:48,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=15.0 2024-09-23 15:04:56,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0 2024-09-23 15:04:56,920 INFO [train.py:1198] (0/4) Epoch 16, batch 2600, loss[loss=0.251, ctc_loss=0.1683, cr_loss=0.4136, over 16436.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1525, cr_loss=0.3664, over 3345527.51 frames. 
], batch size: 66, lr: 7.49e-03, grad_scale: 32.0 2024-09-23 15:05:11,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=284900.0, ans=0.125 2024-09-23 15:05:14,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=284900.0, ans=0.04949747468305833 2024-09-23 15:05:41,680 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.252e+02 1.364e+02 1.589e+02 2.408e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 15:06:24,112 INFO [train.py:1198] (0/4) Epoch 16, batch 2650, loss[loss=0.1937, ctc_loss=0.1268, cr_loss=0.3344, over 16256.00 frames. ], tot_loss[loss=0.2249, ctc_loss=0.1517, cr_loss=0.3658, over 3351101.80 frames. ], batch size: 36, lr: 7.49e-03, grad_scale: 32.0 2024-09-23 15:06:30,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=285086.6666666667, ans=0.125 2024-09-23 15:06:59,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=285180.0, ans=0.125 2024-09-23 15:07:07,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=285180.0, ans=0.025 2024-09-23 15:07:45,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=15.0 2024-09-23 15:07:46,887 INFO [train.py:1198] (0/4) Epoch 16, batch 2700, loss[loss=0.1775, ctc_loss=0.1161, cr_loss=0.307, over 17248.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.1514, cr_loss=0.365, over 3355438.56 frames. ], batch size: 42, lr: 7.48e-03, grad_scale: 32.0 2024-09-23 15:07:50,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=285320.0, ans=0.1 2024-09-23 15:08:01,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=285366.6666666667, ans=0.2 2024-09-23 15:08:09,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=285366.6666666667, ans=0.125 2024-09-23 15:08:26,735 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.250e+02 1.328e+02 1.437e+02 2.275e+02, threshold=2.655e+02, percent-clipped=0.0 2024-09-23 15:08:38,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=285460.0, ans=0.0 2024-09-23 15:09:06,363 INFO [train.py:1198] (0/4) Epoch 16, batch 2750, loss[loss=0.2628, ctc_loss=0.1835, cr_loss=0.3963, over 16592.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.152, cr_loss=0.3661, over 3350703.76 frames. ], batch size: 66, lr: 7.48e-03, grad_scale: 32.0 2024-09-23 15:09:17,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=285553.3333333333, ans=0.125 2024-09-23 15:09:21,354 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:09:21,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.59 vs. 
limit=15.0 2024-09-23 15:09:48,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=285646.6666666667, ans=0.125 2024-09-23 15:09:49,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=285646.6666666667, ans=0.125 2024-09-23 15:10:26,019 INFO [train.py:1198] (0/4) Epoch 16, batch 2800, loss[loss=0.2419, ctc_loss=0.1611, cr_loss=0.404, over 17293.00 frames. ], tot_loss[loss=0.2252, ctc_loss=0.1519, cr_loss=0.3666, over 3362820.99 frames. ], batch size: 51, lr: 7.48e-03, grad_scale: 32.0 2024-09-23 15:10:33,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285786.6666666667, ans=0.1 2024-09-23 15:10:38,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=285786.6666666667, ans=0.125 2024-09-23 15:10:53,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=285833.3333333333, ans=0.0 2024-09-23 15:11:11,236 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.268e+02 1.378e+02 1.547e+02 2.101e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-23 15:11:35,023 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:11:41,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=285973.3333333333, ans=0.125 2024-09-23 15:11:53,455 INFO [train.py:1198] (0/4) Epoch 16, batch 2850, loss[loss=0.271, ctc_loss=0.1875, cr_loss=0.4178, over 14855.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1526, cr_loss=0.3684, over 3364541.20 frames. ], batch size: 89, lr: 7.48e-03, grad_scale: 32.0 2024-09-23 15:11:56,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=286020.0, ans=0.0 2024-09-23 15:12:06,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=286020.0, ans=0.125 2024-09-23 15:12:23,433 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:12:42,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=286160.0, ans=0.07 2024-09-23 15:12:44,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=286160.0, ans=0.125 2024-09-23 15:12:52,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286160.0, ans=0.1 2024-09-23 15:13:00,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=286206.6666666667, ans=0.125 2024-09-23 15:13:03,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=286206.6666666667, ans=0.125 2024-09-23 15:13:13,162 INFO [train.py:1198] (0/4) Epoch 16, batch 2900, loss[loss=0.2598, ctc_loss=0.1795, cr_loss=0.4019, over 17063.00 frames. ], tot_loss[loss=0.2254, ctc_loss=0.1519, cr_loss=0.3673, over 3372475.52 frames. 
], batch size: 52, lr: 7.47e-03, grad_scale: 32.0 2024-09-23 15:13:23,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-09-23 15:13:30,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0 2024-09-23 15:13:32,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286300.0, ans=0.1 2024-09-23 15:13:37,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=286300.0, ans=0.125 2024-09-23 15:13:53,319 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.305e+02 1.434e+02 1.582e+02 2.466e+02, threshold=2.868e+02, percent-clipped=0.0 2024-09-23 15:14:09,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=286393.3333333333, ans=0.125 2024-09-23 15:14:19,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=286440.0, ans=0.125 2024-09-23 15:14:27,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=286440.0, ans=0.2 2024-09-23 15:14:30,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=286440.0, ans=0.0 2024-09-23 15:14:33,435 INFO [train.py:1198] (0/4) Epoch 16, batch 2950, loss[loss=0.2397, ctc_loss=0.1629, cr_loss=0.3837, over 17094.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1521, cr_loss=0.367, over 3374596.93 frames. ], batch size: 49, lr: 7.47e-03, grad_scale: 32.0 2024-09-23 15:14:45,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=286486.6666666667, ans=0.025 2024-09-23 15:15:06,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286580.0, ans=0.1 2024-09-23 15:15:45,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=286673.3333333333, ans=0.0 2024-09-23 15:15:54,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=286673.3333333333, ans=0.015 2024-09-23 15:15:58,478 INFO [train.py:1198] (0/4) Epoch 16, batch 3000, loss[loss=0.2263, ctc_loss=0.152, cr_loss=0.3713, over 17014.00 frames. ], tot_loss[loss=0.2277, ctc_loss=0.1537, cr_loss=0.3697, over 3370472.36 frames. ], batch size: 52, lr: 7.47e-03, grad_scale: 32.0 2024-09-23 15:15:58,479 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 15:16:14,051 INFO [train.py:1230] (0/4) Epoch 16, validation: loss=0.04215, ctc_loss=0.04215, cr_loss=7.551e-15, over 944034.00 frames. 
2024-09-23 15:16:14,051 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 15:16:34,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=286766.6666666667, ans=0.125 2024-09-23 15:16:36,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=286766.6666666667, ans=0.125 2024-09-23 15:16:45,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286813.3333333333, ans=0.1 2024-09-23 15:16:53,330 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.253e+02 1.384e+02 1.509e+02 2.501e+02, threshold=2.769e+02, percent-clipped=0.0 2024-09-23 15:17:06,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=286860.0, ans=0.125 2024-09-23 15:17:13,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=286860.0, ans=0.0 2024-09-23 15:17:21,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=286906.6666666667, ans=0.2 2024-09-23 15:17:26,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-09-23 15:17:31,988 INFO [train.py:1198] (0/4) Epoch 16, batch 3050, loss[loss=0.2552, ctc_loss=0.1752, cr_loss=0.3999, over 16721.00 frames. ], tot_loss[loss=0.2274, ctc_loss=0.1535, cr_loss=0.3698, over 3369925.26 frames. ], batch size: 61, lr: 7.46e-03, grad_scale: 32.0 2024-09-23 15:17:32,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=286953.3333333333, ans=0.0 2024-09-23 15:17:32,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=286953.3333333333, ans=0.125 2024-09-23 15:17:32,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2024-09-23 15:18:24,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2024-09-23 15:18:50,319 INFO [train.py:1198] (0/4) Epoch 16, batch 3100, loss[loss=0.2562, ctc_loss=0.1744, cr_loss=0.4088, over 16868.00 frames. ], tot_loss[loss=0.2267, ctc_loss=0.1528, cr_loss=0.3696, over 3378044.76 frames. 
], batch size: 58, lr: 7.46e-03, grad_scale: 32.0 2024-09-23 15:18:52,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=287186.6666666667, ans=0.2 2024-09-23 15:19:09,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287233.3333333333, ans=0.1 2024-09-23 15:19:29,119 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.248e+02 1.355e+02 1.474e+02 3.116e+02, threshold=2.710e+02, percent-clipped=1.0 2024-09-23 15:19:32,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=287280.0, ans=0.125 2024-09-23 15:19:53,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=287373.3333333333, ans=0.0 2024-09-23 15:19:54,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=287373.3333333333, ans=0.125 2024-09-23 15:20:02,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=22.5 2024-09-23 15:20:04,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=287373.3333333333, ans=0.125 2024-09-23 15:20:08,481 INFO [train.py:1198] (0/4) Epoch 16, batch 3150, loss[loss=0.2411, ctc_loss=0.1655, cr_loss=0.3782, over 16500.00 frames. ], tot_loss[loss=0.2261, ctc_loss=0.1523, cr_loss=0.3687, over 3376952.76 frames. ], batch size: 66, lr: 7.46e-03, grad_scale: 16.0 2024-09-23 15:20:36,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=287466.6666666667, ans=0.0 2024-09-23 15:20:38,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=287513.3333333333, ans=0.125 2024-09-23 15:20:40,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=12.0 2024-09-23 15:20:49,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=287513.3333333333, ans=0.125 2024-09-23 15:20:57,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287560.0, ans=0.1 2024-09-23 15:21:10,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=287606.6666666667, ans=0.0 2024-09-23 15:21:26,306 INFO [train.py:1198] (0/4) Epoch 16, batch 3200, loss[loss=0.2372, ctc_loss=0.1602, cr_loss=0.3852, over 17008.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1521, cr_loss=0.3684, over 3378676.21 frames. ], batch size: 52, lr: 7.45e-03, grad_scale: 32.0 2024-09-23 15:21:29,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=12.0 2024-09-23 15:21:45,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.97 vs. 
limit=15.0 2024-09-23 15:21:48,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=287700.0, ans=0.0 2024-09-23 15:21:52,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-09-23 15:22:06,510 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.274e+02 1.402e+02 1.535e+02 1.896e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-23 15:22:22,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=287793.3333333333, ans=0.125 2024-09-23 15:22:43,941 INFO [train.py:1198] (0/4) Epoch 16, batch 3250, loss[loss=0.2101, ctc_loss=0.139, cr_loss=0.3557, over 17209.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1521, cr_loss=0.3685, over 3375146.38 frames. ], batch size: 47, lr: 7.45e-03, grad_scale: 32.0 2024-09-23 15:22:50,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=287886.6666666667, ans=0.125 2024-09-23 15:23:00,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=52.59 vs. limit=15.0 2024-09-23 15:23:01,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=287933.3333333333, ans=0.125 2024-09-23 15:23:41,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=288026.6666666667, ans=0.125 2024-09-23 15:23:51,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=12.0 2024-09-23 15:23:53,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=12.0 2024-09-23 15:23:57,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=288073.3333333333, ans=0.125 2024-09-23 15:24:01,926 INFO [train.py:1198] (0/4) Epoch 16, batch 3300, loss[loss=0.2701, ctc_loss=0.1826, cr_loss=0.4374, over 17310.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1523, cr_loss=0.3685, over 3367204.51 frames. 
], batch size: 46, lr: 7.45e-03, grad_scale: 32.0 2024-09-23 15:24:28,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=288166.6666666667, ans=0.0 2024-09-23 15:24:36,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=288213.3333333333, ans=0.0 2024-09-23 15:24:46,430 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.267e+02 1.362e+02 1.524e+02 2.882e+02, threshold=2.723e+02, percent-clipped=1.0 2024-09-23 15:24:49,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=288213.3333333333, ans=0.125 2024-09-23 15:24:49,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=288213.3333333333, ans=0.125 2024-09-23 15:25:04,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288260.0, ans=0.1 2024-09-23 15:25:06,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-09-23 15:25:24,525 INFO [train.py:1198] (0/4) Epoch 16, batch 3350, loss[loss=0.2501, ctc_loss=0.1669, cr_loss=0.4161, over 17059.00 frames. ], tot_loss[loss=0.2271, ctc_loss=0.1532, cr_loss=0.3696, over 3361843.70 frames. ], batch size: 56, lr: 7.45e-03, grad_scale: 32.0 2024-09-23 15:25:41,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=288400.0, ans=0.1 2024-09-23 15:26:06,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=288446.6666666667, ans=0.0 2024-09-23 15:26:06,503 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:26:14,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=288493.3333333333, ans=0.125 2024-09-23 15:26:19,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=288493.3333333333, ans=0.025 2024-09-23 15:26:23,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=288493.3333333333, ans=0.125 2024-09-23 15:26:31,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=288540.0, ans=0.125 2024-09-23 15:26:39,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2024-09-23 15:26:42,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288540.0, ans=0.1 2024-09-23 15:26:45,110 INFO [train.py:1198] (0/4) Epoch 16, batch 3400, loss[loss=0.1889, ctc_loss=0.1247, cr_loss=0.3209, over 17064.00 frames. ], tot_loss[loss=0.2266, ctc_loss=0.1529, cr_loss=0.3687, over 3360050.89 frames. 
], batch size: 46, lr: 7.44e-03, grad_scale: 32.0 2024-09-23 15:26:50,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=288586.6666666667, ans=0.2 2024-09-23 15:26:59,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=288633.3333333333, ans=0.0 2024-09-23 15:27:02,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=288633.3333333333, ans=0.125 2024-09-23 15:27:12,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288633.3333333333, ans=0.1 2024-09-23 15:27:20,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=288680.0, ans=0.0 2024-09-23 15:27:27,932 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.234e+02 1.362e+02 1.557e+02 3.833e+02, threshold=2.724e+02, percent-clipped=1.0 2024-09-23 15:27:34,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=288726.6666666667, ans=0.0 2024-09-23 15:28:05,749 INFO [train.py:1198] (0/4) Epoch 16, batch 3450, loss[loss=0.2014, ctc_loss=0.1342, cr_loss=0.336, over 17195.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1521, cr_loss=0.3675, over 3368211.01 frames. ], batch size: 41, lr: 7.44e-03, grad_scale: 32.0 2024-09-23 15:28:32,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=22.5 2024-09-23 15:28:48,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=288913.3333333333, ans=0.125 2024-09-23 15:29:06,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=289006.6666666667, ans=0.0 2024-09-23 15:29:14,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=289006.6666666667, ans=0.125 2024-09-23 15:29:19,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=289006.6666666667, ans=0.0 2024-09-23 15:29:19,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.94 vs. limit=10.0 2024-09-23 15:29:23,287 INFO [train.py:1198] (0/4) Epoch 16, batch 3500, loss[loss=0.1958, ctc_loss=0.1269, cr_loss=0.3445, over 17085.00 frames. ], tot_loss[loss=0.2259, ctc_loss=0.1523, cr_loss=0.368, over 3372501.64 frames. ], batch size: 43, lr: 7.44e-03, grad_scale: 32.0 2024-09-23 15:29:35,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.56 vs. 
limit=10.0 2024-09-23 15:29:37,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=289100.0, ans=0.0 2024-09-23 15:29:50,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=289100.0, ans=0.125 2024-09-23 15:30:01,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=289146.6666666667, ans=0.125 2024-09-23 15:30:01,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=289146.6666666667, ans=15.0 2024-09-23 15:30:03,921 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.282e+02 1.368e+02 1.525e+02 3.473e+02, threshold=2.736e+02, percent-clipped=1.0 2024-09-23 15:30:05,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=289146.6666666667, ans=0.2 2024-09-23 15:30:07,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=289146.6666666667, ans=0.2 2024-09-23 15:30:09,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2024-09-23 15:30:12,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289193.3333333333, ans=0.1 2024-09-23 15:30:13,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289193.3333333333, ans=0.1 2024-09-23 15:30:41,434 INFO [train.py:1198] (0/4) Epoch 16, batch 3550, loss[loss=0.2687, ctc_loss=0.1901, cr_loss=0.3927, over 12407.00 frames. ], tot_loss[loss=0.2258, ctc_loss=0.1522, cr_loss=0.3677, over 3361255.36 frames. ], batch size: 123, lr: 7.43e-03, grad_scale: 32.0 2024-09-23 15:30:44,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=289286.6666666667, ans=22.5 2024-09-23 15:30:49,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=289286.6666666667, ans=0.125 2024-09-23 15:31:02,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=289333.3333333333, ans=0.025 2024-09-23 15:31:03,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=22.5 2024-09-23 15:31:13,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=289380.0, ans=0.0 2024-09-23 15:31:15,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=289380.0, ans=0.125 2024-09-23 15:32:00,002 INFO [train.py:1198] (0/4) Epoch 16, batch 3600, loss[loss=0.1775, ctc_loss=0.1157, cr_loss=0.3091, over 17107.00 frames. ], tot_loss[loss=0.226, ctc_loss=0.1524, cr_loss=0.3677, over 3355649.03 frames. ], batch size: 40, lr: 7.43e-03, grad_scale: 32.0 2024-09-23 15:32:19,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2024-09-23 15:32:23,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=22.5 2024-09-23 15:32:25,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=289566.6666666667, ans=0.125 2024-09-23 15:32:33,929 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-23 15:32:40,499 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.033e+02 1.232e+02 1.305e+02 1.381e+02 1.906e+02, threshold=2.610e+02, percent-clipped=0.0 2024-09-23 15:32:48,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=289660.0, ans=0.0 2024-09-23 15:33:06,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=22.5 2024-09-23 15:33:08,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=22.5 2024-09-23 15:33:18,231 INFO [train.py:1198] (0/4) Epoch 16, batch 3650, loss[loss=0.2079, ctc_loss=0.1414, cr_loss=0.3324, over 17031.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1515, cr_loss=0.3662, over 3366138.21 frames. ], batch size: 44, lr: 7.43e-03, grad_scale: 16.0 2024-09-23 15:33:24,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=289753.3333333333, ans=0.2 2024-09-23 15:33:32,702 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:33:59,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=289846.6666666667, ans=0.1 2024-09-23 15:34:04,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=289846.6666666667, ans=0.0 2024-09-23 15:34:38,034 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:34:40,767 INFO [train.py:1198] (0/4) Epoch 16, batch 3700, loss[loss=0.2469, ctc_loss=0.1675, cr_loss=0.3969, over 16690.00 frames. ], tot_loss[loss=0.2257, ctc_loss=0.1523, cr_loss=0.3674, over 3366676.15 frames. 
], batch size: 61, lr: 7.43e-03, grad_scale: 16.0 2024-09-23 15:35:11,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=290080.0, ans=0.035 2024-09-23 15:35:23,810 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.296e+02 1.389e+02 1.546e+02 2.430e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-23 15:35:38,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=290126.6666666667, ans=0.2 2024-09-23 15:35:41,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=290126.6666666667, ans=0.1 2024-09-23 15:35:56,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=290173.3333333333, ans=0.125 2024-09-23 15:35:59,679 INFO [train.py:1198] (0/4) Epoch 16, batch 3750, loss[loss=0.1997, ctc_loss=0.1334, cr_loss=0.3315, over 17178.00 frames. ], tot_loss[loss=0.2263, ctc_loss=0.1527, cr_loss=0.3681, over 3356699.12 frames. ], batch size: 45, lr: 7.42e-03, grad_scale: 16.0 2024-09-23 15:36:04,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=290220.0, ans=0.05 2024-09-23 15:36:17,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=290266.6666666667, ans=0.125 2024-09-23 15:36:34,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=290313.3333333333, ans=0.125 2024-09-23 15:36:36,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=290313.3333333333, ans=0.025 2024-09-23 15:36:42,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290313.3333333333, ans=0.1 2024-09-23 15:36:42,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=290313.3333333333, ans=0.0 2024-09-23 15:36:50,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=290360.0, ans=0.1 2024-09-23 15:37:02,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=290406.6666666667, ans=0.07 2024-09-23 15:37:08,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.45 vs. limit=10.0 2024-09-23 15:37:09,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-23 15:37:11,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=290406.6666666667, ans=0.025 2024-09-23 15:37:12,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=290406.6666666667, ans=0.125 2024-09-23 15:37:18,711 INFO [train.py:1198] (0/4) Epoch 16, batch 3800, loss[loss=0.2298, ctc_loss=0.1544, cr_loss=0.377, over 17308.00 frames. ], tot_loss[loss=0.2268, ctc_loss=0.1532, cr_loss=0.3683, over 3339767.40 frames. 
], batch size: 51, lr: 7.42e-03, grad_scale: 16.0 2024-09-23 15:37:26,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=290453.3333333333, ans=0.125 2024-09-23 15:37:28,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=290453.3333333333, ans=0.0 2024-09-23 15:38:00,417 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.261e+02 1.370e+02 1.506e+02 3.479e+02, threshold=2.739e+02, percent-clipped=1.0 2024-09-23 15:38:00,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=290546.6666666667, ans=0.2 2024-09-23 15:38:08,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=290593.3333333333, ans=0.125 2024-09-23 15:38:16,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=290593.3333333333, ans=0.0 2024-09-23 15:38:23,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2024-09-23 15:38:36,621 INFO [train.py:1198] (0/4) Epoch 16, batch 3850, loss[loss=0.2175, ctc_loss=0.1443, cr_loss=0.366, over 17291.00 frames. ], tot_loss[loss=0.2278, ctc_loss=0.154, cr_loss=0.3689, over 3313331.25 frames. ], batch size: 51, lr: 7.42e-03, grad_scale: 16.0 2024-09-23 15:38:44,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=290686.6666666667, ans=0.0 2024-09-23 15:38:46,416 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:39:06,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=290780.0, ans=0.0 2024-09-23 15:39:16,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=290780.0, ans=0.025 2024-09-23 15:39:40,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=15.0 2024-09-23 15:39:47,132 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-16.pt 2024-09-23 15:40:38,390 INFO [train.py:1198] (0/4) Epoch 17, batch 0, loss[loss=0.2364, ctc_loss=0.1627, cr_loss=0.3682, over 17304.00 frames. ], tot_loss[loss=0.2364, ctc_loss=0.1627, cr_loss=0.3682, over 17304.00 frames. ], batch size: 51, lr: 7.19e-03, grad_scale: 32.0 2024-09-23 15:40:38,390 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 15:40:45,947 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2106, 3.6783, 3.5426, 4.0708, 3.3711, 3.3966, 4.0474, 4.3269], device='cuda:0') 2024-09-23 15:40:53,767 INFO [train.py:1230] (0/4) Epoch 17, validation: loss=0.04104, ctc_loss=0.04104, cr_loss=7.589e-15, over 944034.00 frames. 
2024-09-23 15:40:53,768 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 15:41:06,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=290901.3333333333, ans=0.125 2024-09-23 15:41:11,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=290948.0, ans=0.02 2024-09-23 15:41:24,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=290948.0, ans=0.0 2024-09-23 15:41:24,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=290948.0, ans=0.0 2024-09-23 15:41:46,228 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.415e+02 1.552e+02 1.651e+02 2.695e+02, threshold=3.103e+02, percent-clipped=0.0 2024-09-23 15:41:46,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=291041.3333333333, ans=0.0 2024-09-23 15:41:46,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=12.0 2024-09-23 15:41:48,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=291041.3333333333, ans=0.125 2024-09-23 15:42:18,132 INFO [train.py:1198] (0/4) Epoch 17, batch 50, loss[loss=0.2095, ctc_loss=0.1392, cr_loss=0.3517, over 17178.00 frames. ], tot_loss[loss=0.229, ctc_loss=0.155, cr_loss=0.3696, over 753216.82 frames. ], batch size: 41, lr: 7.19e-03, grad_scale: 16.0 2024-09-23 15:42:26,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=291134.6666666667, ans=0.0 2024-09-23 15:42:29,566 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:42:34,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291181.3333333333, ans=0.1 2024-09-23 15:42:40,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=291181.3333333333, ans=0.1 2024-09-23 15:43:27,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=291321.3333333333, ans=0.0 2024-09-23 15:43:39,790 INFO [train.py:1198] (0/4) Epoch 17, batch 100, loss[loss=0.2875, ctc_loss=0.2072, cr_loss=0.4014, over 11589.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1552, cr_loss=0.3704, over 1321265.87 frames. ], batch size: 123, lr: 7.18e-03, grad_scale: 16.0 2024-09-23 15:43:42,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=15.0 2024-09-23 15:44:26,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=291508.0, ans=0.0 2024-09-23 15:44:31,115 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.243e+02 1.327e+02 1.449e+02 2.389e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-23 15:44:35,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.06 vs. 
limit=10.0 2024-09-23 15:44:59,989 INFO [train.py:1198] (0/4) Epoch 17, batch 150, loss[loss=0.2634, ctc_loss=0.1827, cr_loss=0.4035, over 16454.00 frames. ], tot_loss[loss=0.2292, ctc_loss=0.1551, cr_loss=0.3705, over 1758643.71 frames. ], batch size: 66, lr: 7.18e-03, grad_scale: 16.0 2024-09-23 15:45:08,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=291601.3333333333, ans=0.125 2024-09-23 15:45:14,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=291648.0, ans=0.02 2024-09-23 15:45:17,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=291648.0, ans=0.0 2024-09-23 15:45:25,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=291648.0, ans=0.125 2024-09-23 15:45:26,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=291648.0, ans=0.125 2024-09-23 15:45:26,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2024-09-23 15:45:41,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=291694.6666666667, ans=0.125 2024-09-23 15:45:46,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=291694.6666666667, ans=0.0 2024-09-23 15:45:46,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=291694.6666666667, ans=0.0 2024-09-23 15:45:50,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2024-09-23 15:45:57,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=291741.3333333333, ans=0.2 2024-09-23 15:46:15,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=291788.0, ans=0.0 2024-09-23 15:46:21,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=22.5 2024-09-23 15:46:25,714 INFO [train.py:1198] (0/4) Epoch 17, batch 200, loss[loss=0.2252, ctc_loss=0.1503, cr_loss=0.3745, over 17300.00 frames. ], tot_loss[loss=0.2281, ctc_loss=0.1541, cr_loss=0.3698, over 2119862.53 frames. 
], batch size: 46, lr: 7.18e-03, grad_scale: 16.0 2024-09-23 15:46:37,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=291834.6666666667, ans=0.2 2024-09-23 15:46:41,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=291881.3333333333, ans=0.125 2024-09-23 15:47:13,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291974.6666666667, ans=0.1 2024-09-23 15:47:18,310 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.306e+02 1.386e+02 1.614e+02 2.877e+02, threshold=2.773e+02, percent-clipped=1.0 2024-09-23 15:47:33,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=292021.3333333333, ans=0.2 2024-09-23 15:47:49,026 INFO [train.py:1198] (0/4) Epoch 17, batch 250, loss[loss=0.2189, ctc_loss=0.1465, cr_loss=0.3619, over 17304.00 frames. ], tot_loss[loss=0.2256, ctc_loss=0.1521, cr_loss=0.3678, over 2401042.11 frames. ], batch size: 51, lr: 7.18e-03, grad_scale: 16.0 2024-09-23 15:48:31,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=292161.3333333333, ans=0.07 2024-09-23 15:48:43,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=292208.0, ans=0.2 2024-09-23 15:48:57,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=292254.6666666667, ans=10.0 2024-09-23 15:49:07,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=292301.3333333333, ans=0.0 2024-09-23 15:49:08,855 INFO [train.py:1198] (0/4) Epoch 17, batch 300, loss[loss=0.2358, ctc_loss=0.1598, cr_loss=0.3802, over 16881.00 frames. ], tot_loss[loss=0.2242, ctc_loss=0.1509, cr_loss=0.3667, over 2623654.37 frames. ], batch size: 58, lr: 7.17e-03, grad_scale: 16.0 2024-09-23 15:49:12,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.90 vs. limit=15.0 2024-09-23 15:49:28,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=292348.0, ans=0.125 2024-09-23 15:49:58,291 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.20 vs. 
limit=10.0 2024-09-23 15:50:00,584 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.265e+02 1.346e+02 1.452e+02 2.269e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-23 15:50:00,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=292441.3333333333, ans=0.0 2024-09-23 15:50:00,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=292441.3333333333, ans=0.0 2024-09-23 15:50:09,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=292441.3333333333, ans=0.05 2024-09-23 15:50:10,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=292441.3333333333, ans=0.95 2024-09-23 15:50:10,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=292441.3333333333, ans=0.0 2024-09-23 15:50:11,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0 2024-09-23 15:50:32,874 INFO [train.py:1198] (0/4) Epoch 17, batch 350, loss[loss=0.2122, ctc_loss=0.1403, cr_loss=0.3597, over 17149.00 frames. ], tot_loss[loss=0.2234, ctc_loss=0.1502, cr_loss=0.366, over 2792922.18 frames. ], batch size: 48, lr: 7.17e-03, grad_scale: 16.0 2024-09-23 15:50:45,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=22.5 2024-09-23 15:50:54,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2024-09-23 15:50:57,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-09-23 15:51:28,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=292674.6666666667, ans=0.125 2024-09-23 15:51:57,139 INFO [train.py:1198] (0/4) Epoch 17, batch 400, loss[loss=0.2111, ctc_loss=0.1419, cr_loss=0.3464, over 17067.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1496, cr_loss=0.3652, over 2932538.94 frames. ], batch size: 46, lr: 7.17e-03, grad_scale: 32.0 2024-09-23 15:52:02,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-09-23 15:52:16,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.36 vs. 
limit=15.0 2024-09-23 15:52:38,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=292861.3333333333, ans=10.0 2024-09-23 15:52:45,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=292908.0, ans=0.0 2024-09-23 15:52:50,104 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.274e+02 1.356e+02 1.494e+02 2.535e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-23 15:52:56,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=292908.0, ans=0.025 2024-09-23 15:53:18,848 INFO [train.py:1198] (0/4) Epoch 17, batch 450, loss[loss=0.2395, ctc_loss=0.165, cr_loss=0.3722, over 17220.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.15, cr_loss=0.3655, over 3011177.55 frames. ], batch size: 55, lr: 7.16e-03, grad_scale: 32.0 2024-09-23 15:53:33,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=293048.0, ans=0.125 2024-09-23 15:54:17,299 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 15:54:29,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=293188.0, ans=0.125 2024-09-23 15:54:38,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=12.0 2024-09-23 15:54:39,067 INFO [train.py:1198] (0/4) Epoch 17, batch 500, loss[loss=0.2187, ctc_loss=0.1439, cr_loss=0.374, over 17184.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.15, cr_loss=0.3653, over 3087485.56 frames. ], batch size: 45, lr: 7.16e-03, grad_scale: 32.0 2024-09-23 15:54:44,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=293234.6666666667, ans=0.125 2024-09-23 15:54:58,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=293281.3333333333, ans=0.0 2024-09-23 15:55:05,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=293281.3333333333, ans=0.2 2024-09-23 15:55:08,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.45 vs. limit=15.0 2024-09-23 15:55:16,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=293328.0, ans=0.025 2024-09-23 15:55:32,746 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.258e+02 1.318e+02 1.426e+02 2.177e+02, threshold=2.636e+02, percent-clipped=0.0 2024-09-23 15:56:03,961 INFO [train.py:1198] (0/4) Epoch 17, batch 550, loss[loss=0.2123, ctc_loss=0.1409, cr_loss=0.357, over 17029.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1501, cr_loss=0.3659, over 3139551.48 frames. ], batch size: 51, lr: 7.16e-03, grad_scale: 32.0 2024-09-23 15:56:22,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2024-09-23 15:57:27,122 INFO [train.py:1198] (0/4) Epoch 17, batch 600, loss[loss=0.2256, ctc_loss=0.1503, cr_loss=0.3764, over 17029.00 frames. 
], tot_loss[loss=0.2241, ctc_loss=0.1507, cr_loss=0.367, over 3186152.28 frames. ], batch size: 39, lr: 7.16e-03, grad_scale: 32.0 2024-09-23 15:57:50,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2024-09-23 15:58:20,483 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.285e+02 1.384e+02 1.494e+02 2.358e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-23 15:58:37,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-09-23 15:58:40,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=293888.0, ans=0.0 2024-09-23 15:58:48,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293934.6666666667, ans=0.1 2024-09-23 15:58:49,413 INFO [train.py:1198] (0/4) Epoch 17, batch 650, loss[loss=0.2566, ctc_loss=0.1776, cr_loss=0.3954, over 16542.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1489, cr_loss=0.3632, over 3225543.88 frames. ], batch size: 66, lr: 7.15e-03, grad_scale: 32.0 2024-09-23 15:59:32,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=294028.0, ans=0.125 2024-09-23 15:59:36,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=294074.6666666667, ans=0.125 2024-09-23 16:00:03,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=294121.3333333333, ans=0.0 2024-09-23 16:00:09,841 INFO [train.py:1198] (0/4) Epoch 17, batch 700, loss[loss=0.2622, ctc_loss=0.1871, cr_loss=0.3757, over 11857.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.149, cr_loss=0.3632, over 3254409.92 frames. ], batch size: 123, lr: 7.15e-03, grad_scale: 32.0 2024-09-23 16:00:15,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=294168.0, ans=0.0 2024-09-23 16:00:31,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=294214.6666666667, ans=0.2 2024-09-23 16:01:06,301 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.243e+02 1.328e+02 1.463e+02 2.107e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-23 16:01:06,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=294308.0, ans=0.0 2024-09-23 16:01:22,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=294354.6666666667, ans=0.125 2024-09-23 16:01:34,904 INFO [train.py:1198] (0/4) Epoch 17, batch 750, loss[loss=0.2359, ctc_loss=0.161, cr_loss=0.3747, over 17140.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1495, cr_loss=0.3646, over 3282911.73 frames. ], batch size: 48, lr: 7.15e-03, grad_scale: 32.0 2024-09-23 16:01:50,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.07 vs. 
limit=15.0 2024-09-23 16:01:55,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=294448.0, ans=0.125 2024-09-23 16:02:03,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=294448.0, ans=10.0 2024-09-23 16:02:08,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=294494.6666666667, ans=0.0 2024-09-23 16:02:17,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.48 vs. limit=22.5 2024-09-23 16:03:00,186 INFO [train.py:1198] (0/4) Epoch 17, batch 800, loss[loss=0.2147, ctc_loss=0.1426, cr_loss=0.3609, over 17017.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1503, cr_loss=0.3651, over 3293697.05 frames. ], batch size: 44, lr: 7.14e-03, grad_scale: 32.0 2024-09-23 16:03:02,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=294634.6666666667, ans=0.2 2024-09-23 16:03:10,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=294634.6666666667, ans=0.2 2024-09-23 16:03:11,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=294634.6666666667, ans=0.2 2024-09-23 16:03:49,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=294774.6666666667, ans=0.125 2024-09-23 16:03:51,161 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.255e+02 1.363e+02 1.453e+02 2.011e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-23 16:04:01,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=294774.6666666667, ans=10.0 2024-09-23 16:04:15,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=294821.3333333333, ans=0.125 2024-09-23 16:04:19,628 INFO [train.py:1198] (0/4) Epoch 17, batch 850, loss[loss=0.2137, ctc_loss=0.1447, cr_loss=0.3451, over 17151.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1504, cr_loss=0.3653, over 3312948.72 frames. ], batch size: 48, lr: 7.14e-03, grad_scale: 32.0 2024-09-23 16:04:26,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=294868.0, ans=0.125 2024-09-23 16:04:27,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-23 16:04:48,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=294914.6666666667, ans=0.0 2024-09-23 16:04:58,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=294961.3333333333, ans=0.1 2024-09-23 16:05:08,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.21 vs. 
limit=12.0 2024-09-23 16:05:39,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=295054.6666666667, ans=0.125 2024-09-23 16:05:41,832 INFO [train.py:1198] (0/4) Epoch 17, batch 900, loss[loss=0.1998, ctc_loss=0.1344, cr_loss=0.3271, over 16948.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.15, cr_loss=0.364, over 3332005.03 frames. ], batch size: 42, lr: 7.14e-03, grad_scale: 32.0 2024-09-23 16:06:34,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=295241.3333333333, ans=0.125 2024-09-23 16:06:34,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=15.0 2024-09-23 16:06:35,671 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.252e+02 1.345e+02 1.506e+02 2.061e+02, threshold=2.689e+02, percent-clipped=0.0 2024-09-23 16:06:51,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=295288.0, ans=0.125 2024-09-23 16:06:54,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=295288.0, ans=0.1 2024-09-23 16:07:02,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=295288.0, ans=0.125 2024-09-23 16:07:07,053 INFO [train.py:1198] (0/4) Epoch 17, batch 950, loss[loss=0.2682, ctc_loss=0.1831, cr_loss=0.4256, over 15093.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1502, cr_loss=0.3649, over 3341520.93 frames. ], batch size: 89, lr: 7.14e-03, grad_scale: 32.0 2024-09-23 16:07:23,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=295381.3333333333, ans=0.0 2024-09-23 16:07:23,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=295381.3333333333, ans=0.125 2024-09-23 16:07:34,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=295381.3333333333, ans=0.0 2024-09-23 16:07:49,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=295428.0, ans=0.0 2024-09-23 16:08:29,165 INFO [train.py:1198] (0/4) Epoch 17, batch 1000, loss[loss=0.1826, ctc_loss=0.1211, cr_loss=0.3077, over 16976.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1495, cr_loss=0.3637, over 3352115.03 frames. ], batch size: 42, lr: 7.13e-03, grad_scale: 32.0 2024-09-23 16:08:45,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=295614.6666666667, ans=0.05 2024-09-23 16:08:50,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.23 vs. 
limit=12.0 2024-09-23 16:09:19,593 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.278e+02 1.364e+02 1.541e+02 2.196e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 16:09:37,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295754.6666666667, ans=0.1 2024-09-23 16:09:47,781 INFO [train.py:1198] (0/4) Epoch 17, batch 1050, loss[loss=0.2282, ctc_loss=0.1511, cr_loss=0.3857, over 17320.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1501, cr_loss=0.3648, over 3353193.46 frames. ], batch size: 46, lr: 7.13e-03, grad_scale: 32.0 2024-09-23 16:10:40,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=295941.3333333333, ans=0.2 2024-09-23 16:10:50,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0 2024-09-23 16:11:12,826 INFO [train.py:1198] (0/4) Epoch 17, batch 1100, loss[loss=0.2329, ctc_loss=0.1555, cr_loss=0.3868, over 17160.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1493, cr_loss=0.3636, over 3354222.63 frames. ], batch size: 45, lr: 7.13e-03, grad_scale: 32.0 2024-09-23 16:11:44,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=296081.3333333333, ans=0.1 2024-09-23 16:11:49,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296128.0, ans=0.1 2024-09-23 16:11:54,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=296128.0, ans=0.125 2024-09-23 16:11:57,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=296128.0, ans=0.0 2024-09-23 16:12:06,483 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.273e+02 1.407e+02 1.625e+02 2.289e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-23 16:12:35,107 INFO [train.py:1198] (0/4) Epoch 17, batch 1150, loss[loss=0.2534, ctc_loss=0.1731, cr_loss=0.4012, over 17019.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1501, cr_loss=0.3644, over 3351513.48 frames. ], batch size: 52, lr: 7.13e-03, grad_scale: 32.0 2024-09-23 16:12:43,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296268.0, ans=0.1 2024-09-23 16:13:00,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=296314.6666666667, ans=0.0 2024-09-23 16:13:02,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.96 vs. limit=10.0 2024-09-23 16:13:53,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=296454.6666666667, ans=0.125 2024-09-23 16:13:53,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=296454.6666666667, ans=0.125 2024-09-23 16:13:57,590 INFO [train.py:1198] (0/4) Epoch 17, batch 1200, loss[loss=0.1933, ctc_loss=0.1256, cr_loss=0.3387, over 17124.00 frames. ], tot_loss[loss=0.2233, ctc_loss=0.1503, cr_loss=0.3649, over 3348378.25 frames. 
], batch size: 40, lr: 7.12e-03, grad_scale: 32.0 2024-09-23 16:14:29,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=296594.6666666667, ans=0.0 2024-09-23 16:14:46,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=296641.3333333333, ans=0.0 2024-09-23 16:14:49,696 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.250e+02 1.332e+02 1.452e+02 2.634e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-23 16:15:04,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=296688.0, ans=0.125 2024-09-23 16:15:19,279 INFO [train.py:1198] (0/4) Epoch 17, batch 1250, loss[loss=0.2086, ctc_loss=0.1384, cr_loss=0.3513, over 17043.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.15, cr_loss=0.3645, over 3350846.31 frames. ], batch size: 39, lr: 7.12e-03, grad_scale: 32.0 2024-09-23 16:15:51,692 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.79 vs. limit=22.5 2024-09-23 16:15:59,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=296828.0, ans=0.0 2024-09-23 16:16:00,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296828.0, ans=0.125 2024-09-23 16:16:11,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=296874.6666666667, ans=0.0 2024-09-23 16:16:23,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=296874.6666666667, ans=0.125 2024-09-23 16:16:35,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=15.0 2024-09-23 16:16:42,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2024-09-23 16:16:44,492 INFO [train.py:1198] (0/4) Epoch 17, batch 1300, loss[loss=0.2516, ctc_loss=0.1713, cr_loss=0.4012, over 17142.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1496, cr_loss=0.3639, over 3362333.72 frames. 
], batch size: 48, lr: 7.12e-03, grad_scale: 32.0 2024-09-23 16:17:11,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=297014.6666666667, ans=0.125 2024-09-23 16:17:15,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=297061.3333333333, ans=0.125 2024-09-23 16:17:30,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=297108.0, ans=0.0 2024-09-23 16:17:33,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=297108.0, ans=0.0 2024-09-23 16:17:36,805 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.242e+02 1.322e+02 1.445e+02 3.373e+02, threshold=2.644e+02, percent-clipped=1.0 2024-09-23 16:17:47,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=297108.0, ans=0.025 2024-09-23 16:18:01,212 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2024-09-23 16:18:06,584 INFO [train.py:1198] (0/4) Epoch 17, batch 1350, loss[loss=0.2128, ctc_loss=0.1431, cr_loss=0.3487, over 17351.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1494, cr_loss=0.364, over 3365618.77 frames. ], batch size: 48, lr: 7.11e-03, grad_scale: 32.0 2024-09-23 16:18:06,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=297201.3333333333, ans=0.025 2024-09-23 16:18:06,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=297201.3333333333, ans=0.125 2024-09-23 16:18:10,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=297201.3333333333, ans=0.125 2024-09-23 16:18:16,563 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:19:26,085 INFO [train.py:1198] (0/4) Epoch 17, batch 1400, loss[loss=0.215, ctc_loss=0.1424, cr_loss=0.3629, over 17205.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1492, cr_loss=0.3645, over 3370703.91 frames. ], batch size: 47, lr: 7.11e-03, grad_scale: 32.0 2024-09-23 16:19:45,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=297481.3333333333, ans=0.125 2024-09-23 16:20:17,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=297574.6666666667, ans=0.125 2024-09-23 16:20:20,766 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.246e+02 1.345e+02 1.562e+02 2.130e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-23 16:20:47,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297621.3333333333, ans=0.125 2024-09-23 16:20:50,355 INFO [train.py:1198] (0/4) Epoch 17, batch 1450, loss[loss=0.2163, ctc_loss=0.1474, cr_loss=0.3447, over 17052.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1489, cr_loss=0.3632, over 3361735.27 frames. 
], batch size: 56, lr: 7.11e-03, grad_scale: 32.0 2024-09-23 16:20:52,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297668.0, ans=0.1 2024-09-23 16:20:58,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297668.0, ans=0.125 2024-09-23 16:21:08,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.76 vs. limit=10.0 2024-09-23 16:22:01,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=297854.6666666667, ans=0.0 2024-09-23 16:22:02,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-09-23 16:22:03,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=297854.6666666667, ans=0.2 2024-09-23 16:22:09,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=297854.6666666667, ans=0.125 2024-09-23 16:22:12,532 INFO [train.py:1198] (0/4) Epoch 17, batch 1500, loss[loss=0.1793, ctc_loss=0.1182, cr_loss=0.3056, over 17017.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1496, cr_loss=0.3639, over 3358492.03 frames. ], batch size: 39, lr: 7.11e-03, grad_scale: 32.0 2024-09-23 16:22:38,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=297948.0, ans=0.025 2024-09-23 16:23:07,241 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.004e+02 1.245e+02 1.341e+02 1.437e+02 3.249e+02, threshold=2.682e+02, percent-clipped=1.0 2024-09-23 16:23:18,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=298088.0, ans=0.0 2024-09-23 16:23:34,453 INFO [train.py:1198] (0/4) Epoch 17, batch 1550, loss[loss=0.197, ctc_loss=0.129, cr_loss=0.34, over 16312.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1502, cr_loss=0.364, over 3350975.19 frames. ], batch size: 36, lr: 7.10e-03, grad_scale: 32.0 2024-09-23 16:24:02,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-09-23 16:24:41,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=298321.3333333333, ans=0.0 2024-09-23 16:24:54,220 INFO [train.py:1198] (0/4) Epoch 17, batch 1600, loss[loss=0.1988, ctc_loss=0.13, cr_loss=0.3437, over 17025.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1501, cr_loss=0.3646, over 3349895.20 frames. 
], batch size: 44, lr: 7.10e-03, grad_scale: 32.0 2024-09-23 16:24:57,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=298368.0, ans=0.125 2024-09-23 16:25:20,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298414.6666666667, ans=0.1 2024-09-23 16:25:25,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=298414.6666666667, ans=0.125 2024-09-23 16:25:33,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=298461.3333333333, ans=0.0 2024-09-23 16:25:51,724 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.003e+02 1.268e+02 1.375e+02 1.538e+02 2.240e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 16:26:00,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=298508.0, ans=0.0 2024-09-23 16:26:12,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=298554.6666666667, ans=0.95 2024-09-23 16:26:17,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=298601.3333333333, ans=0.125 2024-09-23 16:26:18,870 INFO [train.py:1198] (0/4) Epoch 17, batch 1650, loss[loss=0.2451, ctc_loss=0.1638, cr_loss=0.4063, over 17024.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1497, cr_loss=0.3645, over 3364864.92 frames. ], batch size: 53, lr: 7.10e-03, grad_scale: 32.0 2024-09-23 16:26:29,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=298601.3333333333, ans=0.125 2024-09-23 16:26:42,464 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-64000.pt 2024-09-23 16:26:54,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=298694.6666666667, ans=0.2 2024-09-23 16:27:29,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2024-09-23 16:27:45,762 INFO [train.py:1198] (0/4) Epoch 17, batch 1700, loss[loss=0.2001, ctc_loss=0.1337, cr_loss=0.332, over 17285.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.149, cr_loss=0.3628, over 3355991.22 frames. ], batch size: 46, lr: 7.09e-03, grad_scale: 32.0 2024-09-23 16:27:50,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=15.0 2024-09-23 16:27:55,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=298834.6666666667, ans=0.0 2024-09-23 16:28:06,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=298881.3333333333, ans=0.125 2024-09-23 16:28:07,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. 
limit=6.0 2024-09-23 16:28:11,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=298881.3333333333, ans=0.0 2024-09-23 16:28:29,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=298928.0, ans=0.025 2024-09-23 16:28:37,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=298974.6666666667, ans=0.2 2024-09-23 16:28:37,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=298974.6666666667, ans=0.025 2024-09-23 16:28:38,520 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.237e+02 1.323e+02 1.443e+02 1.876e+02, threshold=2.646e+02, percent-clipped=0.0 2024-09-23 16:28:40,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=298974.6666666667, ans=0.1 2024-09-23 16:28:43,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=298974.6666666667, ans=0.0 2024-09-23 16:28:45,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298974.6666666667, ans=0.1 2024-09-23 16:28:56,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=299021.3333333333, ans=0.035 2024-09-23 16:29:05,469 INFO [train.py:1198] (0/4) Epoch 17, batch 1750, loss[loss=0.1756, ctc_loss=0.1156, cr_loss=0.3, over 16981.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1488, cr_loss=0.3625, over 3354945.86 frames. ], batch size: 42, lr: 7.09e-03, grad_scale: 32.0 2024-09-23 16:29:13,652 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:29:36,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=299161.3333333333, ans=0.125 2024-09-23 16:29:39,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=299161.3333333333, ans=0.125 2024-09-23 16:29:39,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=299161.3333333333, ans=0.125 2024-09-23 16:30:18,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=299254.6666666667, ans=0.0 2024-09-23 16:30:24,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=299254.6666666667, ans=0.125 2024-09-23 16:30:26,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=299301.3333333333, ans=0.125 2024-09-23 16:30:27,808 INFO [train.py:1198] (0/4) Epoch 17, batch 1800, loss[loss=0.1876, ctc_loss=0.1232, cr_loss=0.3221, over 17261.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1486, cr_loss=0.3621, over 3357171.49 frames. 
], batch size: 42, lr: 7.09e-03, grad_scale: 32.0 2024-09-23 16:31:23,057 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.259e+02 1.337e+02 1.488e+02 2.205e+02, threshold=2.673e+02, percent-clipped=0.0 2024-09-23 16:31:26,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=299441.3333333333, ans=0.125 2024-09-23 16:31:52,512 INFO [train.py:1198] (0/4) Epoch 17, batch 1850, loss[loss=0.2357, ctc_loss=0.1592, cr_loss=0.3826, over 17269.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1491, cr_loss=0.3625, over 3350378.92 frames. ], batch size: 44, lr: 7.09e-03, grad_scale: 32.0 2024-09-23 16:33:02,546 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:33:07,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=299721.3333333333, ans=0.125 2024-09-23 16:33:11,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=299721.3333333333, ans=0.1 2024-09-23 16:33:14,616 INFO [train.py:1198] (0/4) Epoch 17, batch 1900, loss[loss=0.2689, ctc_loss=0.1894, cr_loss=0.3972, over 11572.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1494, cr_loss=0.3627, over 3350604.20 frames. ], batch size: 123, lr: 7.08e-03, grad_scale: 32.0 2024-09-23 16:33:21,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.04 vs. limit=10.0 2024-09-23 16:33:25,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=299768.0, ans=0.0 2024-09-23 16:33:27,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=299768.0, ans=0.125 2024-09-23 16:33:43,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2024-09-23 16:33:59,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.09 vs. limit=10.0 2024-09-23 16:34:06,710 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.256e+02 1.309e+02 1.429e+02 1.873e+02, threshold=2.618e+02, percent-clipped=0.0 2024-09-23 16:34:07,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=299908.0, ans=0.0 2024-09-23 16:34:10,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=299908.0, ans=0.5 2024-09-23 16:34:24,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=299954.6666666667, ans=0.0 2024-09-23 16:34:33,599 INFO [train.py:1198] (0/4) Epoch 17, batch 1950, loss[loss=0.2229, ctc_loss=0.1504, cr_loss=0.3624, over 17019.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1496, cr_loss=0.3637, over 3360762.26 frames. 
], batch size: 56, lr: 7.08e-03, grad_scale: 16.0 2024-09-23 16:34:46,572 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:34:49,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=300048.0, ans=0.0 2024-09-23 16:34:51,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=300048.0, ans=0.125 2024-09-23 16:35:25,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=300141.3333333333, ans=0.2 2024-09-23 16:35:41,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2024-09-23 16:35:47,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=300188.0, ans=0.0 2024-09-23 16:35:58,734 INFO [train.py:1198] (0/4) Epoch 17, batch 2000, loss[loss=0.2352, ctc_loss=0.1587, cr_loss=0.3823, over 17288.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1496, cr_loss=0.3635, over 3359930.32 frames. ], batch size: 46, lr: 7.08e-03, grad_scale: 32.0 2024-09-23 16:36:08,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=300234.6666666667, ans=0.2 2024-09-23 16:36:13,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=300281.3333333333, ans=0.125 2024-09-23 16:36:14,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=300281.3333333333, ans=0.125 2024-09-23 16:36:38,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=300328.0, ans=0.07 2024-09-23 16:36:38,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=300328.0, ans=0.05 2024-09-23 16:36:42,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-09-23 16:36:55,627 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.280e+02 1.363e+02 1.514e+02 2.601e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-23 16:37:21,171 INFO [train.py:1198] (0/4) Epoch 17, batch 2050, loss[loss=0.1969, ctc_loss=0.1305, cr_loss=0.3322, over 17092.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1498, cr_loss=0.364, over 3365192.35 frames. ], batch size: 43, lr: 7.08e-03, grad_scale: 32.0 2024-09-23 16:38:06,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=300561.3333333333, ans=0.125 2024-09-23 16:38:06,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=300561.3333333333, ans=0.125 2024-09-23 16:38:07,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.94 vs. 
limit=15.0 2024-09-23 16:38:35,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=300654.6666666667, ans=0.035 2024-09-23 16:38:43,057 INFO [train.py:1198] (0/4) Epoch 17, batch 2100, loss[loss=0.2325, ctc_loss=0.1592, cr_loss=0.3664, over 16105.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1492, cr_loss=0.3625, over 3356775.68 frames. ], batch size: 74, lr: 7.07e-03, grad_scale: 32.0 2024-09-23 16:38:43,469 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:38:44,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300701.3333333333, ans=0.1 2024-09-23 16:38:56,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=300701.3333333333, ans=0.125 2024-09-23 16:39:10,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=300748.0, ans=0.0 2024-09-23 16:39:29,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=300841.3333333333, ans=0.0 2024-09-23 16:39:37,337 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.012e+02 1.282e+02 1.378e+02 1.629e+02 2.500e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 16:40:05,320 INFO [train.py:1198] (0/4) Epoch 17, batch 2150, loss[loss=0.2051, ctc_loss=0.1372, cr_loss=0.3398, over 17278.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1494, cr_loss=0.3625, over 3351665.65 frames. ], batch size: 46, lr: 7.07e-03, grad_scale: 32.0 2024-09-23 16:40:33,653 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:40:51,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=301028.0, ans=0.125 2024-09-23 16:40:57,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.85 vs. limit=15.0 2024-09-23 16:41:10,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=301121.3333333333, ans=0.125 2024-09-23 16:41:11,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=301121.3333333333, ans=0.125 2024-09-23 16:41:29,569 INFO [train.py:1198] (0/4) Epoch 17, batch 2200, loss[loss=0.2684, ctc_loss=0.1849, cr_loss=0.4174, over 16751.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1496, cr_loss=0.363, over 3354698.97 frames. 
], batch size: 61, lr: 7.07e-03, grad_scale: 32.0 2024-09-23 16:41:50,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=301214.6666666667, ans=0.2 2024-09-23 16:41:50,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=301214.6666666667, ans=10.0 2024-09-23 16:41:59,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=301261.3333333333, ans=0.125 2024-09-23 16:42:23,223 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.227e+02 1.315e+02 1.424e+02 2.310e+02, threshold=2.629e+02, percent-clipped=0.0 2024-09-23 16:42:51,585 INFO [train.py:1198] (0/4) Epoch 17, batch 2250, loss[loss=0.2423, ctc_loss=0.1653, cr_loss=0.385, over 17035.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1494, cr_loss=0.3632, over 3353380.78 frames. ], batch size: 52, lr: 7.07e-03, grad_scale: 32.0 2024-09-23 16:43:03,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=12.0 2024-09-23 16:43:07,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=301448.0, ans=0.125 2024-09-23 16:43:07,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=301448.0, ans=0.125 2024-09-23 16:43:14,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=301448.0, ans=0.2 2024-09-23 16:43:22,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=301494.6666666667, ans=0.125 2024-09-23 16:43:23,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=301494.6666666667, ans=0.125 2024-09-23 16:43:29,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=301494.6666666667, ans=0.125 2024-09-23 16:43:53,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=301588.0, ans=0.025 2024-09-23 16:43:54,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-09-23 16:43:56,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=22.5 2024-09-23 16:43:57,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-09-23 16:44:11,232 INFO [train.py:1198] (0/4) Epoch 17, batch 2300, loss[loss=0.2209, ctc_loss=0.1481, cr_loss=0.3639, over 17050.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1493, cr_loss=0.3629, over 3343714.28 frames. 
], batch size: 52, lr: 7.06e-03, grad_scale: 32.0 2024-09-23 16:44:16,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=301634.6666666667, ans=0.0 2024-09-23 16:44:19,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=301634.6666666667, ans=0.125 2024-09-23 16:44:56,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=301728.0, ans=0.125 2024-09-23 16:45:08,138 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.278e+02 1.376e+02 1.551e+02 2.468e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 16:45:08,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=301774.6666666667, ans=0.0 2024-09-23 16:45:11,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=301774.6666666667, ans=0.125 2024-09-23 16:45:13,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=301774.6666666667, ans=0.125 2024-09-23 16:45:28,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=301821.3333333333, ans=0.125 2024-09-23 16:45:35,977 INFO [train.py:1198] (0/4) Epoch 17, batch 2350, loss[loss=0.2315, ctc_loss=0.1584, cr_loss=0.3658, over 16029.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1488, cr_loss=0.3622, over 3355702.99 frames. ], batch size: 74, lr: 7.06e-03, grad_scale: 32.0 2024-09-23 16:45:39,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=301868.0, ans=0.125 2024-09-23 16:45:47,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301868.0, ans=0.1 2024-09-23 16:46:25,131 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2024-09-23 16:46:31,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=302008.0, ans=0.125 2024-09-23 16:46:42,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=302054.6666666667, ans=0.125 2024-09-23 16:46:50,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=15.0 2024-09-23 16:46:53,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=302054.6666666667, ans=0.0 2024-09-23 16:46:57,672 INFO [train.py:1198] (0/4) Epoch 17, batch 2400, loss[loss=0.2738, ctc_loss=0.1965, cr_loss=0.3865, over 12296.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1496, cr_loss=0.3637, over 3339120.07 frames. ], batch size: 123, lr: 7.06e-03, grad_scale: 32.0 2024-09-23 16:47:17,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. 
limit=15.0 2024-09-23 16:47:21,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=302148.0, ans=0.0 2024-09-23 16:47:54,481 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.300e+02 1.437e+02 1.590e+02 2.245e+02, threshold=2.874e+02, percent-clipped=0.0 2024-09-23 16:48:19,881 INFO [train.py:1198] (0/4) Epoch 17, batch 2450, loss[loss=0.2307, ctc_loss=0.1569, cr_loss=0.3692, over 16750.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1496, cr_loss=0.3637, over 3340389.97 frames. ], batch size: 61, lr: 7.05e-03, grad_scale: 32.0 2024-09-23 16:48:28,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=302334.6666666667, ans=0.125 2024-09-23 16:48:36,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.72 vs. limit=22.5 2024-09-23 16:48:39,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=302381.3333333333, ans=0.2 2024-09-23 16:48:46,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=302381.3333333333, ans=0.125 2024-09-23 16:49:10,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=302474.6666666667, ans=0.0 2024-09-23 16:49:29,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302521.3333333333, ans=0.1 2024-09-23 16:49:39,968 INFO [train.py:1198] (0/4) Epoch 17, batch 2500, loss[loss=0.2397, ctc_loss=0.1608, cr_loss=0.3946, over 17023.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1495, cr_loss=0.3642, over 3348676.48 frames. ], batch size: 51, lr: 7.05e-03, grad_scale: 32.0 2024-09-23 16:49:49,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=302568.0, ans=0.2 2024-09-23 16:50:20,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=302661.3333333333, ans=0.125 2024-09-23 16:50:28,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=302661.3333333333, ans=0.125 2024-09-23 16:50:39,595 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.277e+02 1.416e+02 1.598e+02 3.065e+02, threshold=2.832e+02, percent-clipped=1.0 2024-09-23 16:50:46,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=302708.0, ans=0.125 2024-09-23 16:51:07,612 INFO [train.py:1198] (0/4) Epoch 17, batch 2550, loss[loss=0.1892, ctc_loss=0.1287, cr_loss=0.3023, over 16196.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.15, cr_loss=0.3641, over 3348463.00 frames. 
], batch size: 36, lr: 7.05e-03, grad_scale: 32.0 2024-09-23 16:51:25,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=302848.0, ans=0.0 2024-09-23 16:51:29,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=302848.0, ans=15.0 2024-09-23 16:51:46,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=302894.6666666667, ans=10.0 2024-09-23 16:51:52,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=302894.6666666667, ans=0.0 2024-09-23 16:52:17,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=302988.0, ans=0.2 2024-09-23 16:52:29,692 INFO [train.py:1198] (0/4) Epoch 17, batch 2600, loss[loss=0.239, ctc_loss=0.1601, cr_loss=0.3943, over 17302.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1499, cr_loss=0.3642, over 3341941.11 frames. ], batch size: 46, lr: 7.05e-03, grad_scale: 32.0 2024-09-23 16:53:00,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=303128.0, ans=0.0 2024-09-23 16:53:05,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=303128.0, ans=0.125 2024-09-23 16:53:14,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=303128.0, ans=0.07 2024-09-23 16:53:16,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303174.6666666667, ans=0.1 2024-09-23 16:53:23,708 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.309e+02 1.432e+02 1.509e+02 2.078e+02, threshold=2.863e+02, percent-clipped=0.0 2024-09-23 16:53:24,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2024-09-23 16:53:27,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303174.6666666667, ans=0.1 2024-09-23 16:53:44,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303221.3333333333, ans=0.1 2024-09-23 16:53:49,177 INFO [train.py:1198] (0/4) Epoch 17, batch 2650, loss[loss=0.2341, ctc_loss=0.1593, cr_loss=0.3735, over 16998.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1498, cr_loss=0.3643, over 3347124.01 frames. 
], batch size: 53, lr: 7.04e-03, grad_scale: 32.0 2024-09-23 16:53:51,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=303268.0, ans=0.0 2024-09-23 16:53:58,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303268.0, ans=0.1 2024-09-23 16:54:02,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=303268.0, ans=0.07 2024-09-23 16:54:11,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=303314.6666666667, ans=0.0 2024-09-23 16:54:24,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303361.3333333333, ans=0.1 2024-09-23 16:54:30,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=303361.3333333333, ans=0.0 2024-09-23 16:55:13,735 INFO [train.py:1198] (0/4) Epoch 17, batch 2700, loss[loss=0.2043, ctc_loss=0.1371, cr_loss=0.3362, over 17292.00 frames. ], tot_loss[loss=0.2231, ctc_loss=0.15, cr_loss=0.3653, over 3362038.36 frames. ], batch size: 46, lr: 7.04e-03, grad_scale: 32.0 2024-09-23 16:55:20,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.42 vs. limit=6.0 2024-09-23 16:55:29,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=303548.0, ans=0.125 2024-09-23 16:55:53,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=15.0 2024-09-23 16:56:10,277 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.299e+02 1.395e+02 1.578e+02 3.213e+02, threshold=2.790e+02, percent-clipped=1.0 2024-09-23 16:56:15,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2024-09-23 16:56:23,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=303688.0, ans=0.0 2024-09-23 16:56:31,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=303688.0, ans=0.09899494936611666 2024-09-23 16:56:35,877 INFO [train.py:1198] (0/4) Epoch 17, batch 2750, loss[loss=0.236, ctc_loss=0.1576, cr_loss=0.3921, over 17010.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1494, cr_loss=0.3643, over 3369808.46 frames. ], batch size: 51, lr: 7.04e-03, grad_scale: 32.0 2024-09-23 16:56:46,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. 
limit=6.0 2024-09-23 16:57:11,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=303828.0, ans=0.0 2024-09-23 16:57:31,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=303874.6666666667, ans=0.0 2024-09-23 16:57:55,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=303921.3333333333, ans=0.0 2024-09-23 16:57:58,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.30 vs. limit=15.0 2024-09-23 16:57:58,358 INFO [train.py:1198] (0/4) Epoch 17, batch 2800, loss[loss=0.2231, ctc_loss=0.1525, cr_loss=0.3528, over 16798.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1492, cr_loss=0.3644, over 3372525.01 frames. ], batch size: 61, lr: 7.04e-03, grad_scale: 32.0 2024-09-23 16:58:12,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=304014.6666666667, ans=0.125 2024-09-23 16:58:27,056 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 16:58:53,780 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.311e+02 1.385e+02 1.466e+02 2.534e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-23 16:58:55,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=304108.0, ans=0.125 2024-09-23 16:59:13,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=304154.6666666667, ans=0.0 2024-09-23 16:59:17,882 INFO [train.py:1198] (0/4) Epoch 17, batch 2850, loss[loss=0.2119, ctc_loss=0.1452, cr_loss=0.3333, over 17056.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1492, cr_loss=0.3636, over 3365345.58 frames. ], batch size: 46, lr: 7.03e-03, grad_scale: 16.0 2024-09-23 16:59:20,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.93 vs. limit=10.0 2024-09-23 16:59:34,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304248.0, ans=0.1 2024-09-23 16:59:48,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=304248.0, ans=0.125 2024-09-23 16:59:52,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=304294.6666666667, ans=0.2 2024-09-23 17:00:14,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=304341.3333333333, ans=0.07 2024-09-23 17:00:28,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=304388.0, ans=0.035 2024-09-23 17:00:38,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=304388.0, ans=0.125 2024-09-23 17:00:42,927 INFO [train.py:1198] (0/4) Epoch 17, batch 2900, loss[loss=0.2201, ctc_loss=0.1446, cr_loss=0.3775, over 17130.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1491, cr_loss=0.3645, over 3376505.68 frames. 
], batch size: 48, lr: 7.03e-03, grad_scale: 16.0 2024-09-23 17:00:46,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=304434.6666666667, ans=0.0 2024-09-23 17:01:16,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=304528.0, ans=0.125 2024-09-23 17:01:29,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304528.0, ans=0.1 2024-09-23 17:01:31,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.98 vs. limit=10.0 2024-09-23 17:01:35,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=304574.6666666667, ans=0.2 2024-09-23 17:01:42,988 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.263e+02 1.350e+02 1.442e+02 2.620e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-23 17:01:50,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.18 vs. limit=10.0 2024-09-23 17:02:05,521 INFO [train.py:1198] (0/4) Epoch 17, batch 2950, loss[loss=0.2444, ctc_loss=0.1679, cr_loss=0.3826, over 16985.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1486, cr_loss=0.3633, over 3368922.10 frames. ], batch size: 53, lr: 7.03e-03, grad_scale: 8.0 2024-09-23 17:02:08,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=304668.0, ans=0.0 2024-09-23 17:02:08,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=304668.0, ans=0.1 2024-09-23 17:02:15,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=304668.0, ans=0.2 2024-09-23 17:02:15,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=304668.0, ans=0.125 2024-09-23 17:02:32,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=22.5 2024-09-23 17:02:33,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=304714.6666666667, ans=0.125 2024-09-23 17:02:35,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=304714.6666666667, ans=0.125 2024-09-23 17:03:26,627 INFO [train.py:1198] (0/4) Epoch 17, batch 3000, loss[loss=0.281, ctc_loss=0.201, cr_loss=0.3998, over 11640.00 frames. ], tot_loss[loss=0.223, ctc_loss=0.1499, cr_loss=0.3654, over 3358590.01 frames. ], batch size: 123, lr: 7.02e-03, grad_scale: 8.0 2024-09-23 17:03:26,628 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 17:03:42,428 INFO [train.py:1230] (0/4) Epoch 17, validation: loss=0.0409, ctc_loss=0.0409, cr_loss=7.678e-15, over 944034.00 frames. 
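A note on the loss bookkeeping in these records: throughout this section the reported total is consistent with loss = ctc_loss + 0.2 * cr_loss, the 0.2 matching the cr-loss-scale-0.2 in the run's experiment directory name (see the checkpoint path logged at the end of epoch 17). For example, at epoch 17, batch 2450: 0.1496 + 0.2 * 0.3637 = 0.2223, exactly as logged. On validation the cr_loss term collapses to ~8e-15: consistency regularization compares the CTC posteriors of two augmented views of each utterance, and with no time-masking applied at validation the two views coincide, leaving only the pure CTC loss (0.0409). Below is a minimal sketch of such an objective, assuming a symmetric-KL consistency term; the function and argument names are illustrative, not the recipe's actual API:

```python
# Minimal sketch of the CR-CTC objective implied by the "loss", "ctc_loss"
# and "cr_loss" fields in these records, assuming a symmetric-KL consistency
# term between two augmented views; names are illustrative, not the recipe API.
import torch
import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, targets, input_lens, target_lens,
                cr_loss_scale=0.2):
    """log_probs_*: (T, N, V) CTC log-posteriors from two views of one batch."""
    ctc = 0.5 * (
        F.ctc_loss(log_probs_a, targets, input_lens, target_lens,
                   reduction="sum", zero_infinity=True)
        + F.ctc_loss(log_probs_b, targets, input_lens, target_lens,
                     reduction="sum", zero_infinity=True)
    )
    # Pull each branch toward a detached copy of the other, so neither view
    # serves as its own moving target. With identical views (validation, no
    # time-masking) both KL terms vanish up to float noise -- hence the
    # logged validation cr_loss of ~1e-15.
    cr = 0.5 * (
        F.kl_div(log_probs_a, log_probs_b.detach(), log_target=True,
                 reduction="sum")
        + F.kl_div(log_probs_b, log_probs_a.detach(), log_target=True,
                   reduction="sum")
    )
    num_frames = input_lens.sum()  # the log reports losses "over N frames"
    return ((ctc + cr_loss_scale * cr) / num_frames,
            ctc / num_frames, cr / num_frames)
```

Detaching the opposite branch inside each KL term means only the scaled sum drives the optimizer, while the per-component values remain available for the per-batch reporting seen above.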
2024-09-23 17:03:42,429 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 17:03:52,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=304901.3333333333, ans=0.025 2024-09-23 17:04:32,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=305041.3333333333, ans=0.125 2024-09-23 17:04:38,120 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.276e+02 1.374e+02 1.513e+02 2.906e+02, threshold=2.749e+02, percent-clipped=1.0 2024-09-23 17:04:59,731 INFO [train.py:1198] (0/4) Epoch 17, batch 3050, loss[loss=0.2148, ctc_loss=0.143, cr_loss=0.3591, over 17118.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1491, cr_loss=0.3638, over 3347657.71 frames. ], batch size: 40, lr: 7.02e-03, grad_scale: 8.0 2024-09-23 17:05:10,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=305134.6666666667, ans=0.125 2024-09-23 17:05:12,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305134.6666666667, ans=0.1 2024-09-23 17:05:12,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=305134.6666666667, ans=0.0 2024-09-23 17:05:22,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=305181.3333333333, ans=0.125 2024-09-23 17:06:20,562 INFO [train.py:1198] (0/4) Epoch 17, batch 3100, loss[loss=0.2317, ctc_loss=0.1576, cr_loss=0.3705, over 16769.00 frames. ], tot_loss[loss=0.2209, ctc_loss=0.1484, cr_loss=0.3625, over 3344979.45 frames. ], batch size: 61, lr: 7.02e-03, grad_scale: 8.0 2024-09-23 17:06:20,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=305368.0, ans=0.125 2024-09-23 17:06:34,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=305414.6666666667, ans=0.1 2024-09-23 17:07:08,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=305508.0, ans=0.125 2024-09-23 17:07:11,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=305508.0, ans=0.125 2024-09-23 17:07:18,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.96 vs. 
limit=22.5 2024-09-23 17:07:19,081 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.257e+02 1.347e+02 1.447e+02 2.070e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-23 17:07:22,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=305508.0, ans=0.125 2024-09-23 17:07:27,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=305554.6666666667, ans=0.09899494936611666 2024-09-23 17:07:31,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=305554.6666666667, ans=0.125 2024-09-23 17:07:32,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=305554.6666666667, ans=0.2 2024-09-23 17:07:41,116 INFO [train.py:1198] (0/4) Epoch 17, batch 3150, loss[loss=0.2124, ctc_loss=0.1366, cr_loss=0.379, over 16938.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1486, cr_loss=0.3636, over 3349821.07 frames. ], batch size: 42, lr: 7.02e-03, grad_scale: 8.0 2024-09-23 17:07:43,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=305601.3333333333, ans=0.125 2024-09-23 17:07:46,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=305601.3333333333, ans=0.0 2024-09-23 17:08:20,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=305694.6666666667, ans=0.0 2024-09-23 17:08:27,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=305741.3333333333, ans=0.0 2024-09-23 17:08:54,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=305788.0, ans=0.0 2024-09-23 17:09:00,534 INFO [train.py:1198] (0/4) Epoch 17, batch 3200, loss[loss=0.2243, ctc_loss=0.1473, cr_loss=0.3852, over 17261.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.149, cr_loss=0.3643, over 3345554.81 frames. ], batch size: 44, lr: 7.01e-03, grad_scale: 16.0 2024-09-23 17:09:46,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2024-09-23 17:09:50,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.04 vs. limit=15.0 2024-09-23 17:09:56,502 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.224e+02 1.300e+02 1.391e+02 2.057e+02, threshold=2.599e+02, percent-clipped=0.0 2024-09-23 17:10:18,313 INFO [train.py:1198] (0/4) Epoch 17, batch 3250, loss[loss=0.1885, ctc_loss=0.1294, cr_loss=0.2958, over 16283.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1505, cr_loss=0.3663, over 3338624.16 frames. ], batch size: 36, lr: 7.01e-03, grad_scale: 16.0 2024-09-23 17:10:19,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.55 vs. 
limit=15.0 2024-09-23 17:10:40,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=306114.6666666667, ans=0.125 2024-09-23 17:10:40,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306114.6666666667, ans=0.1 2024-09-23 17:10:42,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=306114.6666666667, ans=0.125 2024-09-23 17:10:46,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=306114.6666666667, ans=0.2 2024-09-23 17:11:19,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=306254.6666666667, ans=0.125 2024-09-23 17:11:36,836 INFO [train.py:1198] (0/4) Epoch 17, batch 3300, loss[loss=0.2326, ctc_loss=0.1594, cr_loss=0.3659, over 17153.00 frames. ], tot_loss[loss=0.2244, ctc_loss=0.151, cr_loss=0.3669, over 3330544.49 frames. ], batch size: 48, lr: 7.01e-03, grad_scale: 16.0 2024-09-23 17:11:42,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306301.3333333333, ans=0.1 2024-09-23 17:11:53,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=306348.0, ans=0.0 2024-09-23 17:12:31,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=306441.3333333333, ans=0.025 2024-09-23 17:12:32,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2024-09-23 17:12:34,842 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.271e+02 1.388e+02 1.557e+02 2.598e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-23 17:12:41,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=306488.0, ans=0.125 2024-09-23 17:12:56,558 INFO [train.py:1198] (0/4) Epoch 17, batch 3350, loss[loss=0.228, ctc_loss=0.1542, cr_loss=0.3693, over 16955.00 frames. ], tot_loss[loss=0.224, ctc_loss=0.1507, cr_loss=0.3666, over 3344859.62 frames. ], batch size: 58, lr: 7.01e-03, grad_scale: 16.0 2024-09-23 17:13:03,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=306534.6666666667, ans=0.0 2024-09-23 17:13:07,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=306534.6666666667, ans=0.05 2024-09-23 17:13:43,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=306674.6666666667, ans=0.125 2024-09-23 17:14:00,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=306721.3333333333, ans=0.025 2024-09-23 17:14:14,501 INFO [train.py:1198] (0/4) Epoch 17, batch 3400, loss[loss=0.2207, ctc_loss=0.1484, cr_loss=0.3617, over 16736.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1504, cr_loss=0.3657, over 3349282.83 frames. 
], batch size: 61, lr: 7.00e-03, grad_scale: 16.0 2024-09-23 17:14:41,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-09-23 17:14:42,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=306814.6666666667, ans=0.2 2024-09-23 17:14:49,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.10 vs. limit=15.0 2024-09-23 17:14:57,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=306861.3333333333, ans=0.125 2024-09-23 17:15:07,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.83 vs. limit=6.0 2024-09-23 17:15:09,980 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.230e+02 1.305e+02 1.432e+02 2.019e+02, threshold=2.610e+02, percent-clipped=0.0 2024-09-23 17:15:32,032 INFO [train.py:1198] (0/4) Epoch 17, batch 3450, loss[loss=0.2318, ctc_loss=0.1546, cr_loss=0.386, over 17016.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1498, cr_loss=0.3656, over 3357898.48 frames. ], batch size: 52, lr: 7.00e-03, grad_scale: 16.0 2024-09-23 17:15:35,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2024-09-23 17:15:36,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=307001.3333333333, ans=22.5 2024-09-23 17:15:38,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307001.3333333333, ans=0.1 2024-09-23 17:15:52,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=307048.0, ans=0.125 2024-09-23 17:16:12,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-09-23 17:16:47,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307188.0, ans=0.1 2024-09-23 17:16:48,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2024-09-23 17:16:53,995 INFO [train.py:1198] (0/4) Epoch 17, batch 3500, loss[loss=0.2341, ctc_loss=0.1611, cr_loss=0.3648, over 17103.00 frames. ], tot_loss[loss=0.2248, ctc_loss=0.1512, cr_loss=0.3684, over 3353502.78 frames. 
], batch size: 49, lr: 7.00e-03, grad_scale: 16.0 2024-09-23 17:17:05,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=307234.6666666667, ans=0.125 2024-09-23 17:17:19,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=307281.3333333333, ans=0.1 2024-09-23 17:17:34,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=307328.0, ans=0.09899494936611666 2024-09-23 17:17:36,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=307328.0, ans=0.125 2024-09-23 17:17:46,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-09-23 17:17:47,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.57 vs. limit=15.0 2024-09-23 17:17:50,327 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.294e+02 1.368e+02 1.478e+02 3.708e+02, threshold=2.737e+02, percent-clipped=1.0 2024-09-23 17:18:12,058 INFO [train.py:1198] (0/4) Epoch 17, batch 3550, loss[loss=0.2479, ctc_loss=0.1702, cr_loss=0.3888, over 17009.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1499, cr_loss=0.3665, over 3357304.30 frames. ], batch size: 51, lr: 7.00e-03, grad_scale: 16.0 2024-09-23 17:18:20,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=307468.0, ans=0.125 2024-09-23 17:18:21,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=307468.0, ans=0.125 2024-09-23 17:18:28,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307514.6666666667, ans=0.1 2024-09-23 17:18:34,730 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:18:37,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=307514.6666666667, ans=0.0 2024-09-23 17:18:51,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307561.3333333333, ans=0.1 2024-09-23 17:18:54,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=307561.3333333333, ans=0.0 2024-09-23 17:19:10,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=307608.0, ans=0.0 2024-09-23 17:19:13,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=307608.0, ans=0.125 2024-09-23 17:19:31,531 INFO [train.py:1198] (0/4) Epoch 17, batch 3600, loss[loss=0.1847, ctc_loss=0.123, cr_loss=0.3081, over 16862.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1486, cr_loss=0.3645, over 3365869.34 frames. 
], batch size: 37, lr: 6.99e-03, grad_scale: 16.0 2024-09-23 17:19:47,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=307748.0, ans=0.125 2024-09-23 17:19:55,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=307748.0, ans=0.125 2024-09-23 17:19:56,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=307748.0, ans=0.0 2024-09-23 17:20:14,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=12.0 2024-09-23 17:20:17,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=307841.3333333333, ans=0.125 2024-09-23 17:20:29,251 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.229e+02 1.313e+02 1.436e+02 1.870e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-23 17:20:32,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=307888.0, ans=0.5 2024-09-23 17:20:49,556 INFO [train.py:1198] (0/4) Epoch 17, batch 3650, loss[loss=0.226, ctc_loss=0.1521, cr_loss=0.3695, over 17169.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1491, cr_loss=0.3643, over 3361662.22 frames. ], batch size: 45, lr: 6.99e-03, grad_scale: 16.0 2024-09-23 17:20:59,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=307934.6666666667, ans=0.0 2024-09-23 17:21:07,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=307981.3333333333, ans=0.2 2024-09-23 17:21:10,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=307981.3333333333, ans=0.0 2024-09-23 17:21:16,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=307981.3333333333, ans=0.04949747468305833 2024-09-23 17:21:19,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=308028.0, ans=0.07 2024-09-23 17:21:19,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=308028.0, ans=0.07 2024-09-23 17:21:19,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=308028.0, ans=0.125 2024-09-23 17:21:20,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=308028.0, ans=0.025 2024-09-23 17:21:31,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308028.0, ans=0.1 2024-09-23 17:21:46,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=308074.6666666667, ans=0.125 2024-09-23 17:22:04,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=308121.3333333333, ans=0.2 2024-09-23 17:22:10,753 INFO [train.py:1198] (0/4) Epoch 17, batch 3700, loss[loss=0.1626, ctc_loss=0.1037, cr_loss=0.2945, over 17257.00 frames. 
], tot_loss[loss=0.2218, ctc_loss=0.1491, cr_loss=0.3636, over 3351066.41 frames. ], batch size: 42, lr: 6.99e-03, grad_scale: 16.0 2024-09-23 17:22:11,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=308168.0, ans=0.025 2024-09-23 17:23:07,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=308308.0, ans=0.025 2024-09-23 17:23:08,762 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.332e+02 1.441e+02 1.620e+02 2.318e+02, threshold=2.882e+02, percent-clipped=0.0 2024-09-23 17:23:19,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=308354.6666666667, ans=0.1 2024-09-23 17:23:28,834 INFO [train.py:1198] (0/4) Epoch 17, batch 3750, loss[loss=0.2037, ctc_loss=0.1361, cr_loss=0.3376, over 17138.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1497, cr_loss=0.364, over 3341109.86 frames. ], batch size: 48, lr: 6.99e-03, grad_scale: 16.0 2024-09-23 17:23:35,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=308401.3333333333, ans=0.125 2024-09-23 17:24:13,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308494.6666666667, ans=0.1 2024-09-23 17:24:16,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=308541.3333333333, ans=0.2 2024-09-23 17:24:28,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=308541.3333333333, ans=0.125 2024-09-23 17:24:31,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=308588.0, ans=0.2 2024-09-23 17:24:35,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308588.0, ans=0.1 2024-09-23 17:24:40,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.62 vs. limit=6.0 2024-09-23 17:24:46,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=308634.6666666667, ans=0.2 2024-09-23 17:24:47,308 INFO [train.py:1198] (0/4) Epoch 17, batch 3800, loss[loss=0.2274, ctc_loss=0.1525, cr_loss=0.3742, over 16547.00 frames. ], tot_loss[loss=0.2237, ctc_loss=0.1508, cr_loss=0.3648, over 3317304.86 frames. ], batch size: 66, lr: 6.98e-03, grad_scale: 16.0 2024-09-23 17:24:51,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. 
limit=6.0 2024-09-23 17:25:22,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=308728.0, ans=0.125 2024-09-23 17:25:30,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=308728.0, ans=0.0 2024-09-23 17:25:45,745 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.280e+02 1.358e+02 1.516e+02 2.710e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-23 17:26:00,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=308821.3333333333, ans=0.125 2024-09-23 17:26:05,960 INFO [train.py:1198] (0/4) Epoch 17, batch 3850, loss[loss=0.2712, ctc_loss=0.1977, cr_loss=0.3673, over 11625.00 frames. ], tot_loss[loss=0.2255, ctc_loss=0.1522, cr_loss=0.3665, over 3282870.26 frames. ], batch size: 123, lr: 6.98e-03, grad_scale: 16.0 2024-09-23 17:26:07,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=308868.0, ans=0.0 2024-09-23 17:26:13,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=308868.0, ans=0.07 2024-09-23 17:26:41,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=308961.3333333333, ans=0.125 2024-09-23 17:27:10,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=309054.6666666667, ans=0.125 2024-09-23 17:27:15,513 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-17.pt 2024-09-23 17:28:06,785 INFO [train.py:1198] (0/4) Epoch 18, batch 0, loss[loss=0.2093, ctc_loss=0.1363, cr_loss=0.3653, over 17364.00 frames. ], tot_loss[loss=0.2093, ctc_loss=0.1363, cr_loss=0.3653, over 17364.00 frames. ], batch size: 48, lr: 6.78e-03, grad_scale: 32.0 2024-09-23 17:28:06,786 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 17:28:21,935 INFO [train.py:1230] (0/4) Epoch 18, validation: loss=0.03994, ctc_loss=0.03994, cr_loss=8.27e-15, over 944034.00 frames. 2024-09-23 17:28:21,936 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 17:28:26,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=309082.6666666667, ans=0.035 2024-09-23 17:28:54,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.73 vs. limit=22.5 2024-09-23 17:29:08,526 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:29:28,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=309269.3333333333, ans=0.125 2024-09-23 17:29:30,231 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.283e+02 1.487e+02 1.642e+02 2.774e+02, threshold=2.974e+02, percent-clipped=1.0 2024-09-23 17:29:44,551 INFO [train.py:1198] (0/4) Epoch 18, batch 50, loss[loss=0.1752, ctc_loss=0.1147, cr_loss=0.3028, over 17090.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1475, cr_loss=0.3602, over 762494.90 frames. 
], batch size: 43, lr: 6.78e-03, grad_scale: 32.0 2024-09-23 17:29:55,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=309316.0, ans=0.125 2024-09-23 17:30:11,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309362.6666666667, ans=0.1 2024-09-23 17:30:39,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=309456.0, ans=0.05 2024-09-23 17:30:59,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=309502.6666666667, ans=0.125 2024-09-23 17:31:06,912 INFO [train.py:1198] (0/4) Epoch 18, batch 100, loss[loss=0.1861, ctc_loss=0.1227, cr_loss=0.317, over 16731.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1473, cr_loss=0.3592, over 1327418.02 frames. ], batch size: 37, lr: 6.77e-03, grad_scale: 16.0 2024-09-23 17:31:13,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=309549.3333333333, ans=0.04949747468305833 2024-09-23 17:31:50,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=309642.6666666667, ans=0.0 2024-09-23 17:32:07,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=309689.3333333333, ans=0.0 2024-09-23 17:32:09,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=309736.0, ans=0.0 2024-09-23 17:32:13,779 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.252e+02 1.337e+02 1.407e+02 3.310e+02, threshold=2.674e+02, percent-clipped=1.0 2024-09-23 17:32:23,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=309736.0, ans=0.125 2024-09-23 17:32:28,262 INFO [train.py:1198] (0/4) Epoch 18, batch 150, loss[loss=0.2366, ctc_loss=0.1619, cr_loss=0.3734, over 14863.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1476, cr_loss=0.3619, over 1783059.36 frames. ], batch size: 89, lr: 6.77e-03, grad_scale: 16.0 2024-09-23 17:32:49,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=309829.3333333333, ans=0.1 2024-09-23 17:33:16,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=309876.0, ans=0.035 2024-09-23 17:33:39,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=309969.3333333333, ans=0.2 2024-09-23 17:33:50,779 INFO [train.py:1198] (0/4) Epoch 18, batch 200, loss[loss=0.2054, ctc_loss=0.1347, cr_loss=0.3533, over 17099.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1472, cr_loss=0.3618, over 2140241.53 frames. 
], batch size: 49, lr: 6.77e-03, grad_scale: 16.0 2024-09-23 17:33:52,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=310016.0, ans=0.125 2024-09-23 17:33:52,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=310016.0, ans=0.125 2024-09-23 17:33:54,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2024-09-23 17:34:12,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.17 vs. limit=15.0 2024-09-23 17:34:24,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=310109.3333333333, ans=0.2 2024-09-23 17:34:27,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=310109.3333333333, ans=0.125 2024-09-23 17:34:33,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=310109.3333333333, ans=0.2 2024-09-23 17:34:43,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=310156.0, ans=0.125 2024-09-23 17:34:45,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2024-09-23 17:34:51,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=310156.0, ans=0.0 2024-09-23 17:35:00,727 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.233e+02 1.340e+02 1.521e+02 2.141e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-23 17:35:13,461 INFO [train.py:1198] (0/4) Epoch 18, batch 250, loss[loss=0.251, ctc_loss=0.1717, cr_loss=0.3961, over 17075.00 frames. ], tot_loss[loss=0.2223, ctc_loss=0.1494, cr_loss=0.3647, over 2411817.42 frames. ], batch size: 56, lr: 6.77e-03, grad_scale: 16.0 2024-09-23 17:35:17,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310249.3333333333, ans=0.1 2024-09-23 17:35:40,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=310296.0, ans=0.125 2024-09-23 17:36:14,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=310389.3333333333, ans=0.125 2024-09-23 17:36:15,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310389.3333333333, ans=0.1 2024-09-23 17:36:27,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=310436.0, ans=0.07 2024-09-23 17:36:36,397 INFO [train.py:1198] (0/4) Epoch 18, batch 300, loss[loss=0.1948, ctc_loss=0.1264, cr_loss=0.3418, over 16942.00 frames. ], tot_loss[loss=0.2211, ctc_loss=0.1484, cr_loss=0.3636, over 2621875.89 frames. 
], batch size: 42, lr: 6.76e-03, grad_scale: 16.0 2024-09-23 17:36:50,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=310529.3333333333, ans=0.125 2024-09-23 17:36:59,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2024-09-23 17:37:11,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=310576.0, ans=0.125 2024-09-23 17:37:27,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=310622.6666666667, ans=0.125 2024-09-23 17:37:46,273 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.225e+02 1.352e+02 1.557e+02 2.949e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-23 17:37:58,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=310669.3333333333, ans=0.2 2024-09-23 17:38:01,326 INFO [train.py:1198] (0/4) Epoch 18, batch 350, loss[loss=0.219, ctc_loss=0.1444, cr_loss=0.373, over 17327.00 frames. ], tot_loss[loss=0.2236, ctc_loss=0.1503, cr_loss=0.3668, over 2785708.61 frames. ], batch size: 51, lr: 6.76e-03, grad_scale: 16.0 2024-09-23 17:38:54,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=310856.0, ans=0.125 2024-09-23 17:39:24,260 INFO [train.py:1198] (0/4) Epoch 18, batch 400, loss[loss=0.208, ctc_loss=0.1376, cr_loss=0.3523, over 17072.00 frames. ], tot_loss[loss=0.2232, ctc_loss=0.1499, cr_loss=0.3663, over 2910250.25 frames. ], batch size: 46, lr: 6.76e-03, grad_scale: 32.0 2024-09-23 17:39:24,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=310949.3333333333, ans=0.2 2024-09-23 17:39:38,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=310996.0, ans=0.2 2024-09-23 17:39:42,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=310996.0, ans=0.04949747468305833 2024-09-23 17:40:08,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311042.6666666667, ans=0.1 2024-09-23 17:40:12,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0 2024-09-23 17:40:21,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=311089.3333333333, ans=0.2 2024-09-23 17:40:31,100 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.269e+02 1.393e+02 1.575e+02 2.470e+02, threshold=2.786e+02, percent-clipped=0.0 2024-09-23 17:40:43,673 INFO [train.py:1198] (0/4) Epoch 18, batch 450, loss[loss=0.2079, ctc_loss=0.1373, cr_loss=0.3528, over 17357.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.1502, cr_loss=0.3664, over 3006746.62 frames. 
], batch size: 48, lr: 6.76e-03, grad_scale: 32.0 2024-09-23 17:41:10,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=311229.3333333333, ans=0.0 2024-09-23 17:41:15,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2024-09-23 17:41:37,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=311322.6666666667, ans=0.125 2024-09-23 17:41:39,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-09-23 17:41:49,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=311369.3333333333, ans=0.125 2024-09-23 17:42:05,310 INFO [train.py:1198] (0/4) Epoch 18, batch 500, loss[loss=0.2698, ctc_loss=0.191, cr_loss=0.3939, over 11826.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1495, cr_loss=0.3649, over 3067784.15 frames. ], batch size: 123, lr: 6.75e-03, grad_scale: 32.0 2024-09-23 17:42:14,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=311416.0, ans=0.125 2024-09-23 17:42:21,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=311416.0, ans=0.125 2024-09-23 17:42:27,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=311462.6666666667, ans=0.0 2024-09-23 17:42:41,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=311509.3333333333, ans=0.5 2024-09-23 17:43:17,644 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.265e+02 1.370e+02 1.572e+02 2.414e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-23 17:43:19,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=311602.6666666667, ans=0.2 2024-09-23 17:43:30,323 INFO [train.py:1198] (0/4) Epoch 18, batch 550, loss[loss=0.1936, ctc_loss=0.1284, cr_loss=0.3257, over 16216.00 frames. ], tot_loss[loss=0.2226, ctc_loss=0.1497, cr_loss=0.3644, over 3124441.17 frames. ], batch size: 36, lr: 6.75e-03, grad_scale: 32.0 2024-09-23 17:43:57,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=311696.0, ans=0.125 2024-09-23 17:43:57,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=311696.0, ans=0.1 2024-09-23 17:44:13,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=311742.6666666667, ans=0.2 2024-09-23 17:44:13,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.57 vs. 
limit=15.0 2024-09-23 17:44:16,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=311742.6666666667, ans=0.125 2024-09-23 17:44:21,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=311789.3333333333, ans=0.1 2024-09-23 17:44:32,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=311789.3333333333, ans=0.125 2024-09-23 17:44:53,376 INFO [train.py:1198] (0/4) Epoch 18, batch 600, loss[loss=0.2622, ctc_loss=0.1786, cr_loss=0.4183, over 14860.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.148, cr_loss=0.3627, over 3183555.27 frames. ], batch size: 89, lr: 6.75e-03, grad_scale: 32.0 2024-09-23 17:44:55,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=311882.6666666667, ans=12.0 2024-09-23 17:45:52,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312022.6666666667, ans=0.1 2024-09-23 17:45:54,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=312022.6666666667, ans=0.125 2024-09-23 17:46:03,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=312069.3333333333, ans=0.025 2024-09-23 17:46:04,639 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.282e+02 1.384e+02 1.538e+02 2.458e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-23 17:46:16,006 INFO [train.py:1198] (0/4) Epoch 18, batch 650, loss[loss=0.1876, ctc_loss=0.126, cr_loss=0.3081, over 16314.00 frames. ], tot_loss[loss=0.2216, ctc_loss=0.1487, cr_loss=0.3646, over 3224370.90 frames. ], batch size: 36, lr: 6.75e-03, grad_scale: 16.0 2024-09-23 17:46:27,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=312116.0, ans=0.125 2024-09-23 17:46:27,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=312116.0, ans=0.0 2024-09-23 17:46:38,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=312162.6666666667, ans=0.2 2024-09-23 17:46:45,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=312162.6666666667, ans=0.125 2024-09-23 17:47:01,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=312209.3333333333, ans=0.125 2024-09-23 17:47:34,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=312302.6666666667, ans=0.125 2024-09-23 17:47:34,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=312302.6666666667, ans=0.125 2024-09-23 17:47:38,726 INFO [train.py:1198] (0/4) Epoch 18, batch 700, loss[loss=0.2439, ctc_loss=0.1655, cr_loss=0.3918, over 17107.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1478, cr_loss=0.3635, over 3262981.96 frames. 
], batch size: 49, lr: 6.74e-03, grad_scale: 16.0 2024-09-23 17:48:13,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=312442.6666666667, ans=15.0 2024-09-23 17:48:49,978 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.252e+02 1.369e+02 1.593e+02 2.409e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-23 17:48:55,126 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:49:00,934 INFO [train.py:1198] (0/4) Epoch 18, batch 750, loss[loss=0.2358, ctc_loss=0.1585, cr_loss=0.3864, over 16945.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1476, cr_loss=0.3634, over 3283346.05 frames. ], batch size: 58, lr: 6.74e-03, grad_scale: 16.0 2024-09-23 17:49:07,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2024-09-23 17:49:18,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=312629.3333333333, ans=0.0 2024-09-23 17:49:21,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=312629.3333333333, ans=0.0 2024-09-23 17:49:37,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=312676.0, ans=0.0 2024-09-23 17:49:58,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=22.5 2024-09-23 17:50:11,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=22.5 2024-09-23 17:50:15,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=312769.3333333333, ans=0.0 2024-09-23 17:50:21,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=312816.0, ans=0.125 2024-09-23 17:50:23,308 INFO [train.py:1198] (0/4) Epoch 18, batch 800, loss[loss=0.2761, ctc_loss=0.1911, cr_loss=0.4252, over 15930.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1486, cr_loss=0.3639, over 3295873.01 frames. 
], batch size: 74, lr: 6.74e-03, grad_scale: 32.0 2024-09-23 17:50:25,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312816.0, ans=0.1 2024-09-23 17:50:49,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=312862.6666666667, ans=0.0 2024-09-23 17:51:18,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=312956.0, ans=0.125 2024-09-23 17:51:19,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=312956.0, ans=10.0 2024-09-23 17:51:20,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=312956.0, ans=0.1 2024-09-23 17:51:28,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=313002.6666666667, ans=0.2 2024-09-23 17:51:34,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.78 vs. limit=5.0 2024-09-23 17:51:34,556 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.268e+02 1.368e+02 1.483e+02 2.318e+02, threshold=2.737e+02, percent-clipped=0.0 2024-09-23 17:51:42,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=313002.6666666667, ans=0.025 2024-09-23 17:51:45,659 INFO [train.py:1198] (0/4) Epoch 18, batch 850, loss[loss=0.2387, ctc_loss=0.1605, cr_loss=0.3909, over 17008.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1477, cr_loss=0.3627, over 3314864.68 frames. ], batch size: 53, lr: 6.74e-03, grad_scale: 32.0 2024-09-23 17:51:46,049 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:52:10,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=313096.0, ans=0.0 2024-09-23 17:52:15,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=313096.0, ans=0.0 2024-09-23 17:52:23,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=313142.6666666667, ans=0.125 2024-09-23 17:52:28,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=313142.6666666667, ans=0.125 2024-09-23 17:52:46,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=313189.3333333333, ans=0.0 2024-09-23 17:52:46,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=313189.3333333333, ans=0.0 2024-09-23 17:52:48,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=22.5 2024-09-23 17:53:01,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2024-09-23 17:53:10,236 INFO [train.py:1198] (0/4) Epoch 18, batch 900, loss[loss=0.2345, ctc_loss=0.1587, cr_loss=0.3794, over 17214.00 frames. 
], tot_loss[loss=0.22, ctc_loss=0.1475, cr_loss=0.3627, over 3329292.55 frames. ], batch size: 50, lr: 6.73e-03, grad_scale: 32.0 2024-09-23 17:53:39,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=313329.3333333333, ans=0.2 2024-09-23 17:53:44,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=22.5 2024-09-23 17:54:06,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=313422.6666666667, ans=0.125 2024-09-23 17:54:21,420 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.286e+02 1.405e+02 1.600e+02 2.669e+02, threshold=2.810e+02, percent-clipped=0.0 2024-09-23 17:54:28,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=313469.3333333333, ans=0.2 2024-09-23 17:54:32,764 INFO [train.py:1198] (0/4) Epoch 18, batch 950, loss[loss=0.2236, ctc_loss=0.1484, cr_loss=0.3762, over 17032.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1475, cr_loss=0.3626, over 3342350.35 frames. ], batch size: 44, lr: 6.73e-03, grad_scale: 32.0 2024-09-23 17:55:34,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=313656.0, ans=0.125 2024-09-23 17:55:54,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=313749.3333333333, ans=0.2 2024-09-23 17:55:54,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=313749.3333333333, ans=0.0 2024-09-23 17:55:55,794 INFO [train.py:1198] (0/4) Epoch 18, batch 1000, loss[loss=0.2017, ctc_loss=0.1324, cr_loss=0.3465, over 17012.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.148, cr_loss=0.3632, over 3342515.47 frames. ], batch size: 53, lr: 6.73e-03, grad_scale: 32.0 2024-09-23 17:57:06,779 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.230e+02 1.328e+02 1.431e+02 2.241e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-23 17:57:17,823 INFO [train.py:1198] (0/4) Epoch 18, batch 1050, loss[loss=0.1972, ctc_loss=0.1297, cr_loss=0.3372, over 17013.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1478, cr_loss=0.3635, over 3349217.15 frames. ], batch size: 51, lr: 6.73e-03, grad_scale: 32.0 2024-09-23 17:57:22,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=313982.6666666667, ans=0.0 2024-09-23 17:57:24,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=313982.6666666667, ans=0.0 2024-09-23 17:57:28,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2024-09-23 17:57:56,164 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. 
limit=6.0 2024-09-23 17:58:03,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=314076.0, ans=15.0 2024-09-23 17:58:11,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=314122.6666666667, ans=0.0 2024-09-23 17:58:14,731 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 17:58:37,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2024-09-23 17:58:40,069 INFO [train.py:1198] (0/4) Epoch 18, batch 1100, loss[loss=0.2416, ctc_loss=0.1661, cr_loss=0.3774, over 16575.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1479, cr_loss=0.3634, over 3340854.89 frames. ], batch size: 66, lr: 6.72e-03, grad_scale: 32.0 2024-09-23 17:58:41,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=314216.0, ans=0.125 2024-09-23 17:59:17,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=314309.3333333333, ans=0.125 2024-09-23 17:59:25,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=314309.3333333333, ans=0.05 2024-09-23 17:59:35,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=314356.0, ans=0.125 2024-09-23 17:59:38,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314356.0, ans=0.1 2024-09-23 17:59:41,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=314356.0, ans=0.0 2024-09-23 17:59:51,145 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.280e+02 1.383e+02 1.491e+02 2.284e+02, threshold=2.766e+02, percent-clipped=0.0 2024-09-23 17:59:58,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.52 vs. limit=15.0 2024-09-23 18:00:02,378 INFO [train.py:1198] (0/4) Epoch 18, batch 1150, loss[loss=0.2038, ctc_loss=0.1364, cr_loss=0.3371, over 17234.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.148, cr_loss=0.3634, over 3345917.50 frames. ], batch size: 50, lr: 6.72e-03, grad_scale: 32.0 2024-09-23 18:00:15,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=314449.3333333333, ans=0.0 2024-09-23 18:00:21,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2024-09-23 18:00:30,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2024-09-23 18:01:22,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=314636.0, ans=0.125 2024-09-23 18:01:25,141 INFO [train.py:1198] (0/4) Epoch 18, batch 1200, loss[loss=0.2447, ctc_loss=0.1657, cr_loss=0.3948, over 17020.00 frames. 
], tot_loss[loss=0.2211, ctc_loss=0.1482, cr_loss=0.3644, over 3352483.89 frames. ], batch size: 52, lr: 6.72e-03, grad_scale: 32.0 2024-09-23 18:02:12,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=314776.0, ans=0.125 2024-09-23 18:02:15,895 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 18:02:38,965 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.291e+02 1.389e+02 1.552e+02 2.070e+02, threshold=2.777e+02, percent-clipped=0.0 2024-09-23 18:02:49,969 INFO [train.py:1198] (0/4) Epoch 18, batch 1250, loss[loss=0.2292, ctc_loss=0.1548, cr_loss=0.3718, over 17106.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1488, cr_loss=0.3659, over 3358922.13 frames. ], batch size: 49, lr: 6.72e-03, grad_scale: 32.0 2024-09-23 18:03:17,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=314962.6666666667, ans=0.125 2024-09-23 18:03:28,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=315009.3333333333, ans=0.0 2024-09-23 18:03:32,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2024-09-23 18:03:33,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=315009.3333333333, ans=0.2 2024-09-23 18:03:38,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=315056.0, ans=0.125 2024-09-23 18:03:38,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=315056.0, ans=0.0 2024-09-23 18:03:41,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=315056.0, ans=0.0 2024-09-23 18:03:47,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=315056.0, ans=0.125 2024-09-23 18:03:49,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=315056.0, ans=0.125 2024-09-23 18:03:49,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0 2024-09-23 18:04:10,279 INFO [train.py:1198] (0/4) Epoch 18, batch 1300, loss[loss=0.1805, ctc_loss=0.1188, cr_loss=0.3085, over 17186.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1488, cr_loss=0.3656, over 3363370.87 frames. ], batch size: 41, lr: 6.71e-03, grad_scale: 32.0 2024-09-23 18:04:12,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.82 vs. 
limit=22.5 2024-09-23 18:04:38,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315196.0, ans=0.1 2024-09-23 18:04:52,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=315242.6666666667, ans=0.2 2024-09-23 18:04:57,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=315242.6666666667, ans=10.0 2024-09-23 18:05:00,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=315289.3333333333, ans=0.125 2024-09-23 18:05:09,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=315289.3333333333, ans=0.125 2024-09-23 18:05:17,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=315336.0, ans=0.07 2024-09-23 18:05:21,639 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.312e+02 1.416e+02 1.661e+02 2.501e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 18:05:27,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2024-09-23 18:05:32,772 INFO [train.py:1198] (0/4) Epoch 18, batch 1350, loss[loss=0.248, ctc_loss=0.1671, cr_loss=0.4044, over 17301.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1488, cr_loss=0.3653, over 3363128.70 frames. ], batch size: 49, lr: 6.71e-03, grad_scale: 32.0 2024-09-23 18:05:44,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=315382.6666666667, ans=0.125 2024-09-23 18:05:44,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=315382.6666666667, ans=0.125 2024-09-23 18:05:53,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-09-23 18:05:54,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=315429.3333333333, ans=0.125 2024-09-23 18:05:56,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=315429.3333333333, ans=0.125 2024-09-23 18:06:02,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=315429.3333333333, ans=0.1 2024-09-23 18:06:30,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0 2024-09-23 18:06:34,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=315522.6666666667, ans=0.125 2024-09-23 18:06:43,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.72 vs. 
limit=22.5 2024-09-23 18:06:45,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=315569.3333333333, ans=0.125 2024-09-23 18:06:48,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=315569.3333333333, ans=0.125 2024-09-23 18:06:53,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=315616.0, ans=0.04949747468305833 2024-09-23 18:06:54,863 INFO [train.py:1198] (0/4) Epoch 18, batch 1400, loss[loss=0.1853, ctc_loss=0.1195, cr_loss=0.3293, over 16295.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1484, cr_loss=0.3647, over 3369308.63 frames. ], batch size: 36, lr: 6.71e-03, grad_scale: 32.0 2024-09-23 18:07:22,630 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=15.0 2024-09-23 18:07:32,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=315709.3333333333, ans=0.0 2024-09-23 18:07:33,587 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2024-09-23 18:07:47,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-09-23 18:08:08,561 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.236e+02 1.313e+02 1.450e+02 2.376e+02, threshold=2.626e+02, percent-clipped=0.0 2024-09-23 18:08:19,847 INFO [train.py:1198] (0/4) Epoch 18, batch 1450, loss[loss=0.2373, ctc_loss=0.1575, cr_loss=0.3988, over 17042.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1493, cr_loss=0.3657, over 3366541.37 frames. ], batch size: 56, lr: 6.71e-03, grad_scale: 32.0 2024-09-23 18:08:44,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=315896.0, ans=0.125 2024-09-23 18:08:50,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=315942.6666666667, ans=0.125 2024-09-23 18:09:06,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=315989.3333333333, ans=0.125 2024-09-23 18:09:15,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=315989.3333333333, ans=0.0 2024-09-23 18:09:28,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=316036.0, ans=0.125 2024-09-23 18:09:42,228 INFO [train.py:1198] (0/4) Epoch 18, batch 1500, loss[loss=0.2474, ctc_loss=0.1696, cr_loss=0.3892, over 17225.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1493, cr_loss=0.3654, over 3361826.33 frames. 
], batch size: 50, lr: 6.70e-03, grad_scale: 32.0 2024-09-23 18:09:45,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=316082.6666666667, ans=0.125 2024-09-23 18:09:49,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=316082.6666666667, ans=0.125 2024-09-23 18:09:49,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=316082.6666666667, ans=0.0 2024-09-23 18:10:02,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=316129.3333333333, ans=0.125 2024-09-23 18:10:53,732 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.296e+02 1.405e+02 1.565e+02 2.957e+02, threshold=2.810e+02, percent-clipped=1.0 2024-09-23 18:10:55,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=316269.3333333333, ans=0.125 2024-09-23 18:10:56,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.98 vs. limit=15.0 2024-09-23 18:10:57,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=316269.3333333333, ans=0.125 2024-09-23 18:11:05,019 INFO [train.py:1198] (0/4) Epoch 18, batch 1550, loss[loss=0.1885, ctc_loss=0.1267, cr_loss=0.3093, over 17070.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1486, cr_loss=0.3644, over 3371519.75 frames. ], batch size: 39, lr: 6.70e-03, grad_scale: 32.0 2024-09-23 18:11:05,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=316316.0, ans=0.0 2024-09-23 18:11:11,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=316316.0, ans=0.1 2024-09-23 18:11:13,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=316316.0, ans=0.125 2024-09-23 18:11:40,727 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 18:12:02,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=316456.0, ans=0.2 2024-09-23 18:12:25,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=316502.6666666667, ans=0.0 2024-09-23 18:12:28,051 INFO [train.py:1198] (0/4) Epoch 18, batch 1600, loss[loss=0.2108, ctc_loss=0.1387, cr_loss=0.3605, over 17149.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1486, cr_loss=0.3645, over 3372621.68 frames. 
], batch size: 45, lr: 6.70e-03, grad_scale: 32.0 2024-09-23 18:12:28,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=316549.3333333333, ans=0.125 2024-09-23 18:12:53,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316596.0, ans=0.1 2024-09-23 18:13:18,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=316689.3333333333, ans=0.025 2024-09-23 18:13:22,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=316689.3333333333, ans=0.05 2024-09-23 18:13:39,405 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.280e+02 1.386e+02 1.539e+02 3.392e+02, threshold=2.772e+02, percent-clipped=1.0 2024-09-23 18:13:50,544 INFO [train.py:1198] (0/4) Epoch 18, batch 1650, loss[loss=0.2885, ctc_loss=0.2097, cr_loss=0.3942, over 11380.00 frames. ], tot_loss[loss=0.2215, ctc_loss=0.1486, cr_loss=0.3647, over 3363886.83 frames. ], batch size: 123, lr: 6.70e-03, grad_scale: 32.0 2024-09-23 18:14:20,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=316829.3333333333, ans=0.125 2024-09-23 18:14:23,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=316876.0, ans=0.125 2024-09-23 18:14:30,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=316876.0, ans=0.125 2024-09-23 18:15:13,364 INFO [train.py:1198] (0/4) Epoch 18, batch 1700, loss[loss=0.22, ctc_loss=0.145, cr_loss=0.3749, over 17194.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.149, cr_loss=0.3652, over 3357333.40 frames. ], batch size: 47, lr: 6.69e-03, grad_scale: 32.0 2024-09-23 18:15:27,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2024-09-23 18:15:34,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=317062.6666666667, ans=0.035 2024-09-23 18:15:37,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=317062.6666666667, ans=0.125 2024-09-23 18:16:24,388 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.272e+02 1.364e+02 1.549e+02 1.950e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 18:16:35,524 INFO [train.py:1198] (0/4) Epoch 18, batch 1750, loss[loss=0.2289, ctc_loss=0.1543, cr_loss=0.3732, over 17044.00 frames. ], tot_loss[loss=0.2229, ctc_loss=0.1497, cr_loss=0.3662, over 3357145.10 frames. 
], batch size: 52, lr: 6.69e-03, grad_scale: 32.0 2024-09-23 18:16:42,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=317249.3333333333, ans=0.0 2024-09-23 18:17:06,087 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-68000.pt 2024-09-23 18:17:21,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=317342.6666666667, ans=0.0 2024-09-23 18:17:33,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=317389.3333333333, ans=0.125 2024-09-23 18:17:39,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2024-09-23 18:17:44,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=317389.3333333333, ans=0.0 2024-09-23 18:17:50,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=317436.0, ans=0.2 2024-09-23 18:18:02,971 INFO [train.py:1198] (0/4) Epoch 18, batch 1800, loss[loss=0.2533, ctc_loss=0.171, cr_loss=0.4112, over 16774.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.149, cr_loss=0.3653, over 3367188.51 frames. ], batch size: 61, lr: 6.69e-03, grad_scale: 32.0 2024-09-23 18:18:11,953 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-23 18:18:22,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2024-09-23 18:18:28,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2024-09-23 18:18:45,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=317576.0, ans=0.2 2024-09-23 18:18:52,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=317622.6666666667, ans=0.125 2024-09-23 18:18:54,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=317622.6666666667, ans=0.0 2024-09-23 18:18:59,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.06 vs. limit=15.0 2024-09-23 18:18:59,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=317622.6666666667, ans=0.2 2024-09-23 18:19:09,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=317669.3333333333, ans=0.125 2024-09-23 18:19:09,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=317669.3333333333, ans=0.2 2024-09-23 18:19:14,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. 
limit=15.0 2024-09-23 18:19:14,902 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.243e+02 1.320e+02 1.455e+02 2.029e+02, threshold=2.640e+02, percent-clipped=0.0 2024-09-23 18:19:21,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317669.3333333333, ans=0.1 2024-09-23 18:19:26,178 INFO [train.py:1198] (0/4) Epoch 18, batch 1850, loss[loss=0.218, ctc_loss=0.1444, cr_loss=0.3682, over 17301.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.148, cr_loss=0.3637, over 3369709.78 frames. ], batch size: 49, lr: 6.69e-03, grad_scale: 32.0 2024-09-23 18:19:33,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317716.0, ans=0.1 2024-09-23 18:19:41,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=317762.6666666667, ans=0.125 2024-09-23 18:20:40,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=317902.6666666667, ans=15.0 2024-09-23 18:20:49,119 INFO [train.py:1198] (0/4) Epoch 18, batch 1900, loss[loss=0.2272, ctc_loss=0.15, cr_loss=0.3859, over 17353.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1474, cr_loss=0.3631, over 3369751.27 frames. ], batch size: 48, lr: 6.68e-03, grad_scale: 32.0 2024-09-23 18:21:04,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=317996.0, ans=0.125 2024-09-23 18:21:10,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=317996.0, ans=0.125 2024-09-23 18:21:11,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2024-09-23 18:21:58,144 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.235e+02 1.336e+02 1.423e+02 1.924e+02, threshold=2.671e+02, percent-clipped=0.0 2024-09-23 18:22:11,955 INFO [train.py:1198] (0/4) Epoch 18, batch 1950, loss[loss=0.1908, ctc_loss=0.122, cr_loss=0.3438, over 17086.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1469, cr_loss=0.3622, over 3370353.75 frames. ], batch size: 43, lr: 6.68e-03, grad_scale: 32.0 2024-09-23 18:22:12,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=318182.6666666667, ans=0.025 2024-09-23 18:22:19,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.52 vs. limit=10.0 2024-09-23 18:22:27,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0 2024-09-23 18:23:24,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.36 vs. limit=15.0 2024-09-23 18:23:34,527 INFO [train.py:1198] (0/4) Epoch 18, batch 2000, loss[loss=0.2051, ctc_loss=0.1389, cr_loss=0.3311, over 17146.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1474, cr_loss=0.3632, over 3373266.52 frames. 
], batch size: 45, lr: 6.68e-03, grad_scale: 32.0 2024-09-23 18:23:40,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.97 vs. limit=15.0 2024-09-23 18:23:47,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=318416.0, ans=0.0 2024-09-23 18:23:59,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.06 vs. limit=15.0 2024-09-23 18:24:03,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=318462.6666666667, ans=0.2 2024-09-23 18:24:40,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318602.6666666667, ans=0.1 2024-09-23 18:24:44,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=318602.6666666667, ans=0.125 2024-09-23 18:24:46,013 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.249e+02 1.342e+02 1.462e+02 2.296e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-23 18:24:57,329 INFO [train.py:1198] (0/4) Epoch 18, batch 2050, loss[loss=0.2471, ctc_loss=0.1658, cr_loss=0.4064, over 17308.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1473, cr_loss=0.363, over 3369945.59 frames. ], batch size: 49, lr: 6.68e-03, grad_scale: 32.0 2024-09-23 18:25:31,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=318742.6666666667, ans=0.125 2024-09-23 18:25:37,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=318742.6666666667, ans=0.125 2024-09-23 18:26:01,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=22.5 2024-09-23 18:26:06,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2024-09-23 18:26:12,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=318836.0, ans=0.125 2024-09-23 18:26:16,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318836.0, ans=0.125 2024-09-23 18:26:19,582 INFO [train.py:1198] (0/4) Epoch 18, batch 2100, loss[loss=0.2324, ctc_loss=0.1566, cr_loss=0.3791, over 17019.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1483, cr_loss=0.3655, over 3367123.06 frames. ], batch size: 56, lr: 6.67e-03, grad_scale: 32.0 2024-09-23 18:26:32,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318882.6666666667, ans=0.1 2024-09-23 18:26:33,191 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.96 vs. 
limit=15.0 2024-09-23 18:26:34,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318929.3333333333, ans=0.1 2024-09-23 18:26:37,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318929.3333333333, ans=0.125 2024-09-23 18:26:50,926 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2024-09-23 18:27:18,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=319022.6666666667, ans=0.0 2024-09-23 18:27:19,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=319022.6666666667, ans=0.125 2024-09-23 18:27:31,219 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.263e+02 1.339e+02 1.437e+02 2.082e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-23 18:27:44,929 INFO [train.py:1198] (0/4) Epoch 18, batch 2150, loss[loss=0.2331, ctc_loss=0.1572, cr_loss=0.3791, over 15821.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1489, cr_loss=0.3655, over 3350027.67 frames. ], batch size: 74, lr: 6.67e-03, grad_scale: 32.0 2024-09-23 18:27:45,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2024-09-23 18:28:39,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.11 vs. limit=12.0 2024-09-23 18:28:55,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=319302.6666666667, ans=0.025 2024-09-23 18:29:05,237 INFO [train.py:1198] (0/4) Epoch 18, batch 2200, loss[loss=0.2199, ctc_loss=0.1475, cr_loss=0.3617, over 17172.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1479, cr_loss=0.3633, over 3358027.31 frames. ], batch size: 45, lr: 6.67e-03, grad_scale: 16.0 2024-09-23 18:29:22,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=319396.0, ans=0.025 2024-09-23 18:29:44,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=319442.6666666667, ans=0.125 2024-09-23 18:30:03,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=319489.3333333333, ans=0.0 2024-09-23 18:30:17,667 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.244e+02 1.317e+02 1.469e+02 2.029e+02, threshold=2.633e+02, percent-clipped=0.0 2024-09-23 18:30:27,434 INFO [train.py:1198] (0/4) Epoch 18, batch 2250, loss[loss=0.2241, ctc_loss=0.1477, cr_loss=0.3825, over 17202.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1475, cr_loss=0.3636, over 3365171.64 frames. ], batch size: 50, lr: 6.67e-03, grad_scale: 16.0 2024-09-23 18:31:50,360 INFO [train.py:1198] (0/4) Epoch 18, batch 2300, loss[loss=0.2214, ctc_loss=0.1488, cr_loss=0.3632, over 17065.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1478, cr_loss=0.3635, over 3358187.87 frames. 
], batch size: 46, lr: 6.67e-03, grad_scale: 16.0 2024-09-23 18:32:01,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=12.0 2024-09-23 18:32:01,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2024-09-23 18:32:06,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2024-09-23 18:32:36,593 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 18:32:58,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=320002.6666666667, ans=0.07 2024-09-23 18:33:04,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=320002.6666666667, ans=0.0 2024-09-23 18:33:05,708 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.272e+02 1.367e+02 1.553e+02 2.225e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-23 18:33:11,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2024-09-23 18:33:15,258 INFO [train.py:1198] (0/4) Epoch 18, batch 2350, loss[loss=0.2477, ctc_loss=0.1653, cr_loss=0.4116, over 17054.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1472, cr_loss=0.3623, over 3359077.06 frames. ], batch size: 56, lr: 6.66e-03, grad_scale: 16.0 2024-09-23 18:33:33,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320096.0, ans=0.1 2024-09-23 18:33:41,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=320096.0, ans=0.035 2024-09-23 18:33:41,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=320096.0, ans=0.1 2024-09-23 18:34:03,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=320189.3333333333, ans=0.0 2024-09-23 18:34:37,847 INFO [train.py:1198] (0/4) Epoch 18, batch 2400, loss[loss=0.1875, ctc_loss=0.1268, cr_loss=0.3035, over 17109.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.1461, cr_loss=0.3607, over 3362108.71 frames. ], batch size: 40, lr: 6.66e-03, grad_scale: 32.0 2024-09-23 18:34:59,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=320329.3333333333, ans=0.125 2024-09-23 18:35:07,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=320329.3333333333, ans=0.125 2024-09-23 18:35:13,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320376.0, ans=0.1 2024-09-23 18:35:20,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.67 vs. 
limit=15.0 2024-09-23 18:35:35,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320422.6666666667, ans=0.1 2024-09-23 18:35:50,940 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.273e+02 1.349e+02 1.474e+02 2.344e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-23 18:35:58,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.72 vs. limit=22.5 2024-09-23 18:36:00,512 INFO [train.py:1198] (0/4) Epoch 18, batch 2450, loss[loss=0.2425, ctc_loss=0.1653, cr_loss=0.3857, over 17011.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.146, cr_loss=0.3609, over 3362852.69 frames. ], batch size: 56, lr: 6.66e-03, grad_scale: 32.0 2024-09-23 18:36:32,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=320609.3333333333, ans=0.125 2024-09-23 18:37:22,753 INFO [train.py:1198] (0/4) Epoch 18, batch 2500, loss[loss=0.2501, ctc_loss=0.1805, cr_loss=0.3483, over 11836.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1469, cr_loss=0.3622, over 3361552.17 frames. ], batch size: 123, lr: 6.66e-03, grad_scale: 32.0 2024-09-23 18:37:48,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320796.0, ans=0.1 2024-09-23 18:37:52,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=320796.0, ans=0.125 2024-09-23 18:38:21,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=320889.3333333333, ans=0.125 2024-09-23 18:38:37,301 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.301e+02 1.381e+02 1.501e+02 2.403e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-23 18:38:45,214 INFO [train.py:1198] (0/4) Epoch 18, batch 2550, loss[loss=0.2815, ctc_loss=0.1919, cr_loss=0.4482, over 14942.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1474, cr_loss=0.3629, over 3352767.06 frames. ], batch size: 89, lr: 6.65e-03, grad_scale: 16.0 2024-09-23 18:38:53,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=320982.6666666667, ans=0.125 2024-09-23 18:39:17,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=321029.3333333333, ans=0.0 2024-09-23 18:40:07,840 INFO [train.py:1198] (0/4) Epoch 18, batch 2600, loss[loss=0.172, ctc_loss=0.1139, cr_loss=0.2907, over 17127.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1476, cr_loss=0.3627, over 3341671.10 frames. ], batch size: 40, lr: 6.65e-03, grad_scale: 16.0 2024-09-23 18:40:18,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. 
limit=22.5 2024-09-23 18:40:19,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=321216.0, ans=0.1 2024-09-23 18:40:50,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=321309.3333333333, ans=0.0 2024-09-23 18:41:22,324 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.312e+02 1.421e+02 1.594e+02 2.386e+02, threshold=2.841e+02, percent-clipped=0.0 2024-09-23 18:41:30,436 INFO [train.py:1198] (0/4) Epoch 18, batch 2650, loss[loss=0.1947, ctc_loss=0.127, cr_loss=0.3388, over 17089.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1479, cr_loss=0.3634, over 3337954.91 frames. ], batch size: 43, lr: 6.65e-03, grad_scale: 16.0 2024-09-23 18:41:41,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=321449.3333333333, ans=0.125 2024-09-23 18:41:58,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=321496.0, ans=0.0 2024-09-23 18:42:00,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=321496.0, ans=0.1 2024-09-23 18:42:06,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=321542.6666666667, ans=0.04949747468305833 2024-09-23 18:42:55,425 INFO [train.py:1198] (0/4) Epoch 18, batch 2700, loss[loss=0.1932, ctc_loss=0.1285, cr_loss=0.3236, over 17104.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1473, cr_loss=0.3619, over 3336804.68 frames. ], batch size: 43, lr: 6.65e-03, grad_scale: 16.0 2024-09-23 18:43:05,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=321682.6666666667, ans=0.0 2024-09-23 18:43:21,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2024-09-23 18:43:51,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=321822.6666666667, ans=0.125 2024-09-23 18:43:53,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=321822.6666666667, ans=0.0 2024-09-23 18:44:02,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=321869.3333333333, ans=0.125 2024-09-23 18:44:09,952 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.267e+02 1.364e+02 1.525e+02 2.538e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 18:44:11,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=321869.3333333333, ans=0.0 2024-09-23 18:44:13,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321869.3333333333, ans=0.1 2024-09-23 18:44:17,774 INFO [train.py:1198] (0/4) Epoch 18, batch 2750, loss[loss=0.2243, ctc_loss=0.1489, cr_loss=0.3771, over 17343.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1478, cr_loss=0.363, over 3340931.85 frames. 
], batch size: 52, lr: 6.64e-03, grad_scale: 16.0 2024-09-23 18:44:27,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=321916.0, ans=0.1 2024-09-23 18:44:29,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=321916.0, ans=0.0 2024-09-23 18:44:32,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=321962.6666666667, ans=0.125 2024-09-23 18:44:41,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-09-23 18:45:05,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.94 vs. limit=10.0 2024-09-23 18:45:11,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2024-09-23 18:45:19,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=322056.0, ans=0.125 2024-09-23 18:45:39,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=322149.3333333333, ans=0.0 2024-09-23 18:45:40,675 INFO [train.py:1198] (0/4) Epoch 18, batch 2800, loss[loss=0.2289, ctc_loss=0.1571, cr_loss=0.359, over 17224.00 frames. ], tot_loss[loss=0.22, ctc_loss=0.1477, cr_loss=0.3619, over 3335903.92 frames. ], batch size: 50, lr: 6.64e-03, grad_scale: 32.0 2024-09-23 18:45:48,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=322149.3333333333, ans=0.125 2024-09-23 18:46:20,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=322242.6666666667, ans=0.0 2024-09-23 18:46:42,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=322289.3333333333, ans=0.0 2024-09-23 18:46:55,163 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.235e+02 1.345e+02 1.484e+02 2.490e+02, threshold=2.689e+02, percent-clipped=0.0 2024-09-23 18:47:00,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=322336.0, ans=0.125 2024-09-23 18:47:03,239 INFO [train.py:1198] (0/4) Epoch 18, batch 2850, loss[loss=0.2008, ctc_loss=0.1347, cr_loss=0.3306, over 17267.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1476, cr_loss=0.3622, over 3337583.21 frames. ], batch size: 42, lr: 6.64e-03, grad_scale: 32.0 2024-09-23 18:47:07,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-09-23 18:47:11,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=322382.6666666667, ans=0.1 2024-09-23 18:47:39,008 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.56 vs. 
limit=22.5 2024-09-23 18:47:43,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=322476.0, ans=0.125 2024-09-23 18:47:45,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.06 vs. limit=15.0 2024-09-23 18:48:03,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=322522.6666666667, ans=0.125 2024-09-23 18:48:26,238 INFO [train.py:1198] (0/4) Epoch 18, batch 2900, loss[loss=0.2299, ctc_loss=0.1519, cr_loss=0.3898, over 17368.00 frames. ], tot_loss[loss=0.2208, ctc_loss=0.1481, cr_loss=0.3635, over 3343624.89 frames. ], batch size: 48, lr: 6.64e-03, grad_scale: 32.0 2024-09-23 18:48:36,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=322616.0, ans=0.04949747468305833 2024-09-23 18:49:40,537 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.288e+02 1.376e+02 1.532e+02 2.384e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-23 18:49:48,607 INFO [train.py:1198] (0/4) Epoch 18, batch 2950, loss[loss=0.238, ctc_loss=0.1621, cr_loss=0.3798, over 16895.00 frames. ], tot_loss[loss=0.2198, ctc_loss=0.1474, cr_loss=0.3624, over 3357465.19 frames. ], batch size: 58, lr: 6.63e-03, grad_scale: 32.0 2024-09-23 18:50:07,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.06 vs. limit=6.0 2024-09-23 18:50:22,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=322942.6666666667, ans=0.0 2024-09-23 18:50:25,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=322942.6666666667, ans=0.2 2024-09-23 18:50:27,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=322942.6666666667, ans=0.125 2024-09-23 18:50:33,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=322942.6666666667, ans=0.125 2024-09-23 18:50:46,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=322989.3333333333, ans=0.125 2024-09-23 18:51:07,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323036.0, ans=0.1 2024-09-23 18:51:10,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=323082.6666666667, ans=0.125 2024-09-23 18:51:11,432 INFO [train.py:1198] (0/4) Epoch 18, batch 3000, loss[loss=0.2378, ctc_loss=0.1611, cr_loss=0.3836, over 16084.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1475, cr_loss=0.3629, over 3356383.37 frames. ], batch size: 74, lr: 6.63e-03, grad_scale: 32.0 2024-09-23 18:51:11,433 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 18:51:26,836 INFO [train.py:1230] (0/4) Epoch 18, validation: loss=0.04062, ctc_loss=0.04062, cr_loss=7.511e-15, over 944034.00 frames. 
2024-09-23 18:51:26,837 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 18:51:38,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.56 vs. limit=10.0 2024-09-23 18:52:04,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=323176.0, ans=0.0 2024-09-23 18:52:23,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323222.6666666667, ans=0.1 2024-09-23 18:52:27,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=323222.6666666667, ans=0.0 2024-09-23 18:52:31,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=12.0 2024-09-23 18:52:39,991 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.237e+02 1.313e+02 1.391e+02 1.785e+02, threshold=2.627e+02, percent-clipped=0.0 2024-09-23 18:52:47,794 INFO [train.py:1198] (0/4) Epoch 18, batch 3050, loss[loss=0.2453, ctc_loss=0.1673, cr_loss=0.39, over 16899.00 frames. ], tot_loss[loss=0.2207, ctc_loss=0.1479, cr_loss=0.3641, over 3360249.14 frames. ], batch size: 58, lr: 6.63e-03, grad_scale: 32.0 2024-09-23 18:52:54,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=323316.0, ans=0.09899494936611666 2024-09-23 18:53:01,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2024-09-23 18:53:51,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323502.6666666667, ans=0.1 2024-09-23 18:54:01,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=323502.6666666667, ans=0.125 2024-09-23 18:54:07,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=323549.3333333333, ans=0.125 2024-09-23 18:54:08,394 INFO [train.py:1198] (0/4) Epoch 18, batch 3100, loss[loss=0.1637, ctc_loss=0.1058, cr_loss=0.2896, over 17077.00 frames. ], tot_loss[loss=0.2213, ctc_loss=0.1483, cr_loss=0.365, over 3364151.91 frames. ], batch size: 40, lr: 6.63e-03, grad_scale: 32.0 2024-09-23 18:54:19,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=323549.3333333333, ans=0.125 2024-09-23 18:54:36,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=323596.0, ans=0.125 2024-09-23 18:54:44,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=323642.6666666667, ans=0.125 2024-09-23 18:54:47,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. 
limit=15.0 2024-09-23 18:54:50,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=323642.6666666667, ans=0.09899494936611666 2024-09-23 18:55:06,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=323689.3333333333, ans=0.025 2024-09-23 18:55:19,121 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.021e+02 1.256e+02 1.342e+02 1.439e+02 3.475e+02, threshold=2.684e+02, percent-clipped=1.0 2024-09-23 18:55:26,828 INFO [train.py:1198] (0/4) Epoch 18, batch 3150, loss[loss=0.2377, ctc_loss=0.1624, cr_loss=0.3766, over 17142.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1488, cr_loss=0.3656, over 3360957.92 frames. ], batch size: 48, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 18:55:44,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=323829.3333333333, ans=0.125 2024-09-23 18:56:01,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.50 vs. limit=15.0 2024-09-23 18:56:14,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=323922.6666666667, ans=0.05 2024-09-23 18:56:20,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=323922.6666666667, ans=0.2 2024-09-23 18:56:30,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=323969.3333333333, ans=0.0 2024-09-23 18:56:45,843 INFO [train.py:1198] (0/4) Epoch 18, batch 3200, loss[loss=0.2447, ctc_loss=0.1663, cr_loss=0.3921, over 17301.00 frames. ], tot_loss[loss=0.2235, ctc_loss=0.15, cr_loss=0.3674, over 3345337.05 frames. ], batch size: 51, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 18:56:57,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=324016.0, ans=0.025 2024-09-23 18:56:58,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324016.0, ans=0.1 2024-09-23 18:57:44,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=324156.0, ans=0.125 2024-09-23 18:57:46,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2024-09-23 18:57:58,436 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.249e+02 1.354e+02 1.459e+02 2.306e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-23 18:58:06,267 INFO [train.py:1198] (0/4) Epoch 18, batch 3250, loss[loss=0.2235, ctc_loss=0.1502, cr_loss=0.3667, over 17216.00 frames. ], tot_loss[loss=0.2238, ctc_loss=0.1503, cr_loss=0.3674, over 3338296.44 frames. ], batch size: 47, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 18:58:31,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=22.5 2024-09-23 18:59:24,402 INFO [train.py:1198] (0/4) Epoch 18, batch 3300, loss[loss=0.2169, ctc_loss=0.1458, cr_loss=0.3554, over 17198.00 frames. ], tot_loss[loss=0.2228, ctc_loss=0.1497, cr_loss=0.3656, over 3328814.81 frames. 
], batch size: 55, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 18:59:32,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=324482.6666666667, ans=0.2 2024-09-23 18:59:32,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=324482.6666666667, ans=0.0 2024-09-23 18:59:41,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=324529.3333333333, ans=0.125 2024-09-23 18:59:57,562 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.56 vs. limit=22.5 2024-09-23 19:00:03,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=324576.0, ans=0.05 2024-09-23 19:00:36,180 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.318e+02 1.406e+02 1.543e+02 2.189e+02, threshold=2.811e+02, percent-clipped=0.0 2024-09-23 19:00:44,002 INFO [train.py:1198] (0/4) Epoch 18, batch 3350, loss[loss=0.2606, ctc_loss=0.1796, cr_loss=0.4053, over 14980.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1493, cr_loss=0.3648, over 3328969.60 frames. ], batch size: 89, lr: 6.62e-03, grad_scale: 32.0 2024-09-23 19:01:17,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-23 19:01:25,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=324809.3333333333, ans=0.125 2024-09-23 19:01:25,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324809.3333333333, ans=0.1 2024-09-23 19:01:31,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=324856.0, ans=0.0 2024-09-23 19:01:33,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2024-09-23 19:02:04,711 INFO [train.py:1198] (0/4) Epoch 18, batch 3400, loss[loss=0.2628, ctc_loss=0.1822, cr_loss=0.4027, over 16996.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.1493, cr_loss=0.364, over 3331266.30 frames. ], batch size: 53, lr: 6.61e-03, grad_scale: 32.0 2024-09-23 19:02:11,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.44 vs. 
limit=22.5 2024-09-23 19:02:17,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=324949.3333333333, ans=0.2 2024-09-23 19:02:23,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=324996.0, ans=0.0 2024-09-23 19:02:36,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=325042.6666666667, ans=15.0 2024-09-23 19:02:51,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325089.3333333333, ans=0.1 2024-09-23 19:02:54,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=325089.3333333333, ans=0.035 2024-09-23 19:02:58,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=325089.3333333333, ans=0.125 2024-09-23 19:03:10,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=325136.0, ans=0.025 2024-09-23 19:03:14,634 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.235e+02 1.326e+02 1.505e+02 2.372e+02, threshold=2.653e+02, percent-clipped=0.0 2024-09-23 19:03:22,390 INFO [train.py:1198] (0/4) Epoch 18, batch 3450, loss[loss=0.2134, ctc_loss=0.1428, cr_loss=0.3534, over 17088.00 frames. ], tot_loss[loss=0.222, ctc_loss=0.1492, cr_loss=0.3645, over 3339254.40 frames. ], batch size: 40, lr: 6.61e-03, grad_scale: 32.0 2024-09-23 19:03:22,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=325182.6666666667, ans=0.1 2024-09-23 19:03:47,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325229.3333333333, ans=0.1 2024-09-23 19:04:19,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=325322.6666666667, ans=0.125 2024-09-23 19:04:42,603 INFO [train.py:1198] (0/4) Epoch 18, batch 3500, loss[loss=0.2202, ctc_loss=0.143, cr_loss=0.3857, over 17128.00 frames. ], tot_loss[loss=0.2224, ctc_loss=0.1493, cr_loss=0.3656, over 3342037.33 frames. ], batch size: 48, lr: 6.61e-03, grad_scale: 32.0 2024-09-23 19:04:44,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=325416.0, ans=0.0 2024-09-23 19:04:48,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.57 vs. limit=15.0 2024-09-23 19:05:03,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.38 vs. 
limit=15.0 2024-09-23 19:05:14,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=325509.3333333333, ans=0.125 2024-09-23 19:05:31,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=325556.0, ans=0.125 2024-09-23 19:05:36,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=325556.0, ans=0.0 2024-09-23 19:05:37,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=325556.0, ans=0.0 2024-09-23 19:05:53,068 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.293e+02 1.389e+02 1.509e+02 2.092e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-23 19:05:53,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=325602.6666666667, ans=0.1 2024-09-23 19:05:53,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=325602.6666666667, ans=0.125 2024-09-23 19:06:00,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2024-09-23 19:06:00,906 INFO [train.py:1198] (0/4) Epoch 18, batch 3550, loss[loss=0.2143, ctc_loss=0.1435, cr_loss=0.3538, over 17011.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1496, cr_loss=0.3656, over 3339061.49 frames. ], batch size: 51, lr: 6.61e-03, grad_scale: 32.0 2024-09-23 19:06:22,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-09-23 19:06:23,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325696.0, ans=0.125 2024-09-23 19:06:43,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325742.6666666667, ans=0.125 2024-09-23 19:07:03,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0 2024-09-23 19:07:21,665 INFO [train.py:1198] (0/4) Epoch 18, batch 3600, loss[loss=0.261, ctc_loss=0.1878, cr_loss=0.3661, over 11192.00 frames. ], tot_loss[loss=0.2221, ctc_loss=0.149, cr_loss=0.3652, over 3344047.95 frames. ], batch size: 123, lr: 6.60e-03, grad_scale: 32.0 2024-09-23 19:07:30,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=22.5 2024-09-23 19:07:30,216 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.34 vs. 
limit=15.0 2024-09-23 19:08:07,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=326022.6666666667, ans=0.025 2024-09-23 19:08:16,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=326022.6666666667, ans=0.2 2024-09-23 19:08:31,458 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.271e+02 1.404e+02 1.560e+02 2.337e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-23 19:08:39,251 INFO [train.py:1198] (0/4) Epoch 18, batch 3650, loss[loss=0.2486, ctc_loss=0.1721, cr_loss=0.3824, over 16473.00 frames. ], tot_loss[loss=0.2219, ctc_loss=0.1488, cr_loss=0.3652, over 3341011.21 frames. ], batch size: 66, lr: 6.60e-03, grad_scale: 32.0 2024-09-23 19:08:45,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=326116.0, ans=0.2 2024-09-23 19:08:47,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=326116.0, ans=0.125 2024-09-23 19:09:27,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=326256.0, ans=0.07 2024-09-23 19:09:58,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2024-09-23 19:10:00,633 INFO [train.py:1198] (0/4) Epoch 18, batch 3700, loss[loss=0.2244, ctc_loss=0.1499, cr_loss=0.3726, over 17282.00 frames. ], tot_loss[loss=0.2217, ctc_loss=0.1487, cr_loss=0.3651, over 3341838.70 frames. ], batch size: 44, lr: 6.60e-03, grad_scale: 32.0 2024-09-23 19:10:08,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=326349.3333333333, ans=0.0 2024-09-23 19:10:18,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=326396.0, ans=0.125 2024-09-23 19:10:54,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-09-23 19:11:10,728 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.270e+02 1.376e+02 1.485e+02 2.172e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 19:11:13,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=326536.0, ans=0.0 2024-09-23 19:11:18,531 INFO [train.py:1198] (0/4) Epoch 18, batch 3750, loss[loss=0.2041, ctc_loss=0.1372, cr_loss=0.3348, over 17307.00 frames. ], tot_loss[loss=0.2222, ctc_loss=0.1491, cr_loss=0.3656, over 3336208.97 frames. ], batch size: 51, lr: 6.60e-03, grad_scale: 32.0 2024-09-23 19:11:25,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=326582.6666666667, ans=0.5 2024-09-23 19:11:27,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2024-09-23 19:11:48,690 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.07 vs. 
limit=22.5 2024-09-23 19:11:49,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=326676.0, ans=0.0 2024-09-23 19:12:03,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=326676.0, ans=0.125 2024-09-23 19:12:13,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=326722.6666666667, ans=0.0 2024-09-23 19:12:17,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=326722.6666666667, ans=0.0 2024-09-23 19:12:20,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=326769.3333333333, ans=0.125 2024-09-23 19:12:30,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=326769.3333333333, ans=0.025 2024-09-23 19:12:31,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=326769.3333333333, ans=0.0 2024-09-23 19:12:37,969 INFO [train.py:1198] (0/4) Epoch 18, batch 3800, loss[loss=0.2474, ctc_loss=0.1664, cr_loss=0.4049, over 15901.00 frames. ], tot_loss[loss=0.2225, ctc_loss=0.1496, cr_loss=0.3649, over 3307036.73 frames. ], batch size: 74, lr: 6.59e-03, grad_scale: 32.0 2024-09-23 19:13:01,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=326862.6666666667, ans=0.1 2024-09-23 19:13:11,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=326909.3333333333, ans=0.2 2024-09-23 19:13:13,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=326909.3333333333, ans=0.0 2024-09-23 19:13:25,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=326956.0, ans=0.1 2024-09-23 19:13:30,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=326956.0, ans=0.125 2024-09-23 19:13:49,015 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.286e+02 1.423e+02 1.561e+02 2.075e+02, threshold=2.847e+02, percent-clipped=0.0 2024-09-23 19:13:56,834 INFO [train.py:1198] (0/4) Epoch 18, batch 3850, loss[loss=0.289, ctc_loss=0.2061, cr_loss=0.4144, over 12020.00 frames. ], tot_loss[loss=0.2214, ctc_loss=0.1488, cr_loss=0.363, over 3292302.12 frames. ], batch size: 124, lr: 6.59e-03, grad_scale: 32.0 2024-09-23 19:14:43,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=327189.3333333333, ans=0.0 2024-09-23 19:14:47,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=327189.3333333333, ans=0.0 2024-09-23 19:14:51,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=327189.3333333333, ans=0.125 2024-09-23 19:14:52,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.73 vs. 
limit=10.0 2024-09-23 19:14:53,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=327189.3333333333, ans=10.0 2024-09-23 19:14:54,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=327189.3333333333, ans=0.125 2024-09-23 19:15:06,662 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-18.pt 2024-09-23 19:15:59,119 INFO [train.py:1198] (0/4) Epoch 19, batch 0, loss[loss=0.2329, ctc_loss=0.1585, cr_loss=0.3723, over 16937.00 frames. ], tot_loss[loss=0.2329, ctc_loss=0.1585, cr_loss=0.3723, over 16937.00 frames. ], batch size: 58, lr: 6.41e-03, grad_scale: 32.0 2024-09-23 19:15:59,120 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 19:16:14,372 INFO [train.py:1230] (0/4) Epoch 19, validation: loss=0.03972, ctc_loss=0.03972, cr_loss=8.025e-15, over 944034.00 frames. 2024-09-23 19:16:14,373 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 19:16:19,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=327264.0, ans=0.125 2024-09-23 19:16:27,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=327264.0, ans=0.07 2024-09-23 19:16:39,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=327310.6666666667, ans=0.0 2024-09-23 19:17:03,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=327357.3333333333, ans=0.125 2024-09-23 19:17:11,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0 2024-09-23 19:17:27,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=327450.6666666667, ans=0.125 2024-09-23 19:17:39,623 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.290e+02 1.473e+02 1.789e+02 2.583e+02, threshold=2.945e+02, percent-clipped=0.0 2024-09-23 19:17:41,332 INFO [train.py:1198] (0/4) Epoch 19, batch 50, loss[loss=0.1885, ctc_loss=0.119, cr_loss=0.3474, over 17109.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1457, cr_loss=0.36, over 757377.51 frames. ], batch size: 40, lr: 6.41e-03, grad_scale: 32.0 2024-09-23 19:18:08,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=327544.0, ans=0.0 2024-09-23 19:18:10,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=327544.0, ans=0.0 2024-09-23 19:18:19,572 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2024-09-23 19:18:48,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.07 vs. 
limit=22.5 2024-09-23 19:18:50,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=327684.0, ans=0.125 2024-09-23 19:18:51,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=327684.0, ans=0.05 2024-09-23 19:18:55,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=327684.0, ans=0.125 2024-09-23 19:19:03,591 INFO [train.py:1198] (0/4) Epoch 19, batch 100, loss[loss=0.2007, ctc_loss=0.1316, cr_loss=0.3456, over 17262.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1448, cr_loss=0.3599, over 1333811.22 frames. ], batch size: 44, lr: 6.41e-03, grad_scale: 32.0 2024-09-23 19:19:03,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=327730.6666666667, ans=0.0 2024-09-23 19:19:18,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=327777.3333333333, ans=0.0 2024-09-23 19:19:19,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=327777.3333333333, ans=0.125 2024-09-23 19:19:26,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=327777.3333333333, ans=0.125 2024-09-23 19:20:08,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=327917.3333333333, ans=0.125 2024-09-23 19:20:21,266 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.272e+02 1.373e+02 1.513e+02 2.086e+02, threshold=2.746e+02, percent-clipped=0.0 2024-09-23 19:20:22,823 INFO [train.py:1198] (0/4) Epoch 19, batch 150, loss[loss=0.1948, ctc_loss=0.1307, cr_loss=0.3205, over 16952.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.147, cr_loss=0.3631, over 1777885.40 frames. ], batch size: 42, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:20:27,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=327964.0, ans=0.125 2024-09-23 19:20:29,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=327964.0, ans=0.125 2024-09-23 19:20:38,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=328010.6666666667, ans=0.125 2024-09-23 19:20:46,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328010.6666666667, ans=0.1 2024-09-23 19:21:28,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=328150.6666666667, ans=0.2 2024-09-23 19:21:34,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=328150.6666666667, ans=0.125 2024-09-23 19:21:48,161 INFO [train.py:1198] (0/4) Epoch 19, batch 200, loss[loss=0.2324, ctc_loss=0.1596, cr_loss=0.3641, over 16038.00 frames. ], tot_loss[loss=0.2192, ctc_loss=0.1466, cr_loss=0.3629, over 2131967.62 frames. 
], batch size: 74, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:21:51,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-09-23 19:22:01,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.72 vs. limit=12.0 2024-09-23 19:22:02,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=328244.0, ans=0.04949747468305833 2024-09-23 19:23:09,156 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.256e+02 1.366e+02 1.552e+02 2.193e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-23 19:23:10,723 INFO [train.py:1198] (0/4) Epoch 19, batch 250, loss[loss=0.2296, ctc_loss=0.1555, cr_loss=0.3705, over 17157.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1466, cr_loss=0.3634, over 2413893.40 frames. ], batch size: 48, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:23:15,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=328430.6666666667, ans=0.025 2024-09-23 19:24:01,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328570.6666666667, ans=0.1 2024-09-23 19:24:03,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=328570.6666666667, ans=0.0 2024-09-23 19:24:09,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=328570.6666666667, ans=0.0 2024-09-23 19:24:32,616 INFO [train.py:1198] (0/4) Epoch 19, batch 300, loss[loss=0.2486, ctc_loss=0.1733, cr_loss=0.3761, over 12118.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1477, cr_loss=0.3639, over 2597665.19 frames. ], batch size: 123, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:24:55,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328710.6666666667, ans=0.1 2024-09-23 19:25:06,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=328757.3333333333, ans=0.125 2024-09-23 19:25:13,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2024-09-23 19:25:14,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328757.3333333333, ans=0.125 2024-09-23 19:25:23,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.60 vs. limit=15.0 2024-09-23 19:25:29,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=328804.0, ans=0.125 2024-09-23 19:25:51,171 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.306e+02 1.376e+02 1.560e+02 2.317e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 19:25:51,698 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:25:52,848 INFO [train.py:1198] (0/4) Epoch 19, batch 350, loss[loss=0.1858, ctc_loss=0.1233, cr_loss=0.3127, over 17255.00 frames. 
], tot_loss[loss=0.2202, ctc_loss=0.1475, cr_loss=0.3634, over 2766119.10 frames. ], batch size: 44, lr: 6.40e-03, grad_scale: 32.0 2024-09-23 19:26:11,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.08 vs. limit=10.0 2024-09-23 19:26:23,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=328990.6666666667, ans=0.07 2024-09-23 19:27:04,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=329084.0, ans=0.0 2024-09-23 19:27:11,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=329084.0, ans=0.2 2024-09-23 19:27:16,269 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:27:17,488 INFO [train.py:1198] (0/4) Epoch 19, batch 400, loss[loss=0.2244, ctc_loss=0.1476, cr_loss=0.3841, over 17034.00 frames. ], tot_loss[loss=0.2196, ctc_loss=0.1471, cr_loss=0.3625, over 2898915.19 frames. ], batch size: 52, lr: 6.39e-03, grad_scale: 32.0 2024-09-23 19:27:18,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2024-09-23 19:27:33,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=329130.6666666667, ans=0.1 2024-09-23 19:28:07,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=329270.6666666667, ans=0.125 2024-09-23 19:28:22,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=329270.6666666667, ans=0.125 2024-09-23 19:28:41,427 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.241e+02 1.311e+02 1.431e+02 1.671e+02, threshold=2.622e+02, percent-clipped=0.0 2024-09-23 19:28:43,035 INFO [train.py:1198] (0/4) Epoch 19, batch 450, loss[loss=0.2139, ctc_loss=0.1403, cr_loss=0.3682, over 17170.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1478, cr_loss=0.3636, over 2988736.31 frames. ], batch size: 45, lr: 6.39e-03, grad_scale: 32.0 2024-09-23 19:28:52,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=329364.0, ans=0.07 2024-09-23 19:29:02,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329410.6666666667, ans=0.1 2024-09-23 19:29:36,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=329504.0, ans=0.0 2024-09-23 19:29:40,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=329504.0, ans=0.025 2024-09-23 19:30:03,313 INFO [train.py:1198] (0/4) Epoch 19, batch 500, loss[loss=0.2101, ctc_loss=0.1397, cr_loss=0.3518, over 17160.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1477, cr_loss=0.3644, over 3070843.19 frames. 
], batch size: 48, lr: 6.39e-03, grad_scale: 32.0 2024-09-23 19:30:06,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329597.3333333333, ans=0.1 2024-09-23 19:30:34,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=329690.6666666667, ans=0.0 2024-09-23 19:31:01,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=329737.3333333333, ans=0.125 2024-09-23 19:31:03,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.02 vs. limit=6.0 2024-09-23 19:31:10,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=329784.0, ans=0.125 2024-09-23 19:31:21,608 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.237e+02 1.337e+02 1.500e+02 1.948e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-23 19:31:23,262 INFO [train.py:1198] (0/4) Epoch 19, batch 550, loss[loss=0.2544, ctc_loss=0.1718, cr_loss=0.4128, over 17308.00 frames. ], tot_loss[loss=0.2204, ctc_loss=0.1475, cr_loss=0.3641, over 3136291.64 frames. ], batch size: 51, lr: 6.39e-03, grad_scale: 32.0 2024-09-23 19:31:51,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=329877.3333333333, ans=0.125 2024-09-23 19:32:24,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=12.0 2024-09-23 19:32:26,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=329970.6666666667, ans=0.125 2024-09-23 19:32:48,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=330017.3333333333, ans=0.0 2024-09-23 19:32:51,190 INFO [train.py:1198] (0/4) Epoch 19, batch 600, loss[loss=0.2228, ctc_loss=0.1445, cr_loss=0.3912, over 17133.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.1476, cr_loss=0.3645, over 3176633.86 frames. ], batch size: 48, lr: 6.38e-03, grad_scale: 32.0 2024-09-23 19:32:54,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=330064.0, ans=0.125 2024-09-23 19:33:10,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=330110.6666666667, ans=0.2 2024-09-23 19:33:40,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=330204.0, ans=0.125 2024-09-23 19:34:05,566 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-09-23 19:34:12,591 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.274e+02 1.356e+02 1.568e+02 2.511e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-23 19:34:14,268 INFO [train.py:1198] (0/4) Epoch 19, batch 650, loss[loss=0.1666, ctc_loss=0.1085, cr_loss=0.2904, over 17054.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1467, cr_loss=0.3639, over 3220911.00 frames. 
], batch size: 39, lr: 6.38e-03, grad_scale: 64.0 2024-09-23 19:34:25,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=330297.3333333333, ans=0.025 2024-09-23 19:34:27,393 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:34:43,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=330344.0, ans=0.0 2024-09-23 19:35:11,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=330437.3333333333, ans=0.125 2024-09-23 19:35:25,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.68 vs. limit=15.0 2024-09-23 19:35:34,531 INFO [train.py:1198] (0/4) Epoch 19, batch 700, loss[loss=0.2174, ctc_loss=0.1429, cr_loss=0.3724, over 17267.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1466, cr_loss=0.3636, over 3258954.27 frames. ], batch size: 44, lr: 6.38e-03, grad_scale: 64.0 2024-09-23 19:35:42,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=330530.6666666667, ans=0.0 2024-09-23 19:36:02,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=330577.3333333333, ans=0.125 2024-09-23 19:36:08,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=330624.0, ans=0.125 2024-09-23 19:36:09,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. limit=10.0 2024-09-23 19:36:35,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=22.5 2024-09-23 19:36:44,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=330717.3333333333, ans=0.125 2024-09-23 19:36:58,035 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.250e+02 1.371e+02 1.487e+02 1.794e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-23 19:36:59,671 INFO [train.py:1198] (0/4) Epoch 19, batch 750, loss[loss=0.2618, ctc_loss=0.1814, cr_loss=0.4021, over 12216.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1467, cr_loss=0.3633, over 3277581.10 frames. ], batch size: 123, lr: 6.38e-03, grad_scale: 64.0 2024-09-23 19:37:03,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=330764.0, ans=0.025 2024-09-23 19:37:03,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=330764.0, ans=0.125 2024-09-23 19:37:05,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-23 19:37:49,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.06 vs. 
limit=15.0 2024-09-23 19:37:55,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=330904.0, ans=0.0 2024-09-23 19:37:58,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=330904.0, ans=0.04949747468305833 2024-09-23 19:37:58,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=330904.0, ans=0.125 2024-09-23 19:38:25,056 INFO [train.py:1198] (0/4) Epoch 19, batch 800, loss[loss=0.2029, ctc_loss=0.1342, cr_loss=0.3434, over 17296.00 frames. ], tot_loss[loss=0.2195, ctc_loss=0.1469, cr_loss=0.3631, over 3296746.81 frames. ], batch size: 46, lr: 6.38e-03, grad_scale: 32.0 2024-09-23 19:38:29,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.71 vs. limit=10.0 2024-09-23 19:38:47,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=331044.0, ans=0.0 2024-09-23 19:38:55,616 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 19:38:58,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=331090.6666666667, ans=0.125 2024-09-23 19:39:37,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2024-09-23 19:39:43,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=331230.6666666667, ans=0.0 2024-09-23 19:39:44,571 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.257e+02 1.307e+02 1.431e+02 3.261e+02, threshold=2.613e+02, percent-clipped=1.0 2024-09-23 19:39:44,596 INFO [train.py:1198] (0/4) Epoch 19, batch 850, loss[loss=0.2042, ctc_loss=0.1367, cr_loss=0.3376, over 17123.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1473, cr_loss=0.363, over 3306337.25 frames. 
], batch size: 40, lr: 6.37e-03, grad_scale: 32.0 2024-09-23 19:39:44,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=331230.6666666667, ans=0.125 2024-09-23 19:39:52,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=331230.6666666667, ans=0.125 2024-09-23 19:39:52,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=331230.6666666667, ans=0.1 2024-09-23 19:40:23,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=331324.0, ans=0.125 2024-09-23 19:40:31,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=331370.6666666667, ans=0.125 2024-09-23 19:40:39,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=331370.6666666667, ans=0.1 2024-09-23 19:40:50,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=331417.3333333333, ans=0.0 2024-09-23 19:40:52,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=331417.3333333333, ans=0.0 2024-09-23 19:41:04,317 INFO [train.py:1198] (0/4) Epoch 19, batch 900, loss[loss=0.175, ctc_loss=0.1143, cr_loss=0.3034, over 16309.00 frames. ], tot_loss[loss=0.2199, ctc_loss=0.1472, cr_loss=0.3631, over 3309793.27 frames. ], batch size: 36, lr: 6.37e-03, grad_scale: 16.0 2024-09-23 19:42:31,885 INFO [train.py:1198] (0/4) Epoch 19, batch 950, loss[loss=0.2093, ctc_loss=0.1419, cr_loss=0.3367, over 17225.00 frames. ], tot_loss[loss=0.2203, ctc_loss=0.1476, cr_loss=0.3637, over 3311359.48 frames. ], batch size: 50, lr: 6.37e-03, grad_scale: 16.0 2024-09-23 19:42:33,536 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.264e+02 1.377e+02 1.501e+02 1.833e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-23 19:42:33,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=331697.3333333333, ans=0.2 2024-09-23 19:42:37,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-09-23 19:43:27,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=331837.3333333333, ans=0.125 2024-09-23 19:43:54,921 INFO [train.py:1198] (0/4) Epoch 19, batch 1000, loss[loss=0.1734, ctc_loss=0.1116, cr_loss=0.3089, over 17278.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1462, cr_loss=0.3612, over 3328584.44 frames. ], batch size: 42, lr: 6.37e-03, grad_scale: 16.0 2024-09-23 19:44:07,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331930.6666666667, ans=0.1 2024-09-23 19:44:22,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.21 vs. 
limit=22.5 2024-09-23 19:44:56,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2024-09-23 19:45:01,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332117.3333333333, ans=0.1 2024-09-23 19:45:03,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=332117.3333333333, ans=0.125 2024-09-23 19:45:08,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-09-23 19:45:14,337 INFO [train.py:1198] (0/4) Epoch 19, batch 1050, loss[loss=0.2301, ctc_loss=0.1517, cr_loss=0.3919, over 16989.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1461, cr_loss=0.3603, over 3334493.23 frames. ], batch size: 56, lr: 6.36e-03, grad_scale: 16.0 2024-09-23 19:45:15,988 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.292e+02 1.398e+02 1.505e+02 1.868e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-23 19:45:36,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=10.0 2024-09-23 19:45:55,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=332257.3333333333, ans=0.025 2024-09-23 19:45:57,281 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0 2024-09-23 19:45:58,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=332257.3333333333, ans=0.125 2024-09-23 19:45:59,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332257.3333333333, ans=0.1 2024-09-23 19:46:00,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5 2024-09-23 19:46:01,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=332304.0, ans=0.125 2024-09-23 19:46:13,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2024-09-23 19:46:39,720 INFO [train.py:1198] (0/4) Epoch 19, batch 1100, loss[loss=0.2232, ctc_loss=0.1465, cr_loss=0.3837, over 16964.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1455, cr_loss=0.3599, over 3342319.54 frames. ], batch size: 53, lr: 6.36e-03, grad_scale: 16.0 2024-09-23 19:46:43,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.42 vs. 
limit=22.5 2024-09-23 19:46:47,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=332397.3333333333, ans=0.125 2024-09-23 19:46:49,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=332397.3333333333, ans=0.125 2024-09-23 19:46:54,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2024-09-23 19:47:25,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=332490.6666666667, ans=0.125 2024-09-23 19:47:27,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2024-09-23 19:47:39,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=332537.3333333333, ans=0.0 2024-09-23 19:47:41,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=332537.3333333333, ans=0.125 2024-09-23 19:48:01,783 INFO [train.py:1198] (0/4) Epoch 19, batch 1150, loss[loss=0.2753, ctc_loss=0.1962, cr_loss=0.3954, over 11480.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1457, cr_loss=0.3599, over 3338002.76 frames. ], batch size: 123, lr: 6.36e-03, grad_scale: 16.0 2024-09-23 19:48:06,032 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.288e+02 1.429e+02 1.556e+02 3.211e+02, threshold=2.859e+02, percent-clipped=1.0 2024-09-23 19:48:48,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2024-09-23 19:48:51,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332770.6666666667, ans=0.1 2024-09-23 19:49:20,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=332817.3333333333, ans=0.125 2024-09-23 19:49:24,949 INFO [train.py:1198] (0/4) Epoch 19, batch 1200, loss[loss=0.1845, ctc_loss=0.1185, cr_loss=0.3303, over 17182.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1461, cr_loss=0.3615, over 3333822.95 frames. ], batch size: 41, lr: 6.36e-03, grad_scale: 32.0 2024-09-23 19:49:26,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=332864.0, ans=0.0 2024-09-23 19:49:41,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2024-09-23 19:50:01,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. 
limit=15.0 2024-09-23 19:50:02,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=332957.3333333333, ans=0.125 2024-09-23 19:50:16,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=333004.0, ans=0.125 2024-09-23 19:50:21,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=333004.0, ans=0.125 2024-09-23 19:50:33,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2024-09-23 19:50:39,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=333050.6666666667, ans=0.2 2024-09-23 19:50:45,364 INFO [train.py:1198] (0/4) Epoch 19, batch 1250, loss[loss=0.2151, ctc_loss=0.1447, cr_loss=0.3519, over 16838.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1465, cr_loss=0.3624, over 3332634.64 frames. ], batch size: 58, lr: 6.36e-03, grad_scale: 32.0 2024-09-23 19:50:46,878 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.324e+02 1.446e+02 1.574e+02 2.086e+02, threshold=2.891e+02, percent-clipped=0.0 2024-09-23 19:50:53,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=333097.3333333333, ans=0.0 2024-09-23 19:50:56,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=333097.3333333333, ans=0.125 2024-09-23 19:51:01,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=333144.0, ans=0.2 2024-09-23 19:51:14,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=333144.0, ans=0.2 2024-09-23 19:51:28,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=333190.6666666667, ans=0.0 2024-09-23 19:51:46,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=333237.3333333333, ans=0.0 2024-09-23 19:52:12,988 INFO [train.py:1198] (0/4) Epoch 19, batch 1300, loss[loss=0.1899, ctc_loss=0.1243, cr_loss=0.3281, over 17162.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1461, cr_loss=0.3617, over 3347517.95 frames. ], batch size: 45, lr: 6.35e-03, grad_scale: 16.0 2024-09-23 19:52:27,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=333377.3333333333, ans=0.025 2024-09-23 19:52:27,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333377.3333333333, ans=0.1 2024-09-23 19:52:29,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. 
limit=15.0 2024-09-23 19:52:40,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=333377.3333333333, ans=0.0 2024-09-23 19:52:51,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333424.0, ans=0.1 2024-09-23 19:53:35,936 INFO [train.py:1198] (0/4) Epoch 19, batch 1350, loss[loss=0.1935, ctc_loss=0.1244, cr_loss=0.3457, over 16960.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1466, cr_loss=0.3618, over 3331801.09 frames. ], batch size: 42, lr: 6.35e-03, grad_scale: 16.0 2024-09-23 19:53:39,097 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.258e+02 1.337e+02 1.461e+02 2.024e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-23 19:53:39,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=333564.0, ans=0.125 2024-09-23 19:53:56,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.54 vs. limit=22.5 2024-09-23 19:54:02,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=333610.6666666667, ans=0.2 2024-09-23 19:54:17,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.06 vs. limit=10.0 2024-09-23 19:54:18,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333657.3333333333, ans=0.1 2024-09-23 19:54:39,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333750.6666666667, ans=0.1 2024-09-23 19:54:51,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=333750.6666666667, ans=0.125 2024-09-23 19:54:55,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=333797.3333333333, ans=0.125 2024-09-23 19:54:56,468 INFO [train.py:1198] (0/4) Epoch 19, batch 1400, loss[loss=0.186, ctc_loss=0.1242, cr_loss=0.309, over 17077.00 frames. ], tot_loss[loss=0.2194, ctc_loss=0.1469, cr_loss=0.3626, over 3339199.77 frames. ], batch size: 40, lr: 6.35e-03, grad_scale: 16.0 2024-09-23 19:55:18,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2024-09-23 19:55:27,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=333890.6666666667, ans=0.0 2024-09-23 19:55:44,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=333937.3333333333, ans=0.0 2024-09-23 19:55:46,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. 
limit=15.0 2024-09-23 19:55:48,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=333937.3333333333, ans=0.125 2024-09-23 19:56:15,965 INFO [train.py:1198] (0/4) Epoch 19, batch 1450, loss[loss=0.1836, ctc_loss=0.1224, cr_loss=0.3058, over 17056.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.146, cr_loss=0.3614, over 3347860.57 frames. ], batch size: 39, lr: 6.35e-03, grad_scale: 16.0 2024-09-23 19:56:21,632 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.283e+02 1.375e+02 1.499e+02 2.283e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-23 19:56:42,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=334077.3333333333, ans=0.2 2024-09-23 19:56:51,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=334124.0, ans=0.125 2024-09-23 19:56:53,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334124.0, ans=0.1 2024-09-23 19:57:03,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0 2024-09-23 19:57:36,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=334217.3333333333, ans=0.125 2024-09-23 19:57:43,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=334264.0, ans=15.0 2024-09-23 19:57:43,773 INFO [train.py:1198] (0/4) Epoch 19, batch 1500, loss[loss=0.2477, ctc_loss=0.1669, cr_loss=0.4043, over 17015.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1455, cr_loss=0.3608, over 3352983.27 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 16.0 2024-09-23 19:58:01,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=334310.6666666667, ans=0.2 2024-09-23 19:58:33,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2024-09-23 19:59:05,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-23 19:59:06,200 INFO [train.py:1198] (0/4) Epoch 19, batch 1550, loss[loss=0.2235, ctc_loss=0.1499, cr_loss=0.3683, over 17298.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.146, cr_loss=0.3616, over 3355760.41 frames. ], batch size: 49, lr: 6.34e-03, grad_scale: 16.0 2024-09-23 19:59:08,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=22.5 2024-09-23 19:59:09,456 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.039e+02 1.278e+02 1.387e+02 1.518e+02 2.007e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-23 20:00:20,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2024-09-23 20:00:26,300 INFO [train.py:1198] (0/4) Epoch 19, batch 1600, loss[loss=0.2104, ctc_loss=0.1373, cr_loss=0.3656, over 17172.00 frames. 
], tot_loss[loss=0.2183, ctc_loss=0.146, cr_loss=0.3615, over 3349840.05 frames. ], batch size: 41, lr: 6.34e-03, grad_scale: 32.0 2024-09-23 20:01:12,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=334870.6666666667, ans=0.125 2024-09-23 20:01:26,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2024-09-23 20:01:41,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-09-23 20:01:50,762 INFO [train.py:1198] (0/4) Epoch 19, batch 1650, loss[loss=0.1779, ctc_loss=0.1177, cr_loss=0.3013, over 16290.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1461, cr_loss=0.3618, over 3354880.89 frames. ], batch size: 36, lr: 6.34e-03, grad_scale: 32.0 2024-09-23 20:01:53,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0 2024-09-23 20:01:53,946 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.297e+02 1.356e+02 1.480e+02 2.099e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-23 20:02:01,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=334964.0, ans=0.2 2024-09-23 20:02:19,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-09-23 20:02:48,407 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:02:54,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=335104.0, ans=0.1 2024-09-23 20:03:16,143 INFO [train.py:1198] (0/4) Epoch 19, batch 1700, loss[loss=0.2272, ctc_loss=0.149, cr_loss=0.3908, over 17292.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1464, cr_loss=0.3628, over 3362302.47 frames. ], batch size: 49, lr: 6.34e-03, grad_scale: 32.0 2024-09-23 20:03:19,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=335197.3333333333, ans=0.125 2024-09-23 20:03:30,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=335244.0, ans=0.125 2024-09-23 20:03:31,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-23 20:03:36,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=335244.0, ans=0.1 2024-09-23 20:03:40,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=335244.0, ans=0.125 2024-09-23 20:03:44,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. 
limit=22.5 2024-09-23 20:03:54,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335290.6666666667, ans=0.1 2024-09-23 20:03:54,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=335290.6666666667, ans=0.125 2024-09-23 20:04:36,069 INFO [train.py:1198] (0/4) Epoch 19, batch 1750, loss[loss=0.2389, ctc_loss=0.1597, cr_loss=0.3962, over 17201.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1453, cr_loss=0.3608, over 3359921.68 frames. ], batch size: 55, lr: 6.33e-03, grad_scale: 32.0 2024-09-23 20:04:39,273 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.243e+02 1.345e+02 1.441e+02 1.973e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-23 20:04:39,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=335430.6666666667, ans=0.1 2024-09-23 20:05:09,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=335524.0, ans=0.125 2024-09-23 20:05:40,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.97 vs. limit=22.5 2024-09-23 20:05:41,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=335617.3333333333, ans=0.025 2024-09-23 20:05:44,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-09-23 20:05:46,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=335617.3333333333, ans=0.0 2024-09-23 20:05:55,468 INFO [train.py:1198] (0/4) Epoch 19, batch 1800, loss[loss=0.2038, ctc_loss=0.1313, cr_loss=0.3626, over 17093.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.146, cr_loss=0.3624, over 3359095.57 frames. ], batch size: 43, lr: 6.33e-03, grad_scale: 32.0 2024-09-23 20:05:57,678 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=22.5 2024-09-23 20:06:13,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=335710.6666666667, ans=0.125 2024-09-23 20:06:24,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. 
limit=12.0 2024-09-23 20:06:28,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=335710.6666666667, ans=0.2 2024-09-23 20:06:36,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=335757.3333333333, ans=0.04949747468305833 2024-09-23 20:06:39,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=335757.3333333333, ans=0.125 2024-09-23 20:07:01,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=335804.0, ans=0.125 2024-09-23 20:07:02,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=335804.0, ans=0.125 2024-09-23 20:07:07,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=335850.6666666667, ans=0.0 2024-09-23 20:07:22,993 INFO [train.py:1198] (0/4) Epoch 19, batch 1850, loss[loss=0.2121, ctc_loss=0.1404, cr_loss=0.3583, over 17031.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1451, cr_loss=0.3609, over 3352275.41 frames. ], batch size: 44, lr: 6.33e-03, grad_scale: 32.0 2024-09-23 20:07:26,228 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.276e+02 1.381e+02 1.510e+02 2.241e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-23 20:07:52,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.19 vs. limit=10.0 2024-09-23 20:07:55,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=335990.6666666667, ans=0.125 2024-09-23 20:07:55,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=335990.6666666667, ans=0.125 2024-09-23 20:07:56,920 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-72000.pt 2024-09-23 20:08:26,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=336037.3333333333, ans=0.2 2024-09-23 20:08:26,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=336037.3333333333, ans=0.0 2024-09-23 20:08:31,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=336084.0, ans=0.09899494936611666 2024-09-23 20:08:45,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=336084.0, ans=0.2 2024-09-23 20:08:48,089 INFO [train.py:1198] (0/4) Epoch 19, batch 1900, loss[loss=0.211, ctc_loss=0.1411, cr_loss=0.3498, over 17292.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1453, cr_loss=0.3612, over 3355554.74 frames. 
], batch size: 51, lr: 6.33e-03, grad_scale: 32.0 2024-09-23 20:09:09,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=336177.3333333333, ans=0.125 2024-09-23 20:09:09,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=336177.3333333333, ans=10.0 2024-09-23 20:09:50,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=336317.3333333333, ans=0.0 2024-09-23 20:10:07,582 INFO [train.py:1198] (0/4) Epoch 19, batch 1950, loss[loss=0.1735, ctc_loss=0.1136, cr_loss=0.2991, over 17092.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1449, cr_loss=0.36, over 3352339.97 frames. ], batch size: 40, lr: 6.32e-03, grad_scale: 32.0 2024-09-23 20:10:10,844 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.232e+02 1.325e+02 1.435e+02 2.417e+02, threshold=2.650e+02, percent-clipped=0.0 2024-09-23 20:10:19,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.86 vs. limit=10.0 2024-09-23 20:10:33,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=336410.6666666667, ans=0.125 2024-09-23 20:10:53,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=336457.3333333333, ans=0.125 2024-09-23 20:11:17,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=336550.6666666667, ans=0.125 2024-09-23 20:11:33,082 INFO [train.py:1198] (0/4) Epoch 19, batch 2000, loss[loss=0.2164, ctc_loss=0.145, cr_loss=0.357, over 17204.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1445, cr_loss=0.3593, over 3363536.71 frames. ], batch size: 55, lr: 6.32e-03, grad_scale: 32.0 2024-09-23 20:11:38,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=336597.3333333333, ans=0.025 2024-09-23 20:11:52,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=336644.0, ans=0.125 2024-09-23 20:11:54,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=336644.0, ans=0.125 2024-09-23 20:12:00,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=336644.0, ans=0.125 2024-09-23 20:12:06,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=336690.6666666667, ans=0.0 2024-09-23 20:12:30,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. 
limit=6.0 2024-09-23 20:12:33,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=336737.3333333333, ans=0.0 2024-09-23 20:12:35,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=336737.3333333333, ans=0.0 2024-09-23 20:12:40,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2024-09-23 20:12:41,451 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:12:51,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-09-23 20:12:54,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2024-09-23 20:12:55,371 INFO [train.py:1198] (0/4) Epoch 19, batch 2050, loss[loss=0.2535, ctc_loss=0.171, cr_loss=0.4125, over 17011.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.145, cr_loss=0.3603, over 3370660.66 frames. ], batch size: 56, lr: 6.32e-03, grad_scale: 32.0 2024-09-23 20:12:58,517 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.236e+02 1.316e+02 1.436e+02 2.046e+02, threshold=2.631e+02, percent-clipped=0.0 2024-09-23 20:13:10,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=336830.6666666667, ans=0.125 2024-09-23 20:13:32,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.84 vs. limit=10.0 2024-09-23 20:13:33,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=336924.0, ans=0.125 2024-09-23 20:13:36,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=336924.0, ans=0.025 2024-09-23 20:13:51,822 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=12.0 2024-09-23 20:14:00,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337017.3333333333, ans=0.1 2024-09-23 20:14:18,068 INFO [train.py:1198] (0/4) Epoch 19, batch 2100, loss[loss=0.2384, ctc_loss=0.1618, cr_loss=0.3833, over 17135.00 frames. ], tot_loss[loss=0.217, ctc_loss=0.145, cr_loss=0.3601, over 3370485.04 frames. 
], batch size: 48, lr: 6.32e-03, grad_scale: 16.0 2024-09-23 20:14:18,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337064.0, ans=0.1 2024-09-23 20:14:32,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=337110.6666666667, ans=0.125 2024-09-23 20:14:45,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=337110.6666666667, ans=0.125 2024-09-23 20:14:53,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=337157.3333333333, ans=0.125 2024-09-23 20:15:12,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=337204.0, ans=0.05 2024-09-23 20:15:37,918 INFO [train.py:1198] (0/4) Epoch 19, batch 2150, loss[loss=0.231, ctc_loss=0.1525, cr_loss=0.3929, over 17034.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1451, cr_loss=0.3599, over 3368482.59 frames. ], batch size: 53, lr: 6.32e-03, grad_scale: 8.0 2024-09-23 20:15:41,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=337297.3333333333, ans=0.0 2024-09-23 20:15:44,256 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.267e+02 1.340e+02 1.517e+02 2.263e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-23 20:15:54,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2024-09-23 20:16:02,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-09-23 20:16:41,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=337437.3333333333, ans=0.0 2024-09-23 20:16:44,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=337437.3333333333, ans=0.125 2024-09-23 20:16:58,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=337484.0, ans=0.125 2024-09-23 20:17:04,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337530.6666666667, ans=0.1 2024-09-23 20:17:05,772 INFO [train.py:1198] (0/4) Epoch 19, batch 2200, loss[loss=0.2345, ctc_loss=0.159, cr_loss=0.3776, over 17238.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1452, cr_loss=0.3602, over 3367449.66 frames. ], batch size: 50, lr: 6.31e-03, grad_scale: 8.0 2024-09-23 20:17:07,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=337530.6666666667, ans=0.09899494936611666 2024-09-23 20:17:16,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.92 vs. 
limit=15.0 2024-09-23 20:17:20,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=337577.3333333333, ans=0.0 2024-09-23 20:18:17,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=337717.3333333333, ans=0.0 2024-09-23 20:18:18,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=337717.3333333333, ans=0.025 2024-09-23 20:18:20,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=337717.3333333333, ans=0.0 2024-09-23 20:18:28,142 INFO [train.py:1198] (0/4) Epoch 19, batch 2250, loss[loss=0.2732, ctc_loss=0.1853, cr_loss=0.4394, over 16744.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1455, cr_loss=0.3604, over 3364628.28 frames. ], batch size: 61, lr: 6.31e-03, grad_scale: 8.0 2024-09-23 20:18:34,522 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.290e+02 1.402e+02 1.494e+02 5.352e+02, threshold=2.803e+02, percent-clipped=1.0 2024-09-23 20:18:38,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=337764.0, ans=0.125 2024-09-23 20:19:10,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=337857.3333333333, ans=0.025 2024-09-23 20:19:48,421 INFO [train.py:1198] (0/4) Epoch 19, batch 2300, loss[loss=0.2098, ctc_loss=0.1398, cr_loss=0.3502, over 17003.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1459, cr_loss=0.3615, over 3362382.01 frames. ], batch size: 44, lr: 6.31e-03, grad_scale: 8.0 2024-09-23 20:21:09,228 INFO [train.py:1198] (0/4) Epoch 19, batch 2350, loss[loss=0.1818, ctc_loss=0.1216, cr_loss=0.3006, over 17189.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1459, cr_loss=0.3611, over 3360193.14 frames. ], batch size: 41, lr: 6.31e-03, grad_scale: 8.0 2024-09-23 20:21:11,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338230.6666666667, ans=0.1 2024-09-23 20:21:18,094 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.269e+02 1.353e+02 1.497e+02 2.252e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-23 20:21:22,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338230.6666666667, ans=0.1 2024-09-23 20:21:35,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=338277.3333333333, ans=0.2 2024-09-23 20:21:44,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=338324.0, ans=0.125 2024-09-23 20:22:25,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=338417.3333333333, ans=0.0 2024-09-23 20:22:36,780 INFO [train.py:1198] (0/4) Epoch 19, batch 2400, loss[loss=0.2104, ctc_loss=0.1354, cr_loss=0.3747, over 17363.00 frames. ], tot_loss[loss=0.2191, ctc_loss=0.1466, cr_loss=0.3626, over 3351367.16 frames. ], batch size: 48, lr: 6.31e-03, grad_scale: 16.0 2024-09-23 20:23:02,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.74 vs. 
limit=15.0 2024-09-23 20:23:08,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338510.6666666667, ans=0.1 2024-09-23 20:23:22,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=338557.3333333333, ans=10.0 2024-09-23 20:23:56,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=338650.6666666667, ans=0.125 2024-09-23 20:23:59,489 INFO [train.py:1198] (0/4) Epoch 19, batch 2450, loss[loss=0.1964, ctc_loss=0.1295, cr_loss=0.3345, over 17253.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1454, cr_loss=0.3601, over 3344767.36 frames. ], batch size: 44, lr: 6.30e-03, grad_scale: 16.0 2024-09-23 20:24:05,811 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.266e+02 1.363e+02 1.497e+02 1.978e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-23 20:24:25,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=22.5 2024-09-23 20:24:31,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=12.0 2024-09-23 20:24:50,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=338837.3333333333, ans=0.125 2024-09-23 20:25:03,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.38 vs. limit=10.0 2024-09-23 20:25:15,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=338884.0, ans=22.5 2024-09-23 20:25:19,142 INFO [train.py:1198] (0/4) Epoch 19, batch 2500, loss[loss=0.1778, ctc_loss=0.1132, cr_loss=0.323, over 17196.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.145, cr_loss=0.3593, over 3355618.08 frames. ], batch size: 41, lr: 6.30e-03, grad_scale: 16.0 2024-09-23 20:25:19,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=338930.6666666667, ans=0.125 2024-09-23 20:25:25,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=338930.6666666667, ans=0.125 2024-09-23 20:25:35,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338977.3333333333, ans=0.1 2024-09-23 20:25:45,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=338977.3333333333, ans=0.025 2024-09-23 20:26:04,950 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.74 vs. 
limit=15.0 2024-09-23 20:26:15,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=339070.6666666667, ans=0.0 2024-09-23 20:26:18,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=339070.6666666667, ans=0.07 2024-09-23 20:26:24,507 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.03 vs. limit=15.0 2024-09-23 20:26:28,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=339117.3333333333, ans=0.125 2024-09-23 20:26:30,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=22.5 2024-09-23 20:26:39,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2024-09-23 20:26:43,707 INFO [train.py:1198] (0/4) Epoch 19, batch 2550, loss[loss=0.238, ctc_loss=0.1579, cr_loss=0.4008, over 17300.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1461, cr_loss=0.3615, over 3345238.78 frames. ], batch size: 49, lr: 6.30e-03, grad_scale: 16.0 2024-09-23 20:26:52,585 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.236e+02 1.315e+02 1.425e+02 2.139e+02, threshold=2.630e+02, percent-clipped=0.0 2024-09-23 20:26:52,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339164.0, ans=0.125 2024-09-23 20:27:07,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339210.6666666667, ans=0.1 2024-09-23 20:27:11,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339210.6666666667, ans=0.1 2024-09-23 20:27:37,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=339304.0, ans=0.125 2024-09-23 20:27:59,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=339350.6666666667, ans=0.2 2024-09-23 20:28:08,359 INFO [train.py:1198] (0/4) Epoch 19, batch 2600, loss[loss=0.2527, ctc_loss=0.1704, cr_loss=0.4113, over 16910.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1461, cr_loss=0.3616, over 3350983.89 frames. ], batch size: 58, lr: 6.30e-03, grad_scale: 16.0 2024-09-23 20:28:34,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=339444.0, ans=0.09899494936611666 2024-09-23 20:28:45,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=339490.6666666667, ans=0.0 2024-09-23 20:28:46,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. 
limit=22.5 2024-09-23 20:28:47,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=339490.6666666667, ans=0.1 2024-09-23 20:29:06,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=339537.3333333333, ans=0.125 2024-09-23 20:29:14,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=22.5 2024-09-23 20:29:14,996 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=15.0 2024-09-23 20:29:28,191 INFO [train.py:1198] (0/4) Epoch 19, batch 2650, loss[loss=0.2087, ctc_loss=0.1387, cr_loss=0.3503, over 16995.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1454, cr_loss=0.3607, over 3357504.95 frames. ], batch size: 53, lr: 6.29e-03, grad_scale: 16.0 2024-09-23 20:29:34,368 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.315e+02 1.464e+02 1.614e+02 2.211e+02, threshold=2.927e+02, percent-clipped=0.0 2024-09-23 20:29:41,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=339630.6666666667, ans=0.2 2024-09-23 20:30:27,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=339770.6666666667, ans=0.125 2024-09-23 20:30:48,090 INFO [train.py:1198] (0/4) Epoch 19, batch 2700, loss[loss=0.2862, ctc_loss=0.1963, cr_loss=0.4495, over 15211.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.1453, cr_loss=0.3616, over 3366904.80 frames. ], batch size: 89, lr: 6.29e-03, grad_scale: 16.0 2024-09-23 20:30:51,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339864.0, ans=0.125 2024-09-23 20:31:35,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=339957.3333333333, ans=0.0 2024-09-23 20:31:49,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-09-23 20:31:54,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=340004.0, ans=0.1 2024-09-23 20:31:55,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=340004.0, ans=0.125 2024-09-23 20:32:00,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=340050.6666666667, ans=0.125 2024-09-23 20:32:13,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=340050.6666666667, ans=0.2 2024-09-23 20:32:14,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=340097.3333333333, ans=0.0 2024-09-23 20:32:16,075 INFO [train.py:1198] (0/4) Epoch 19, batch 2750, loss[loss=0.2094, ctc_loss=0.1395, cr_loss=0.3498, over 17057.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1464, cr_loss=0.3632, over 3357194.00 frames. 
], batch size: 56, lr: 6.29e-03, grad_scale: 16.0 2024-09-23 20:32:22,222 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.266e+02 1.341e+02 1.484e+02 3.814e+02, threshold=2.681e+02, percent-clipped=1.0 2024-09-23 20:32:22,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=340097.3333333333, ans=0.125 2024-09-23 20:32:24,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=340097.3333333333, ans=0.125 2024-09-23 20:32:52,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340190.6666666667, ans=0.1 2024-09-23 20:33:37,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340330.6666666667, ans=0.125 2024-09-23 20:33:38,451 INFO [train.py:1198] (0/4) Epoch 19, batch 2800, loss[loss=0.1813, ctc_loss=0.1175, cr_loss=0.3188, over 17099.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.146, cr_loss=0.3623, over 3363434.01 frames. ], batch size: 43, lr: 6.29e-03, grad_scale: 32.0 2024-09-23 20:34:15,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=340424.0, ans=0.0 2024-09-23 20:34:58,054 INFO [train.py:1198] (0/4) Epoch 19, batch 2850, loss[loss=0.2274, ctc_loss=0.1539, cr_loss=0.3676, over 17034.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1451, cr_loss=0.361, over 3368087.84 frames. ], batch size: 56, lr: 6.29e-03, grad_scale: 32.0 2024-09-23 20:35:04,576 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.252e+02 1.382e+02 1.524e+02 2.298e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-23 20:35:11,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=340564.0, ans=0.5 2024-09-23 20:35:15,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.23 vs. limit=22.5 2024-09-23 20:35:17,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=340610.6666666667, ans=0.0 2024-09-23 20:35:27,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=12.0 2024-09-23 20:35:31,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=340657.3333333333, ans=0.125 2024-09-23 20:35:50,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=340704.0, ans=0.2 2024-09-23 20:35:56,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=340704.0, ans=0.1 2024-09-23 20:36:22,621 INFO [train.py:1198] (0/4) Epoch 19, batch 2900, loss[loss=0.2138, ctc_loss=0.1387, cr_loss=0.3752, over 17146.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1453, cr_loss=0.3608, over 3355707.19 frames. 
], batch size: 48, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:36:51,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=340844.0, ans=0.125 2024-09-23 20:37:08,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=340890.6666666667, ans=0.2 2024-09-23 20:37:34,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=340984.0, ans=0.125 2024-09-23 20:37:47,691 INFO [train.py:1198] (0/4) Epoch 19, batch 2950, loss[loss=0.2119, ctc_loss=0.1388, cr_loss=0.3657, over 17191.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1457, cr_loss=0.3617, over 3361174.81 frames. ], batch size: 41, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:37:55,447 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.260e+02 1.402e+02 1.500e+02 2.241e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-23 20:38:02,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341077.3333333333, ans=0.1 2024-09-23 20:38:21,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=341124.0, ans=0.0 2024-09-23 20:38:25,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0 2024-09-23 20:38:37,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=341170.6666666667, ans=0.125 2024-09-23 20:38:49,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341217.3333333333, ans=0.1 2024-09-23 20:39:06,541 INFO [train.py:1198] (0/4) Epoch 19, batch 3000, loss[loss=0.1838, ctc_loss=0.1206, cr_loss=0.3161, over 17117.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1448, cr_loss=0.3605, over 3369997.75 frames. ], batch size: 40, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:39:06,542 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 20:39:21,854 INFO [train.py:1230] (0/4) Epoch 19, validation: loss=0.03984, ctc_loss=0.03984, cr_loss=8.01e-15, over 944034.00 frames. 2024-09-23 20:39:21,855 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 20:39:50,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=12.0 2024-09-23 20:39:59,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=341357.3333333333, ans=0.0 2024-09-23 20:40:11,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2024-09-23 20:40:40,305 INFO [train.py:1198] (0/4) Epoch 19, batch 3050, loss[loss=0.2515, ctc_loss=0.1697, cr_loss=0.4092, over 16497.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1443, cr_loss=0.3599, over 3368288.91 frames. 
], batch size: 66, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:40:43,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=341497.3333333333, ans=0.2 2024-09-23 20:40:48,103 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.265e+02 1.361e+02 1.501e+02 2.045e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-23 20:40:57,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=341544.0, ans=0.125 2024-09-23 20:41:55,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=341684.0, ans=0.125 2024-09-23 20:41:59,531 INFO [train.py:1198] (0/4) Epoch 19, batch 3100, loss[loss=0.2171, ctc_loss=0.1461, cr_loss=0.3554, over 17356.00 frames. ], tot_loss[loss=0.2183, ctc_loss=0.146, cr_loss=0.3616, over 3357838.07 frames. ], batch size: 48, lr: 6.28e-03, grad_scale: 16.0 2024-09-23 20:42:01,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=341730.6666666667, ans=0.125 2024-09-23 20:42:09,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=341730.6666666667, ans=0.125 2024-09-23 20:42:43,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341824.0, ans=0.1 2024-09-23 20:43:20,311 INFO [train.py:1198] (0/4) Epoch 19, batch 3150, loss[loss=0.2285, ctc_loss=0.1571, cr_loss=0.3569, over 17155.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.1463, cr_loss=0.3621, over 3357552.73 frames. ], batch size: 45, lr: 6.27e-03, grad_scale: 16.0 2024-09-23 20:43:20,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=341964.0, ans=0.0 2024-09-23 20:43:30,465 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.288e+02 1.385e+02 1.538e+02 2.696e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-23 20:43:56,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=342057.3333333333, ans=0.0 2024-09-23 20:43:59,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=22.5 2024-09-23 20:44:21,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2024-09-23 20:44:30,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=342150.6666666667, ans=0.125 2024-09-23 20:44:42,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=342197.3333333333, ans=0.0 2024-09-23 20:44:42,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=342197.3333333333, ans=0.07 2024-09-23 20:44:44,044 INFO [train.py:1198] (0/4) Epoch 19, batch 3200, loss[loss=0.2255, ctc_loss=0.1487, cr_loss=0.3839, over 17008.00 frames. ], tot_loss[loss=0.2202, ctc_loss=0.1474, cr_loss=0.3641, over 3347555.71 frames. 
], batch size: 51, lr: 6.27e-03, grad_scale: 32.0 2024-09-23 20:45:31,032 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:45:32,726 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 20:45:32,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-09-23 20:45:38,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=342337.3333333333, ans=0.125 2024-09-23 20:45:47,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2024-09-23 20:45:54,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=342384.0, ans=0.2 2024-09-23 20:46:01,867 INFO [train.py:1198] (0/4) Epoch 19, batch 3250, loss[loss=0.2204, ctc_loss=0.1475, cr_loss=0.3643, over 17221.00 frames. ], tot_loss[loss=0.2206, ctc_loss=0.1478, cr_loss=0.3642, over 3334263.86 frames. ], batch size: 55, lr: 6.27e-03, grad_scale: 16.0 2024-09-23 20:46:06,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=342430.6666666667, ans=0.2 2024-09-23 20:46:11,249 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.296e+02 1.376e+02 1.501e+02 2.697e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-23 20:46:19,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=15.0 2024-09-23 20:46:21,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.57 vs. limit=22.5 2024-09-23 20:47:22,260 INFO [train.py:1198] (0/4) Epoch 19, batch 3300, loss[loss=0.2076, ctc_loss=0.1396, cr_loss=0.3398, over 17341.00 frames. ], tot_loss[loss=0.2201, ctc_loss=0.1474, cr_loss=0.3636, over 3343792.66 frames. ], batch size: 48, lr: 6.27e-03, grad_scale: 16.0 2024-09-23 20:47:24,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=342664.0, ans=0.125 2024-09-23 20:47:27,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=342664.0, ans=0.0 2024-09-23 20:47:28,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=342664.0, ans=0.125 2024-09-23 20:47:46,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=342710.6666666667, ans=0.0 2024-09-23 20:48:30,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=342850.6666666667, ans=0.0 2024-09-23 20:48:41,090 INFO [train.py:1198] (0/4) Epoch 19, batch 3350, loss[loss=0.2275, ctc_loss=0.152, cr_loss=0.3774, over 17291.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1461, cr_loss=0.3617, over 3352513.94 frames. 
], batch size: 51, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:48:50,525 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.298e+02 1.381e+02 1.515e+02 2.027e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-23 20:49:09,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=342944.0, ans=0.2 2024-09-23 20:49:15,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=342990.6666666667, ans=0.125 2024-09-23 20:49:23,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=342990.6666666667, ans=0.125 2024-09-23 20:49:39,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=343037.3333333333, ans=0.0 2024-09-23 20:49:53,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=343084.0, ans=0.2 2024-09-23 20:49:59,654 INFO [train.py:1198] (0/4) Epoch 19, batch 3400, loss[loss=0.2008, ctc_loss=0.1332, cr_loss=0.338, over 17256.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1457, cr_loss=0.3611, over 3349752.57 frames. ], batch size: 44, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:50:03,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=343130.6666666667, ans=0.025 2024-09-23 20:50:28,303 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=22.5 2024-09-23 20:50:32,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=343224.0, ans=0.125 2024-09-23 20:50:53,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=343270.6666666667, ans=0.125 2024-09-23 20:51:09,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=343317.3333333333, ans=0.125 2024-09-23 20:51:17,915 INFO [train.py:1198] (0/4) Epoch 19, batch 3450, loss[loss=0.2078, ctc_loss=0.1366, cr_loss=0.3561, over 17341.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1458, cr_loss=0.3614, over 3352014.59 frames. ], batch size: 48, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:51:27,572 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.246e+02 1.361e+02 1.474e+02 2.072e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-23 20:52:05,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=15.0 2024-09-23 20:52:08,896 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.47 vs. limit=10.0 2024-09-23 20:52:20,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=343550.6666666667, ans=0.0 2024-09-23 20:52:21,246 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. 
limit=15.0 2024-09-23 20:52:28,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=343550.6666666667, ans=0.125 2024-09-23 20:52:31,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=343550.6666666667, ans=0.04949747468305833 2024-09-23 20:52:36,231 INFO [train.py:1198] (0/4) Epoch 19, batch 3500, loss[loss=0.2214, ctc_loss=0.1472, cr_loss=0.3714, over 17336.00 frames. ], tot_loss[loss=0.2181, ctc_loss=0.1458, cr_loss=0.3611, over 3348781.59 frames. ], batch size: 48, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:52:53,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=343644.0, ans=0.125 2024-09-23 20:52:59,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=343644.0, ans=0.05 2024-09-23 20:52:59,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=343644.0, ans=0.125 2024-09-23 20:53:27,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=12.0 2024-09-23 20:53:50,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=343784.0, ans=0.0 2024-09-23 20:53:56,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=343830.6666666667, ans=0.125 2024-09-23 20:53:57,647 INFO [train.py:1198] (0/4) Epoch 19, batch 3550, loss[loss=0.2303, ctc_loss=0.1559, cr_loss=0.3722, over 17135.00 frames. ], tot_loss[loss=0.2186, ctc_loss=0.1462, cr_loss=0.3624, over 3356594.76 frames. ], batch size: 48, lr: 6.26e-03, grad_scale: 16.0 2024-09-23 20:54:07,041 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.247e+02 1.326e+02 1.445e+02 2.020e+02, threshold=2.653e+02, percent-clipped=0.0 2024-09-23 20:54:40,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=343924.0, ans=0.125 2024-09-23 20:55:17,372 INFO [train.py:1198] (0/4) Epoch 19, batch 3600, loss[loss=0.1658, ctc_loss=0.1068, cr_loss=0.2948, over 16176.00 frames. ], tot_loss[loss=0.2189, ctc_loss=0.1463, cr_loss=0.3632, over 3351260.40 frames. ], batch size: 36, lr: 6.25e-03, grad_scale: 32.0 2024-09-23 20:55:33,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=344110.6666666667, ans=0.0 2024-09-23 20:55:35,392 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. 
limit=22.5 2024-09-23 20:55:37,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=344110.6666666667, ans=0.2 2024-09-23 20:55:39,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=344110.6666666667, ans=0.0 2024-09-23 20:55:40,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=344110.6666666667, ans=0.125 2024-09-23 20:56:10,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=344204.0, ans=0.125 2024-09-23 20:56:35,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=344297.3333333333, ans=0.125 2024-09-23 20:56:36,799 INFO [train.py:1198] (0/4) Epoch 19, batch 3650, loss[loss=0.1996, ctc_loss=0.1293, cr_loss=0.3515, over 17192.00 frames. ], tot_loss[loss=0.2193, ctc_loss=0.1465, cr_loss=0.364, over 3349577.25 frames. ], batch size: 41, lr: 6.25e-03, grad_scale: 32.0 2024-09-23 20:56:38,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344297.3333333333, ans=0.1 2024-09-23 20:56:40,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=344297.3333333333, ans=0.0 2024-09-23 20:56:47,598 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.302e+02 1.379e+02 1.507e+02 2.534e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-23 20:57:21,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=344390.6666666667, ans=0.125 2024-09-23 20:57:24,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=344437.3333333333, ans=0.0 2024-09-23 20:57:29,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=344437.3333333333, ans=0.125 2024-09-23 20:57:31,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=22.5 2024-09-23 20:57:53,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.92 vs. limit=10.0 2024-09-23 20:57:56,274 INFO [train.py:1198] (0/4) Epoch 19, batch 3700, loss[loss=0.1953, ctc_loss=0.1289, cr_loss=0.3323, over 15745.00 frames. ], tot_loss[loss=0.2179, ctc_loss=0.1456, cr_loss=0.3617, over 3348804.50 frames. ], batch size: 35, lr: 6.25e-03, grad_scale: 16.0 2024-09-23 20:58:14,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=344577.3333333333, ans=15.0 2024-09-23 20:58:40,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=344624.0, ans=0.125 2024-09-23 20:59:14,535 INFO [train.py:1198] (0/4) Epoch 19, batch 3750, loss[loss=0.2332, ctc_loss=0.1588, cr_loss=0.3722, over 16994.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1451, cr_loss=0.3613, over 3353267.79 frames. 
], batch size: 53, lr: 6.25e-03, grad_scale: 16.0 2024-09-23 20:59:24,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=344764.0, ans=0.0 2024-09-23 20:59:24,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=15.0 2024-09-23 20:59:25,487 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.321e+02 1.412e+02 1.562e+02 2.185e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-23 20:59:40,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-09-23 20:59:42,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=344810.6666666667, ans=0.0 2024-09-23 20:59:48,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=344857.3333333333, ans=0.0 2024-09-23 20:59:49,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=22.5 2024-09-23 20:59:55,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=344857.3333333333, ans=0.2 2024-09-23 21:00:32,422 INFO [train.py:1198] (0/4) Epoch 19, batch 3800, loss[loss=0.1872, ctc_loss=0.1201, cr_loss=0.3351, over 16938.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1457, cr_loss=0.3614, over 3321356.17 frames. ], batch size: 42, lr: 6.25e-03, grad_scale: 16.0 2024-09-23 21:00:54,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345044.0, ans=0.1 2024-09-23 21:01:02,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=345090.6666666667, ans=0.125 2024-09-23 21:01:11,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=345090.6666666667, ans=0.0 2024-09-23 21:01:22,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=345137.3333333333, ans=0.1 2024-09-23 21:01:45,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2024-09-23 21:01:50,703 INFO [train.py:1198] (0/4) Epoch 19, batch 3850, loss[loss=0.2881, ctc_loss=0.2031, cr_loss=0.4251, over 11794.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1449, cr_loss=0.3586, over 3285637.85 frames. ], batch size: 126, lr: 6.24e-03, grad_scale: 16.0 2024-09-23 21:02:01,516 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.310e+02 1.460e+02 1.598e+02 2.355e+02, threshold=2.920e+02, percent-clipped=0.0 2024-09-23 21:02:02,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.46 vs. 
limit=12.0 2024-09-23 21:02:20,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=345324.0, ans=0.015 2024-09-23 21:02:20,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=345324.0, ans=0.0 2024-09-23 21:03:01,427 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-19.pt 2024-09-23 21:03:53,559 INFO [train.py:1198] (0/4) Epoch 20, batch 0, loss[loss=0.1965, ctc_loss=0.1318, cr_loss=0.3238, over 17026.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1318, cr_loss=0.3238, over 17026.00 frames. ], batch size: 44, lr: 6.08e-03, grad_scale: 32.0 2024-09-23 21:03:53,560 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 21:04:04,796 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.3927, 2.3897, 3.1125, 3.0788], device='cuda:0') 2024-09-23 21:04:08,661 INFO [train.py:1230] (0/4) Epoch 20, validation: loss=0.03935, ctc_loss=0.03935, cr_loss=7.664e-15, over 944034.00 frames. 2024-09-23 21:04:08,661 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 21:04:27,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=345492.0, ans=0.1 2024-09-23 21:04:44,796 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 21:05:10,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=345585.3333333333, ans=0.125 2024-09-23 21:05:12,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=345585.3333333333, ans=0.125 2024-09-23 21:05:33,920 INFO [train.py:1198] (0/4) Epoch 20, batch 50, loss[loss=0.2248, ctc_loss=0.1487, cr_loss=0.3807, over 17022.00 frames. ], tot_loss[loss=0.2187, ctc_loss=0.146, cr_loss=0.3634, over 761657.92 frames. ], batch size: 51, lr: 6.08e-03, grad_scale: 32.0 2024-09-23 21:05:35,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=345678.6666666667, ans=0.0 2024-09-23 21:05:51,261 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.262e+02 1.457e+02 1.636e+02 2.185e+02, threshold=2.915e+02, percent-clipped=0.0 2024-09-23 21:06:53,843 INFO [train.py:1198] (0/4) Epoch 20, batch 100, loss[loss=0.223, ctc_loss=0.1518, cr_loss=0.356, over 17157.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1451, cr_loss=0.3602, over 1341052.36 frames. ], batch size: 48, lr: 6.08e-03, grad_scale: 32.0 2024-09-23 21:06:54,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=345912.0, ans=0.0 2024-09-23 21:07:15,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-09-23 21:07:16,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. 
limit=15.0 2024-09-23 21:07:44,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=346052.0, ans=0.0 2024-09-23 21:08:08,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=346098.6666666667, ans=0.125 2024-09-23 21:08:18,775 INFO [train.py:1198] (0/4) Epoch 20, batch 150, loss[loss=0.2079, ctc_loss=0.1371, cr_loss=0.3541, over 17089.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1459, cr_loss=0.3625, over 1786379.37 frames. ], batch size: 43, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:08:36,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=346192.0, ans=0.0 2024-09-23 21:08:38,025 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.021e+02 1.321e+02 1.429e+02 1.602e+02 2.448e+02, threshold=2.858e+02, percent-clipped=0.0 2024-09-23 21:09:05,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=346285.3333333333, ans=0.125 2024-09-23 21:09:14,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2024-09-23 21:09:19,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0 2024-09-23 21:09:35,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=346332.0, ans=0.1 2024-09-23 21:09:45,100 INFO [train.py:1198] (0/4) Epoch 20, batch 200, loss[loss=0.1819, ctc_loss=0.1191, cr_loss=0.3141, over 17252.00 frames. ], tot_loss[loss=0.2182, ctc_loss=0.1458, cr_loss=0.362, over 2144177.52 frames. ], batch size: 42, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:10:09,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=346425.3333333333, ans=0.07 2024-09-23 21:10:44,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=346518.6666666667, ans=0.125 2024-09-23 21:10:49,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=346565.3333333333, ans=0.125 2024-09-23 21:10:59,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2024-09-23 21:11:04,972 INFO [train.py:1198] (0/4) Epoch 20, batch 250, loss[loss=0.2146, ctc_loss=0.1452, cr_loss=0.3472, over 17037.00 frames. ], tot_loss[loss=0.2185, ctc_loss=0.1459, cr_loss=0.3627, over 2418970.41 frames. 
], batch size: 56, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:11:13,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=346612.0, ans=0.125 2024-09-23 21:11:23,903 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.297e+02 1.428e+02 1.617e+02 1.854e+02, threshold=2.857e+02, percent-clipped=0.0 2024-09-23 21:11:35,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=346705.3333333333, ans=0.09899494936611666 2024-09-23 21:12:02,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=346752.0, ans=0.0 2024-09-23 21:12:24,540 INFO [train.py:1198] (0/4) Epoch 20, batch 300, loss[loss=0.2127, ctc_loss=0.1456, cr_loss=0.3352, over 17031.00 frames. ], tot_loss[loss=0.2184, ctc_loss=0.1459, cr_loss=0.3626, over 2621235.22 frames. ], batch size: 51, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:12:24,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346845.3333333333, ans=0.1 2024-09-23 21:12:32,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346845.3333333333, ans=0.1 2024-09-23 21:13:04,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=346938.6666666667, ans=0.125 2024-09-23 21:13:23,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=346985.3333333333, ans=0.05 2024-09-23 21:13:31,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2024-09-23 21:13:33,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=347032.0, ans=0.09899494936611666 2024-09-23 21:13:43,525 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 21:13:49,665 INFO [train.py:1198] (0/4) Epoch 20, batch 350, loss[loss=0.2257, ctc_loss=0.1499, cr_loss=0.3787, over 16770.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1447, cr_loss=0.3621, over 2791886.95 frames. 
], batch size: 61, lr: 6.07e-03, grad_scale: 16.0 2024-09-23 21:13:57,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=347078.6666666667, ans=0.2 2024-09-23 21:14:05,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=347125.3333333333, ans=0.125 2024-09-23 21:14:08,669 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.225e+02 1.312e+02 1.421e+02 1.795e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-23 21:14:50,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=347218.6666666667, ans=0.125 2024-09-23 21:14:58,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=347265.3333333333, ans=0.025 2024-09-23 21:15:12,642 INFO [train.py:1198] (0/4) Epoch 20, batch 400, loss[loss=0.2111, ctc_loss=0.1429, cr_loss=0.3414, over 17003.00 frames. ], tot_loss[loss=0.2178, ctc_loss=0.1452, cr_loss=0.3628, over 2913731.32 frames. ], batch size: 51, lr: 6.06e-03, grad_scale: 32.0 2024-09-23 21:15:41,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=347358.6666666667, ans=0.1 2024-09-23 21:15:42,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.83 vs. limit=15.0 2024-09-23 21:16:32,402 INFO [train.py:1198] (0/4) Epoch 20, batch 450, loss[loss=0.1963, ctc_loss=0.1296, cr_loss=0.3336, over 17089.00 frames. ], tot_loss[loss=0.2176, ctc_loss=0.145, cr_loss=0.3629, over 3016027.72 frames. ], batch size: 43, lr: 6.06e-03, grad_scale: 32.0 2024-09-23 21:16:34,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=347545.3333333333, ans=0.125 2024-09-23 21:16:52,900 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.293e+02 1.421e+02 1.585e+02 2.168e+02, threshold=2.842e+02, percent-clipped=0.0 2024-09-23 21:17:48,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=347732.0, ans=0.0 2024-09-23 21:17:53,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=347778.6666666667, ans=0.1 2024-09-23 21:17:54,854 INFO [train.py:1198] (0/4) Epoch 20, batch 500, loss[loss=0.1969, ctc_loss=0.1315, cr_loss=0.3271, over 17163.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1448, cr_loss=0.3623, over 3096373.34 frames. 
], batch size: 45, lr: 6.06e-03, grad_scale: 16.0 2024-09-23 21:17:56,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=347778.6666666667, ans=0.125 2024-09-23 21:18:26,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=347825.3333333333, ans=0.125 2024-09-23 21:18:59,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347965.3333333333, ans=0.1 2024-09-23 21:19:11,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=347965.3333333333, ans=0.2 2024-09-23 21:19:17,193 INFO [train.py:1198] (0/4) Epoch 20, batch 550, loss[loss=0.2276, ctc_loss=0.1527, cr_loss=0.3748, over 17297.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1443, cr_loss=0.3613, over 3145104.81 frames. ], batch size: 49, lr: 6.06e-03, grad_scale: 16.0 2024-09-23 21:19:42,497 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.238e+02 1.328e+02 1.432e+02 2.472e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-23 21:19:46,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-09-23 21:20:00,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=348105.3333333333, ans=0.0 2024-09-23 21:20:14,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=348152.0, ans=0.125 2024-09-23 21:20:22,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348152.0, ans=0.1 2024-09-23 21:20:41,716 INFO [train.py:1198] (0/4) Epoch 20, batch 600, loss[loss=0.237, ctc_loss=0.1565, cr_loss=0.4023, over 17038.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1441, cr_loss=0.3609, over 3195544.78 frames. ], batch size: 56, lr: 6.06e-03, grad_scale: 16.0 2024-09-23 21:20:57,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=348292.0, ans=0.125 2024-09-23 21:20:58,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0 2024-09-23 21:21:18,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348338.6666666667, ans=0.125 2024-09-23 21:21:53,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=348432.0, ans=0.0 2024-09-23 21:22:01,011 INFO [train.py:1198] (0/4) Epoch 20, batch 650, loss[loss=0.1952, ctc_loss=0.125, cr_loss=0.3505, over 16962.00 frames. ], tot_loss[loss=0.215, ctc_loss=0.1432, cr_loss=0.3595, over 3235960.48 frames. 
], batch size: 42, lr: 6.05e-03, grad_scale: 16.0 2024-09-23 21:22:21,604 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.264e+02 1.348e+02 1.477e+02 2.156e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-23 21:22:23,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=348525.3333333333, ans=0.125 2024-09-23 21:22:29,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=348525.3333333333, ans=0.025 2024-09-23 21:22:42,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=348572.0, ans=0.2 2024-09-23 21:22:47,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-09-23 21:22:56,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=348618.6666666667, ans=0.125 2024-09-23 21:22:58,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=348618.6666666667, ans=0.125 2024-09-23 21:23:26,231 INFO [train.py:1198] (0/4) Epoch 20, batch 700, loss[loss=0.2413, ctc_loss=0.1634, cr_loss=0.3897, over 16896.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1429, cr_loss=0.3589, over 3270353.91 frames. ], batch size: 58, lr: 6.05e-03, grad_scale: 16.0 2024-09-23 21:24:00,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=348805.3333333333, ans=0.025 2024-09-23 21:24:08,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=348805.3333333333, ans=0.125 2024-09-23 21:24:11,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=348805.3333333333, ans=0.125 2024-09-23 21:24:17,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=348852.0, ans=0.125 2024-09-23 21:24:29,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348852.0, ans=0.1 2024-09-23 21:24:31,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=348852.0, ans=0.025 2024-09-23 21:24:51,551 INFO [train.py:1198] (0/4) Epoch 20, batch 750, loss[loss=0.2123, ctc_loss=0.1397, cr_loss=0.3627, over 17217.00 frames. ], tot_loss[loss=0.2163, ctc_loss=0.1441, cr_loss=0.3607, over 3284620.00 frames. ], batch size: 47, lr: 6.05e-03, grad_scale: 16.0 2024-09-23 21:25:12,208 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.238e+02 1.344e+02 1.442e+02 2.138e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-23 21:25:16,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.10 vs. limit=12.0 2024-09-23 21:25:17,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.54 vs. 
limit=22.5 2024-09-23 21:25:40,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=12.0 2024-09-23 21:25:42,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=349085.3333333333, ans=0.0 2024-09-23 21:26:09,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=349178.6666666667, ans=0.125 2024-09-23 21:26:11,143 INFO [train.py:1198] (0/4) Epoch 20, batch 800, loss[loss=0.2089, ctc_loss=0.1393, cr_loss=0.3479, over 17355.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1443, cr_loss=0.361, over 3311677.16 frames. ], batch size: 48, lr: 6.05e-03, grad_scale: 32.0 2024-09-23 21:26:19,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=12.0 2024-09-23 21:26:22,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=349178.6666666667, ans=0.0 2024-09-23 21:26:32,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2024-09-23 21:26:51,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=349272.0, ans=0.0 2024-09-23 21:26:51,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=349272.0, ans=0.125 2024-09-23 21:26:52,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=349272.0, ans=0.125 2024-09-23 21:27:10,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=349318.6666666667, ans=0.125 2024-09-23 21:27:16,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=349365.3333333333, ans=0.0 2024-09-23 21:27:17,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349365.3333333333, ans=0.1 2024-09-23 21:27:23,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=349365.3333333333, ans=0.125 2024-09-23 21:27:31,205 INFO [train.py:1198] (0/4) Epoch 20, batch 850, loss[loss=0.2066, ctc_loss=0.1366, cr_loss=0.3497, over 17205.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1434, cr_loss=0.3589, over 3310098.43 frames. 
], batch size: 50, lr: 6.05e-03, grad_scale: 32.0 2024-09-23 21:27:44,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=349412.0, ans=0.2 2024-09-23 21:27:49,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=349458.6666666667, ans=0.2 2024-09-23 21:27:54,464 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.268e+02 1.363e+02 1.494e+02 2.147e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-23 21:28:46,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=349598.6666666667, ans=0.025 2024-09-23 21:28:47,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2024-09-23 21:28:48,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=349598.6666666667, ans=15.0 2024-09-23 21:28:56,003 INFO [train.py:1198] (0/4) Epoch 20, batch 900, loss[loss=0.2526, ctc_loss=0.1721, cr_loss=0.4021, over 17029.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1443, cr_loss=0.3604, over 3317704.19 frames. ], batch size: 56, lr: 6.04e-03, grad_scale: 32.0 2024-09-23 21:30:17,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2024-09-23 21:30:21,168 INFO [train.py:1198] (0/4) Epoch 20, batch 950, loss[loss=0.2695, ctc_loss=0.1844, cr_loss=0.4258, over 15164.00 frames. ], tot_loss[loss=0.2168, ctc_loss=0.1448, cr_loss=0.3602, over 3313429.03 frames. ], batch size: 89, lr: 6.04e-03, grad_scale: 32.0 2024-09-23 21:30:36,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.99 vs. limit=10.0 2024-09-23 21:30:42,123 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.279e+02 1.392e+02 1.519e+02 1.959e+02, threshold=2.784e+02, percent-clipped=0.0 2024-09-23 21:31:06,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2024-09-23 21:31:41,402 INFO [train.py:1198] (0/4) Epoch 20, batch 1000, loss[loss=0.2443, ctc_loss=0.1655, cr_loss=0.3935, over 17001.00 frames. ], tot_loss[loss=0.218, ctc_loss=0.1457, cr_loss=0.3615, over 3316786.98 frames. 
], batch size: 53, lr: 6.04e-03, grad_scale: 32.0 2024-09-23 21:31:49,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=350112.0, ans=0.125 2024-09-23 21:32:02,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=350158.6666666667, ans=0.125 2024-09-23 21:32:25,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=350205.3333333333, ans=0.125 2024-09-23 21:32:34,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=350252.0, ans=0.05 2024-09-23 21:33:04,841 INFO [train.py:1198] (0/4) Epoch 20, batch 1050, loss[loss=0.2446, ctc_loss=0.1724, cr_loss=0.3609, over 11732.00 frames. ], tot_loss[loss=0.2175, ctc_loss=0.1452, cr_loss=0.3611, over 3319996.36 frames. ], batch size: 124, lr: 6.04e-03, grad_scale: 32.0 2024-09-23 21:33:10,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=350345.3333333333, ans=0.0 2024-09-23 21:33:27,843 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.295e+02 1.374e+02 1.534e+02 2.501e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-23 21:33:29,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=350392.0, ans=0.125 2024-09-23 21:33:31,343 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 21:34:00,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=350485.3333333333, ans=0.125 2024-09-23 21:34:02,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-09-23 21:34:32,260 INFO [train.py:1198] (0/4) Epoch 20, batch 1100, loss[loss=0.1822, ctc_loss=0.1161, cr_loss=0.3308, over 17010.00 frames. ], tot_loss[loss=0.2173, ctc_loss=0.1451, cr_loss=0.3609, over 3323077.18 frames. ], batch size: 39, lr: 6.04e-03, grad_scale: 16.0 2024-09-23 21:34:36,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.44 vs. limit=15.0 2024-09-23 21:34:42,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=350578.6666666667, ans=0.125 2024-09-23 21:34:45,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-09-23 21:34:47,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2024-09-23 21:35:21,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.87 vs. limit=12.0 2024-09-23 21:35:21,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=350718.6666666667, ans=0.0 2024-09-23 21:35:52,150 INFO [train.py:1198] (0/4) Epoch 20, batch 1150, loss[loss=0.184, ctc_loss=0.1212, cr_loss=0.3141, over 17294.00 frames. 
], tot_loss[loss=0.2166, ctc_loss=0.1446, cr_loss=0.3598, over 3327936.77 frames. ], batch size: 49, lr: 6.03e-03, grad_scale: 16.0 2024-09-23 21:35:53,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2024-09-23 21:36:00,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=350812.0, ans=0.0 2024-09-23 21:36:10,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=350858.6666666667, ans=0.0 2024-09-23 21:36:14,527 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.209e+02 1.298e+02 1.420e+02 2.385e+02, threshold=2.595e+02, percent-clipped=0.0 2024-09-23 21:36:41,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=12.0 2024-09-23 21:36:42,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=350952.0, ans=0.125 2024-09-23 21:37:00,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-09-23 21:37:06,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=350998.6666666667, ans=0.125 2024-09-23 21:37:12,177 INFO [train.py:1198] (0/4) Epoch 20, batch 1200, loss[loss=0.1885, ctc_loss=0.1247, cr_loss=0.319, over 16969.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1446, cr_loss=0.3592, over 3336729.89 frames. ], batch size: 42, lr: 6.03e-03, grad_scale: 32.0 2024-09-23 21:37:14,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=351045.3333333333, ans=0.125 2024-09-23 21:37:17,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=351045.3333333333, ans=0.2 2024-09-23 21:37:23,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=351045.3333333333, ans=0.0 2024-09-23 21:37:29,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.31 vs. limit=12.0 2024-09-23 21:37:36,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=351092.0, ans=0.125 2024-09-23 21:38:37,557 INFO [train.py:1198] (0/4) Epoch 20, batch 1250, loss[loss=0.1932, ctc_loss=0.1288, cr_loss=0.3221, over 17300.00 frames. ], tot_loss[loss=0.2164, ctc_loss=0.1445, cr_loss=0.3596, over 3341704.27 frames. 
], batch size: 46, lr: 6.03e-03, grad_scale: 32.0 2024-09-23 21:38:50,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=351278.6666666667, ans=0.125 2024-09-23 21:38:52,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=351325.3333333333, ans=0.0 2024-09-23 21:38:53,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=351325.3333333333, ans=0.0 2024-09-23 21:38:58,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351325.3333333333, ans=0.1 2024-09-23 21:38:59,801 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.265e+02 1.387e+02 1.557e+02 1.898e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-23 21:39:00,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2024-09-23 21:39:12,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=351372.0, ans=0.2 2024-09-23 21:39:21,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=351372.0, ans=0.125 2024-09-23 21:39:27,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351372.0, ans=0.1 2024-09-23 21:39:29,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=351418.6666666667, ans=0.125 2024-09-23 21:39:31,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2024-09-23 21:40:02,019 INFO [train.py:1198] (0/4) Epoch 20, batch 1300, loss[loss=0.2514, ctc_loss=0.1691, cr_loss=0.4116, over 16443.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1437, cr_loss=0.3588, over 3347206.26 frames. ], batch size: 66, lr: 6.03e-03, grad_scale: 32.0 2024-09-23 21:40:24,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=351558.6666666667, ans=0.0 2024-09-23 21:41:07,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=351698.6666666667, ans=0.09899494936611666 2024-09-23 21:41:12,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=351698.6666666667, ans=10.0 2024-09-23 21:41:21,504 INFO [train.py:1198] (0/4) Epoch 20, batch 1350, loss[loss=0.2313, ctc_loss=0.1539, cr_loss=0.3872, over 16985.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1433, cr_loss=0.3589, over 3350472.99 frames. 
], batch size: 53, lr: 6.03e-03, grad_scale: 32.0 2024-09-23 21:41:43,587 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.053e+02 1.285e+02 1.356e+02 1.494e+02 3.061e+02, threshold=2.711e+02, percent-clipped=1.0 2024-09-23 21:41:55,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=351838.6666666667, ans=0.125 2024-09-23 21:41:58,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351838.6666666667, ans=0.125 2024-09-23 21:42:09,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351885.3333333333, ans=0.1 2024-09-23 21:42:27,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=351932.0, ans=0.125 2024-09-23 21:42:43,486 INFO [train.py:1198] (0/4) Epoch 20, batch 1400, loss[loss=0.1813, ctc_loss=0.1204, cr_loss=0.3047, over 17025.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1439, cr_loss=0.36, over 3352786.77 frames. ], batch size: 39, lr: 6.02e-03, grad_scale: 32.0 2024-09-23 21:42:43,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=351978.6666666667, ans=0.125 2024-09-23 21:42:53,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=351978.6666666667, ans=0.125 2024-09-23 21:43:27,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352072.0, ans=0.1 2024-09-23 21:43:32,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=352118.6666666667, ans=0.125 2024-09-23 21:43:36,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=12.0 2024-09-23 21:43:50,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0 2024-09-23 21:44:08,213 INFO [train.py:1198] (0/4) Epoch 20, batch 1450, loss[loss=0.1908, ctc_loss=0.125, cr_loss=0.3291, over 17324.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1452, cr_loss=0.3621, over 3345618.68 frames. ], batch size: 51, lr: 6.02e-03, grad_scale: 32.0 2024-09-23 21:44:26,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=352258.6666666667, ans=0.125 2024-09-23 21:44:27,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.79 vs. 
limit=15.0 2024-09-23 21:44:32,962 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.276e+02 1.364e+02 1.479e+02 2.768e+02, threshold=2.727e+02, percent-clipped=1.0 2024-09-23 21:44:34,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=352258.6666666667, ans=0.0 2024-09-23 21:44:41,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352305.3333333333, ans=0.1 2024-09-23 21:44:50,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=352305.3333333333, ans=0.125 2024-09-23 21:44:55,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352305.3333333333, ans=0.125 2024-09-23 21:44:57,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=352352.0, ans=0.0 2024-09-23 21:45:01,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352352.0, ans=0.1 2024-09-23 21:45:08,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352352.0, ans=0.125 2024-09-23 21:45:14,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=352398.6666666667, ans=0.125 2024-09-23 21:45:25,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=352398.6666666667, ans=0.0 2024-09-23 21:45:25,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=352398.6666666667, ans=0.1 2024-09-23 21:45:30,178 INFO [train.py:1198] (0/4) Epoch 20, batch 1500, loss[loss=0.2078, ctc_loss=0.1371, cr_loss=0.3536, over 17070.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1447, cr_loss=0.3609, over 3350117.06 frames. ], batch size: 46, lr: 6.02e-03, grad_scale: 16.0 2024-09-23 21:45:36,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=352445.3333333333, ans=0.0 2024-09-23 21:45:40,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=352445.3333333333, ans=0.125 2024-09-23 21:45:42,156 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.87 vs. limit=10.0 2024-09-23 21:46:02,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=352538.6666666667, ans=0.125 2024-09-23 21:46:25,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=352585.3333333333, ans=0.0 2024-09-23 21:46:42,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.90 vs. limit=6.0 2024-09-23 21:46:51,136 INFO [train.py:1198] (0/4) Epoch 20, batch 1550, loss[loss=0.2219, ctc_loss=0.1472, cr_loss=0.3737, over 15810.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1446, cr_loss=0.3607, over 3347599.51 frames. 
], batch size: 74, lr: 6.02e-03, grad_scale: 16.0 2024-09-23 21:46:51,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=352678.6666666667, ans=0.0 2024-09-23 21:46:51,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=352678.6666666667, ans=0.07 2024-09-23 21:47:12,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=352725.3333333333, ans=0.0 2024-09-23 21:47:12,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=352725.3333333333, ans=0.1 2024-09-23 21:47:17,721 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.276e+02 1.387e+02 1.513e+02 2.213e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-23 21:47:21,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352725.3333333333, ans=0.1 2024-09-23 21:47:22,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=352725.3333333333, ans=0.125 2024-09-23 21:47:26,447 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2024-09-23 21:47:48,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=352818.6666666667, ans=0.05 2024-09-23 21:47:53,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=352818.6666666667, ans=0.1 2024-09-23 21:48:11,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=352865.3333333333, ans=0.0 2024-09-23 21:48:16,281 INFO [train.py:1198] (0/4) Epoch 20, batch 1600, loss[loss=0.2317, ctc_loss=0.1544, cr_loss=0.3863, over 16474.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1454, cr_loss=0.3619, over 3343003.90 frames. ], batch size: 66, lr: 6.02e-03, grad_scale: 32.0 2024-09-23 21:48:51,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=353005.3333333333, ans=0.0 2024-09-23 21:49:09,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353052.0, ans=0.1 2024-09-23 21:49:14,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=353052.0, ans=0.95 2024-09-23 21:49:23,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=353098.6666666667, ans=0.0 2024-09-23 21:49:41,007 INFO [train.py:1198] (0/4) Epoch 20, batch 1650, loss[loss=0.2804, ctc_loss=0.1998, cr_loss=0.403, over 11626.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1448, cr_loss=0.3603, over 3339452.54 frames. 
], batch size: 123, lr: 6.02e-03, grad_scale: 32.0 2024-09-23 21:50:05,315 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.288e+02 1.374e+02 1.505e+02 2.586e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-23 21:50:20,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=353238.6666666667, ans=0.0 2024-09-23 21:50:40,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=353285.3333333333, ans=0.125 2024-09-23 21:51:01,007 INFO [train.py:1198] (0/4) Epoch 20, batch 1700, loss[loss=0.1825, ctc_loss=0.1189, cr_loss=0.3176, over 17179.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1437, cr_loss=0.3583, over 3353942.14 frames. ], batch size: 41, lr: 6.01e-03, grad_scale: 32.0 2024-09-23 21:51:17,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=353425.3333333333, ans=0.025 2024-09-23 21:51:23,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=353425.3333333333, ans=0.125 2024-09-23 21:51:25,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=353425.3333333333, ans=0.0 2024-09-23 21:51:32,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=22.5 2024-09-23 21:51:39,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=353472.0, ans=0.125 2024-09-23 21:51:41,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=353472.0, ans=10.0 2024-09-23 21:51:57,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=353518.6666666667, ans=0.125 2024-09-23 21:52:14,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=353565.3333333333, ans=0.0 2024-09-23 21:52:24,279 INFO [train.py:1198] (0/4) Epoch 20, batch 1750, loss[loss=0.2378, ctc_loss=0.1614, cr_loss=0.3822, over 16919.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1431, cr_loss=0.3582, over 3365433.91 frames. ], batch size: 58, lr: 6.01e-03, grad_scale: 16.0 2024-09-23 21:52:29,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0 2024-09-23 21:52:37,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=353612.0, ans=0.125 2024-09-23 21:52:40,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=353658.6666666667, ans=0.0 2024-09-23 21:52:49,739 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.235e+02 1.321e+02 1.434e+02 2.366e+02, threshold=2.642e+02, percent-clipped=0.0 2024-09-23 21:53:47,294 INFO [train.py:1198] (0/4) Epoch 20, batch 1800, loss[loss=0.187, ctc_loss=0.1214, cr_loss=0.3277, over 17274.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.1442, cr_loss=0.3601, over 3355253.81 frames. 
], batch size: 42, lr: 6.01e-03, grad_scale: 16.0 2024-09-23 21:53:52,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=353845.3333333333, ans=0.09899494936611666 2024-09-23 21:54:09,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=353892.0, ans=0.2 2024-09-23 21:54:40,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2024-09-23 21:54:43,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=353985.3333333333, ans=0.2 2024-09-23 21:55:11,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=22.5 2024-09-23 21:55:12,167 INFO [train.py:1198] (0/4) Epoch 20, batch 1850, loss[loss=0.2228, ctc_loss=0.1488, cr_loss=0.3699, over 17118.00 frames. ], tot_loss[loss=0.2166, ctc_loss=0.1444, cr_loss=0.3607, over 3359496.64 frames. ], batch size: 49, lr: 6.01e-03, grad_scale: 16.0 2024-09-23 21:55:37,931 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.237e+02 1.317e+02 1.410e+02 2.955e+02, threshold=2.633e+02, percent-clipped=1.0 2024-09-23 21:56:03,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=354218.6666666667, ans=10.0 2024-09-23 21:56:31,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=354312.0, ans=0.125 2024-09-23 21:56:32,473 INFO [train.py:1198] (0/4) Epoch 20, batch 1900, loss[loss=0.205, ctc_loss=0.1323, cr_loss=0.3638, over 17087.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1439, cr_loss=0.3594, over 3352483.97 frames. ], batch size: 43, lr: 6.01e-03, grad_scale: 16.0 2024-09-23 21:56:35,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=354312.0, ans=0.125 2024-09-23 21:56:36,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=354312.0, ans=0.125 2024-09-23 21:56:36,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=354312.0, ans=0.125 2024-09-23 21:56:42,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=354312.0, ans=0.0 2024-09-23 21:56:48,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=354358.6666666667, ans=0.125 2024-09-23 21:56:51,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=354358.6666666667, ans=0.0 2024-09-23 21:57:01,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=354358.6666666667, ans=0.04949747468305833 2024-09-23 21:57:03,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=354405.3333333333, ans=0.125 2024-09-23 21:57:38,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.62 vs. 
limit=15.0 2024-09-23 21:57:55,487 INFO [train.py:1198] (0/4) Epoch 20, batch 1950, loss[loss=0.2061, ctc_loss=0.136, cr_loss=0.3503, over 16785.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1431, cr_loss=0.3586, over 3364706.89 frames. ], batch size: 37, lr: 6.00e-03, grad_scale: 16.0 2024-09-23 21:58:04,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0 2024-09-23 21:58:23,509 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.283e+02 1.409e+02 1.574e+02 2.318e+02, threshold=2.818e+02, percent-clipped=0.0 2024-09-23 21:58:27,113 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 21:58:28,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354638.6666666667, ans=0.1 2024-09-23 21:58:38,626 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-76000.pt 2024-09-23 21:58:41,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=354638.6666666667, ans=0.125 2024-09-23 21:58:52,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=354685.3333333333, ans=0.0 2024-09-23 21:59:24,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=354778.6666666667, ans=0.1 2024-09-23 21:59:25,720 INFO [train.py:1198] (0/4) Epoch 20, batch 2000, loss[loss=0.1807, ctc_loss=0.1171, cr_loss=0.3181, over 16940.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1421, cr_loss=0.3568, over 3368151.55 frames. ], batch size: 42, lr: 6.00e-03, grad_scale: 32.0 2024-09-23 21:59:33,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354778.6666666667, ans=0.1 2024-09-23 21:59:48,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=354825.3333333333, ans=0.0 2024-09-23 21:59:53,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=354825.3333333333, ans=0.07 2024-09-23 22:00:09,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=354872.0, ans=0.125 2024-09-23 22:00:17,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=354918.6666666667, ans=0.04949747468305833 2024-09-23 22:00:45,848 INFO [train.py:1198] (0/4) Epoch 20, batch 2050, loss[loss=0.2197, ctc_loss=0.1497, cr_loss=0.3504, over 17152.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1428, cr_loss=0.3584, over 3370286.16 frames. 
], batch size: 45, lr: 6.00e-03, grad_scale: 32.0 2024-09-23 22:00:54,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=355012.0, ans=0.125 2024-09-23 22:00:56,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=355012.0, ans=0.5 2024-09-23 22:01:07,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=355058.6666666667, ans=0.125 2024-09-23 22:01:11,577 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.282e+02 1.359e+02 1.463e+02 2.527e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-23 22:01:21,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=355105.3333333333, ans=0.0 2024-09-23 22:01:26,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=355105.3333333333, ans=0.125 2024-09-23 22:01:45,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=355152.0, ans=0.125 2024-09-23 22:01:46,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=355152.0, ans=0.1 2024-09-23 22:01:49,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2024-09-23 22:02:01,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=355198.6666666667, ans=0.0 2024-09-23 22:02:05,991 INFO [train.py:1198] (0/4) Epoch 20, batch 2100, loss[loss=0.2139, ctc_loss=0.1413, cr_loss=0.363, over 17218.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1436, cr_loss=0.3598, over 3364106.84 frames. ], batch size: 55, lr: 6.00e-03, grad_scale: 32.0 2024-09-23 22:02:12,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=355245.3333333333, ans=0.125 2024-09-23 22:02:21,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355245.3333333333, ans=0.125 2024-09-23 22:02:27,923 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=22.5 2024-09-23 22:03:06,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=355385.3333333333, ans=0.05 2024-09-23 22:03:08,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=355385.3333333333, ans=0.0 2024-09-23 22:03:30,248 INFO [train.py:1198] (0/4) Epoch 20, batch 2150, loss[loss=0.1943, ctc_loss=0.1307, cr_loss=0.3183, over 17251.00 frames. ], tot_loss[loss=0.2152, ctc_loss=0.1434, cr_loss=0.3592, over 3366756.70 frames. 
], batch size: 44, lr: 6.00e-03, grad_scale: 32.0 2024-09-23 22:03:43,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=355478.6666666667, ans=0.125 2024-09-23 22:03:58,363 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.263e+02 1.377e+02 1.523e+02 2.016e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 22:03:58,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=355525.3333333333, ans=0.0 2024-09-23 22:04:30,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355618.6666666667, ans=0.1 2024-09-23 22:04:33,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=355618.6666666667, ans=0.125 2024-09-23 22:04:35,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=355618.6666666667, ans=0.125 2024-09-23 22:04:40,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=355665.3333333333, ans=0.125 2024-09-23 22:04:44,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=355665.3333333333, ans=0.2 2024-09-23 22:04:52,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=355665.3333333333, ans=0.125 2024-09-23 22:04:55,855 INFO [train.py:1198] (0/4) Epoch 20, batch 2200, loss[loss=0.2126, ctc_loss=0.1434, cr_loss=0.3463, over 17006.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.1432, cr_loss=0.3595, over 3362581.20 frames. ], batch size: 51, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:05:01,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355712.0, ans=0.1 2024-09-23 22:05:12,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=355758.6666666667, ans=0.2 2024-09-23 22:05:12,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355758.6666666667, ans=0.1 2024-09-23 22:05:34,572 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:05:56,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=355852.0, ans=0.0 2024-09-23 22:06:14,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=355945.3333333333, ans=0.2 2024-09-23 22:06:16,041 INFO [train.py:1198] (0/4) Epoch 20, batch 2250, loss[loss=0.224, ctc_loss=0.1484, cr_loss=0.378, over 17299.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1435, cr_loss=0.3598, over 3357831.85 frames. ], batch size: 46, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:06:41,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=12.0 2024-09-23 22:06:41,714 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.265e+02 1.369e+02 1.505e+02 1.904e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-23 22:06:54,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=356038.6666666667, ans=0.125 2024-09-23 22:07:11,609 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:07:17,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=356085.3333333333, ans=0.025 2024-09-23 22:07:22,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=356132.0, ans=0.125 2024-09-23 22:07:22,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=356132.0, ans=0.025 2024-09-23 22:07:28,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356132.0, ans=0.1 2024-09-23 22:07:28,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=356132.0, ans=0.02 2024-09-23 22:07:38,557 INFO [train.py:1198] (0/4) Epoch 20, batch 2300, loss[loss=0.2634, ctc_loss=0.1819, cr_loss=0.4074, over 14882.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1424, cr_loss=0.3582, over 3360570.23 frames. ], batch size: 89, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:08:17,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=356272.0, ans=0.125 2024-09-23 22:08:19,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=356272.0, ans=0.125 2024-09-23 22:08:22,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=12.0 2024-09-23 22:08:24,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=356272.0, ans=10.0 2024-09-23 22:08:46,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=356365.3333333333, ans=0.125 2024-09-23 22:09:02,583 INFO [train.py:1198] (0/4) Epoch 20, batch 2350, loss[loss=0.1954, ctc_loss=0.1274, cr_loss=0.3402, over 17271.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1424, cr_loss=0.3584, over 3353678.52 frames. 
], batch size: 42, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:09:09,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356412.0, ans=0.1 2024-09-23 22:09:18,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=356412.0, ans=0.2 2024-09-23 22:09:30,701 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.236e+02 1.335e+02 1.500e+02 2.118e+02, threshold=2.671e+02, percent-clipped=0.0 2024-09-23 22:09:45,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356505.3333333333, ans=0.1 2024-09-23 22:09:55,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.99 vs. limit=15.0 2024-09-23 22:09:56,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=356552.0, ans=0.125 2024-09-23 22:10:15,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=356598.6666666667, ans=0.0 2024-09-23 22:10:17,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=356598.6666666667, ans=0.2 2024-09-23 22:10:17,271 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:10:21,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=356598.6666666667, ans=0.125 2024-09-23 22:10:24,966 INFO [train.py:1198] (0/4) Epoch 20, batch 2400, loss[loss=0.1835, ctc_loss=0.1197, cr_loss=0.319, over 17282.00 frames. ], tot_loss[loss=0.2142, ctc_loss=0.1425, cr_loss=0.3585, over 3357687.26 frames. ], batch size: 42, lr: 5.99e-03, grad_scale: 32.0 2024-09-23 22:10:25,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=356645.3333333333, ans=0.0 2024-09-23 22:10:26,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=356645.3333333333, ans=0.125 2024-09-23 22:10:33,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=22.5 2024-09-23 22:10:46,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0 2024-09-23 22:10:50,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=356692.0, ans=0.125 2024-09-23 22:11:06,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356738.6666666667, ans=0.1 2024-09-23 22:11:16,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=356785.3333333333, ans=0.0 2024-09-23 22:11:27,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.62 vs. 
limit=15.0 2024-09-23 22:11:42,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=356832.0, ans=0.125 2024-09-23 22:11:45,255 INFO [train.py:1198] (0/4) Epoch 20, batch 2450, loss[loss=0.2215, ctc_loss=0.1455, cr_loss=0.3804, over 17234.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1437, cr_loss=0.3602, over 3358015.13 frames. ], batch size: 50, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:11:48,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=356878.6666666667, ans=0.125 2024-09-23 22:11:50,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=356878.6666666667, ans=0.05 2024-09-23 22:12:13,315 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.240e+02 1.349e+02 1.469e+02 2.826e+02, threshold=2.697e+02, percent-clipped=1.0 2024-09-23 22:12:13,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=356925.3333333333, ans=0.2 2024-09-23 22:12:33,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2024-09-23 22:12:52,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357065.3333333333, ans=0.1 2024-09-23 22:12:59,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=357065.3333333333, ans=0.125 2024-09-23 22:12:59,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=357065.3333333333, ans=0.0 2024-09-23 22:13:00,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=357065.3333333333, ans=0.1 2024-09-23 22:13:10,055 INFO [train.py:1198] (0/4) Epoch 20, batch 2500, loss[loss=0.2231, ctc_loss=0.1485, cr_loss=0.3728, over 17184.00 frames. ], tot_loss[loss=0.2162, ctc_loss=0.144, cr_loss=0.3609, over 3349634.99 frames. ], batch size: 55, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:13:12,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=357112.0, ans=0.125 2024-09-23 22:13:29,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=357158.6666666667, ans=0.125 2024-09-23 22:13:29,943 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2024-09-23 22:13:39,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2024-09-23 22:13:55,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=357205.3333333333, ans=0.125 2024-09-23 22:14:04,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.76 vs. 
limit=10.0 2024-09-23 22:14:34,665 INFO [train.py:1198] (0/4) Epoch 20, batch 2550, loss[loss=0.2287, ctc_loss=0.1527, cr_loss=0.3796, over 17306.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1439, cr_loss=0.3603, over 3352346.67 frames. ], batch size: 51, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:14:44,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=357345.3333333333, ans=0.025 2024-09-23 22:15:00,262 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.269e+02 1.372e+02 1.542e+02 2.100e+02, threshold=2.744e+02, percent-clipped=0.0 2024-09-23 22:15:26,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=357485.3333333333, ans=0.2 2024-09-23 22:15:54,811 INFO [train.py:1198] (0/4) Epoch 20, batch 2600, loss[loss=0.2111, ctc_loss=0.1413, cr_loss=0.3491, over 17228.00 frames. ], tot_loss[loss=0.2156, ctc_loss=0.1437, cr_loss=0.3596, over 3360615.56 frames. ], batch size: 50, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:16:07,273 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0 2024-09-23 22:16:21,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=357625.3333333333, ans=0.125 2024-09-23 22:16:30,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=357672.0, ans=0.09899494936611666 2024-09-23 22:16:57,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=357765.3333333333, ans=0.05 2024-09-23 22:17:17,932 INFO [train.py:1198] (0/4) Epoch 20, batch 2650, loss[loss=0.2335, ctc_loss=0.1563, cr_loss=0.3857, over 17289.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1427, cr_loss=0.358, over 3355752.62 frames. ], batch size: 54, lr: 5.98e-03, grad_scale: 32.0 2024-09-23 22:17:32,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=357858.6666666667, ans=0.125 2024-09-23 22:17:32,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=357858.6666666667, ans=0.025 2024-09-23 22:17:34,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=357858.6666666667, ans=0.125 2024-09-23 22:17:35,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=357858.6666666667, ans=10.0 2024-09-23 22:17:43,557 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.263e+02 1.370e+02 1.503e+02 2.130e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-23 22:18:02,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=357905.3333333333, ans=0.0 2024-09-23 22:18:18,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=357952.0, ans=0.0 2024-09-23 22:18:43,119 INFO [train.py:1198] (0/4) Epoch 20, batch 2700, loss[loss=0.2031, ctc_loss=0.1332, cr_loss=0.3492, over 17092.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1429, cr_loss=0.3585, over 3357626.07 frames. 
], batch size: 43, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:18:51,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=358045.3333333333, ans=0.125 2024-09-23 22:18:51,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=358045.3333333333, ans=0.125 2024-09-23 22:19:15,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=358092.0, ans=0.125 2024-09-23 22:19:16,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=358138.6666666667, ans=0.125 2024-09-23 22:19:38,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=358185.3333333333, ans=0.125 2024-09-23 22:19:40,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=358185.3333333333, ans=0.2 2024-09-23 22:19:46,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358185.3333333333, ans=0.125 2024-09-23 22:19:48,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=358232.0, ans=0.2 2024-09-23 22:20:05,466 INFO [train.py:1198] (0/4) Epoch 20, batch 2750, loss[loss=0.2585, ctc_loss=0.1838, cr_loss=0.3737, over 12311.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1431, cr_loss=0.3583, over 3343410.52 frames. ], batch size: 123, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:20:31,065 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.266e+02 1.360e+02 1.482e+02 2.958e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-23 22:20:55,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358418.6666666667, ans=0.1 2024-09-23 22:21:25,606 INFO [train.py:1198] (0/4) Epoch 20, batch 2800, loss[loss=0.2126, ctc_loss=0.1395, cr_loss=0.3655, over 17158.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1428, cr_loss=0.3576, over 3341699.29 frames. ], batch size: 45, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:21:32,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358512.0, ans=0.1 2024-09-23 22:21:40,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=358558.6666666667, ans=0.0 2024-09-23 22:21:46,648 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. 
limit=15.0 2024-09-23 22:22:09,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=358605.3333333333, ans=0.035 2024-09-23 22:22:12,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=358605.3333333333, ans=0.125 2024-09-23 22:22:23,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=358652.0, ans=0.2 2024-09-23 22:22:47,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=358698.6666666667, ans=0.125 2024-09-23 22:22:50,493 INFO [train.py:1198] (0/4) Epoch 20, batch 2850, loss[loss=0.2362, ctc_loss=0.1686, cr_loss=0.3379, over 11968.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1429, cr_loss=0.3579, over 3348243.49 frames. ], batch size: 123, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:22:57,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358745.3333333333, ans=0.125 2024-09-23 22:23:15,984 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.246e+02 1.349e+02 1.409e+02 2.188e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-23 22:23:28,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=358838.6666666667, ans=0.125 2024-09-23 22:23:34,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-09-23 22:23:59,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=358932.0, ans=0.0 2024-09-23 22:24:04,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=358932.0, ans=0.125 2024-09-23 22:24:15,306 INFO [train.py:1198] (0/4) Epoch 20, batch 2900, loss[loss=0.208, ctc_loss=0.1385, cr_loss=0.3477, over 17070.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.143, cr_loss=0.3581, over 3349989.34 frames. ], batch size: 46, lr: 5.97e-03, grad_scale: 32.0 2024-09-23 22:24:15,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=358978.6666666667, ans=0.0 2024-09-23 22:24:49,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=359072.0, ans=0.0 2024-09-23 22:25:05,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359118.6666666667, ans=0.1 2024-09-23 22:25:08,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=359118.6666666667, ans=0.0 2024-09-23 22:25:13,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359118.6666666667, ans=0.1 2024-09-23 22:25:16,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=359118.6666666667, ans=0.125 2024-09-23 22:25:35,526 INFO [train.py:1198] (0/4) Epoch 20, batch 2950, loss[loss=0.1895, ctc_loss=0.1228, cr_loss=0.3334, over 16977.00 frames. 
], tot_loss[loss=0.2154, ctc_loss=0.1435, cr_loss=0.3593, over 3339714.28 frames. ], batch size: 42, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:25:55,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2024-09-23 22:26:00,732 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.258e+02 1.349e+02 1.512e+02 2.191e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-23 22:26:20,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.65 vs. limit=22.5 2024-09-23 22:26:42,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359398.6666666667, ans=0.1 2024-09-23 22:26:54,937 INFO [train.py:1198] (0/4) Epoch 20, batch 3000, loss[loss=0.2819, ctc_loss=0.1899, cr_loss=0.46, over 16483.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1441, cr_loss=0.3603, over 3344357.94 frames. ], batch size: 66, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:26:54,937 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 22:27:10,609 INFO [train.py:1230] (0/4) Epoch 20, validation: loss=0.03912, ctc_loss=0.03912, cr_loss=8.309e-15, over 944034.00 frames. 2024-09-23 22:27:10,610 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 22:27:26,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=359492.0, ans=0.1 2024-09-23 22:27:56,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=359585.3333333333, ans=0.125 2024-09-23 22:28:31,109 INFO [train.py:1198] (0/4) Epoch 20, batch 3050, loss[loss=0.2329, ctc_loss=0.1527, cr_loss=0.4009, over 17012.00 frames. ], tot_loss[loss=0.2153, ctc_loss=0.1434, cr_loss=0.3596, over 3350472.55 frames. ], batch size: 53, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:28:39,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2024-09-23 22:28:42,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=359678.6666666667, ans=0.1 2024-09-23 22:28:44,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359678.6666666667, ans=0.1 2024-09-23 22:28:56,199 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.297e+02 1.389e+02 1.525e+02 3.485e+02, threshold=2.778e+02, percent-clipped=1.0 2024-09-23 22:28:59,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=359725.3333333333, ans=0.0 2024-09-23 22:29:05,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359772.0, ans=0.1 2024-09-23 22:29:13,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=359772.0, ans=0.125 2024-09-23 22:29:40,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.58 vs. 
limit=12.0 2024-09-23 22:29:49,413 INFO [train.py:1198] (0/4) Epoch 20, batch 3100, loss[loss=0.2142, ctc_loss=0.1396, cr_loss=0.3729, over 17226.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1447, cr_loss=0.3619, over 3345522.28 frames. ], batch size: 47, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:29:50,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=22.5 2024-09-23 22:30:28,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-09-23 22:30:53,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=360098.6666666667, ans=0.2 2024-09-23 22:31:12,470 INFO [train.py:1198] (0/4) Epoch 20, batch 3150, loss[loss=0.2036, ctc_loss=0.1352, cr_loss=0.3419, over 17137.00 frames. ], tot_loss[loss=0.2171, ctc_loss=0.1448, cr_loss=0.3616, over 3341660.85 frames. ], batch size: 48, lr: 5.96e-03, grad_scale: 32.0 2024-09-23 22:31:27,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2024-09-23 22:31:34,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=360192.0, ans=0.0 2024-09-23 22:31:34,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=360192.0, ans=0.125 2024-09-23 22:31:36,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2024-09-23 22:31:37,613 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.291e+02 1.400e+02 1.624e+02 2.307e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-23 22:31:39,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2024-09-23 22:31:55,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=360238.6666666667, ans=0.125 2024-09-23 22:32:03,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360285.3333333333, ans=0.1 2024-09-23 22:32:03,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=360285.3333333333, ans=0.125 2024-09-23 22:32:16,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0 2024-09-23 22:32:24,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-23 22:32:27,183 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-09-23 22:32:31,336 INFO [train.py:1198] (0/4) Epoch 20, batch 3200, loss[loss=0.1796, ctc_loss=0.1165, cr_loss=0.3153, over 17059.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.1441, cr_loss=0.3603, over 3350717.33 frames. 
], batch size: 39, lr: 5.95e-03, grad_scale: 32.0 2024-09-23 22:32:55,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=360425.3333333333, ans=0.1 2024-09-23 22:33:09,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=360472.0, ans=0.125 2024-09-23 22:33:45,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=12.0 2024-09-23 22:33:49,551 INFO [train.py:1198] (0/4) Epoch 20, batch 3250, loss[loss=0.2266, ctc_loss=0.1489, cr_loss=0.3888, over 16574.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.143, cr_loss=0.359, over 3360100.15 frames. ], batch size: 66, lr: 5.95e-03, grad_scale: 32.0 2024-09-23 22:34:01,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2024-09-23 22:34:16,116 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.218e+02 1.307e+02 1.462e+02 2.065e+02, threshold=2.615e+02, percent-clipped=0.0 2024-09-23 22:34:21,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=360705.3333333333, ans=0.125 2024-09-23 22:34:41,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=360752.0, ans=0.125 2024-09-23 22:35:07,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2024-09-23 22:35:08,071 INFO [train.py:1198] (0/4) Epoch 20, batch 3300, loss[loss=0.1916, ctc_loss=0.1276, cr_loss=0.3201, over 17068.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1421, cr_loss=0.3571, over 3371543.83 frames. ], batch size: 43, lr: 5.95e-03, grad_scale: 32.0 2024-09-23 22:36:26,307 INFO [train.py:1198] (0/4) Epoch 20, batch 3350, loss[loss=0.2163, ctc_loss=0.143, cr_loss=0.3666, over 17077.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1414, cr_loss=0.3561, over 3380825.62 frames. ], batch size: 43, lr: 5.95e-03, grad_scale: 16.0 2024-09-23 22:36:35,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=12.0 2024-09-23 22:36:56,430 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.040e+02 1.260e+02 1.354e+02 1.461e+02 2.333e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-23 22:37:23,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=361218.6666666667, ans=0.0 2024-09-23 22:37:46,672 INFO [train.py:1198] (0/4) Epoch 20, batch 3400, loss[loss=0.2091, ctc_loss=0.1409, cr_loss=0.3413, over 16947.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1415, cr_loss=0.3568, over 3380980.97 frames. 
], batch size: 58, lr: 5.95e-03, grad_scale: 16.0 2024-09-23 22:37:51,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=361312.0, ans=0.125 2024-09-23 22:38:02,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=361358.6666666667, ans=0.0 2024-09-23 22:38:22,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=361405.3333333333, ans=0.0 2024-09-23 22:39:01,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361498.6666666667, ans=0.1 2024-09-23 22:39:06,006 INFO [train.py:1198] (0/4) Epoch 20, batch 3450, loss[loss=0.2533, ctc_loss=0.1685, cr_loss=0.424, over 16976.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1417, cr_loss=0.3567, over 3377703.86 frames. ], batch size: 53, lr: 5.95e-03, grad_scale: 16.0 2024-09-23 22:39:20,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=361592.0, ans=0.125 2024-09-23 22:39:34,087 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.266e+02 1.362e+02 1.520e+02 1.983e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-23 22:39:54,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=361685.3333333333, ans=0.2 2024-09-23 22:40:09,076 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:40:24,483 INFO [train.py:1198] (0/4) Epoch 20, batch 3500, loss[loss=0.2176, ctc_loss=0.1476, cr_loss=0.3497, over 17234.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1428, cr_loss=0.3582, over 3368927.86 frames. ], batch size: 50, lr: 5.94e-03, grad_scale: 16.0 2024-09-23 22:40:26,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=361778.6666666667, ans=0.0 2024-09-23 22:41:12,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.93 vs. limit=10.0 2024-09-23 22:41:39,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=361965.3333333333, ans=0.0 2024-09-23 22:41:47,246 INFO [train.py:1198] (0/4) Epoch 20, batch 3550, loss[loss=0.2088, ctc_loss=0.1371, cr_loss=0.3585, over 17057.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1424, cr_loss=0.3576, over 3371444.20 frames. ], batch size: 46, lr: 5.94e-03, grad_scale: 16.0 2024-09-23 22:41:53,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=362012.0, ans=0.0 2024-09-23 22:41:53,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=362012.0, ans=0.125 2024-09-23 22:42:00,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=362012.0, ans=0.125 2024-09-23 22:42:06,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.66 vs. 
limit=15.0 2024-09-23 22:42:15,393 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.278e+02 1.363e+02 1.489e+02 4.233e+02, threshold=2.726e+02, percent-clipped=1.0 2024-09-23 22:42:23,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362105.3333333333, ans=0.1 2024-09-23 22:42:28,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=362105.3333333333, ans=0.1 2024-09-23 22:43:01,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=362198.6666666667, ans=0.2 2024-09-23 22:43:05,645 INFO [train.py:1198] (0/4) Epoch 20, batch 3600, loss[loss=0.2037, ctc_loss=0.1368, cr_loss=0.3344, over 17075.00 frames. ], tot_loss[loss=0.2138, ctc_loss=0.1423, cr_loss=0.3573, over 3381153.98 frames. ], batch size: 39, lr: 5.94e-03, grad_scale: 32.0 2024-09-23 22:43:09,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362245.3333333333, ans=0.125 2024-09-23 22:43:21,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=362292.0, ans=0.09899494936611666 2024-09-23 22:43:32,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=362292.0, ans=0.125 2024-09-23 22:43:50,148 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=22.5 2024-09-23 22:44:09,245 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.71 vs. limit=5.0 2024-09-23 22:44:19,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=362432.0, ans=0.0 2024-09-23 22:44:20,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=362432.0, ans=0.125 2024-09-23 22:44:21,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=12.0 2024-09-23 22:44:23,799 INFO [train.py:1198] (0/4) Epoch 20, batch 3650, loss[loss=0.2362, ctc_loss=0.1587, cr_loss=0.3872, over 16548.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1423, cr_loss=0.3578, over 3385790.44 frames. ], batch size: 66, lr: 5.94e-03, grad_scale: 32.0 2024-09-23 22:44:30,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=362478.6666666667, ans=0.0 2024-09-23 22:44:51,968 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.248e+02 1.353e+02 1.463e+02 2.228e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-23 22:44:52,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=362525.3333333333, ans=0.0 2024-09-23 22:45:06,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=362572.0, ans=0.0 2024-09-23 22:45:42,631 INFO [train.py:1198] (0/4) Epoch 20, batch 3700, loss[loss=0.2062, ctc_loss=0.1357, cr_loss=0.3527, over 17281.00 frames. ], tot_loss[loss=0.214, ctc_loss=0.1425, cr_loss=0.3574, over 3383024.35 frames. 
], batch size: 44, lr: 5.94e-03, grad_scale: 32.0 2024-09-23 22:45:50,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=362712.0, ans=0.125 2024-09-23 22:46:13,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-09-23 22:46:26,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=362805.3333333333, ans=0.2 2024-09-23 22:46:38,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=362852.0, ans=0.2 2024-09-23 22:47:00,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=362945.3333333333, ans=0.0 2024-09-23 22:47:01,722 INFO [train.py:1198] (0/4) Epoch 20, batch 3750, loss[loss=0.1987, ctc_loss=0.1278, cr_loss=0.3548, over 17012.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1439, cr_loss=0.3596, over 3364141.41 frames. ], batch size: 51, lr: 5.93e-03, grad_scale: 32.0 2024-09-23 22:47:11,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=362945.3333333333, ans=0.025 2024-09-23 22:47:24,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362992.0, ans=0.125 2024-09-23 22:47:25,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=362992.0, ans=0.125 2024-09-23 22:47:27,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=362992.0, ans=0.0 2024-09-23 22:47:30,494 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.286e+02 1.371e+02 1.493e+02 3.473e+02, threshold=2.742e+02, percent-clipped=1.0 2024-09-23 22:48:08,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=363132.0, ans=0.125 2024-09-23 22:48:22,098 INFO [train.py:1198] (0/4) Epoch 20, batch 3800, loss[loss=0.2187, ctc_loss=0.1442, cr_loss=0.3726, over 17295.00 frames. ], tot_loss[loss=0.2159, ctc_loss=0.1442, cr_loss=0.3588, over 3329102.67 frames. ], batch size: 51, lr: 5.93e-03, grad_scale: 32.0 2024-09-23 22:49:07,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.23 vs. limit=15.0 2024-09-23 22:49:14,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363318.6666666667, ans=0.1 2024-09-23 22:49:14,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=363318.6666666667, ans=0.0 2024-09-23 22:49:37,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363365.3333333333, ans=0.1 2024-09-23 22:49:40,057 INFO [train.py:1198] (0/4) Epoch 20, batch 3850, loss[loss=0.2617, ctc_loss=0.186, cr_loss=0.3785, over 11863.00 frames. ], tot_loss[loss=0.2174, ctc_loss=0.1455, cr_loss=0.3593, over 3264108.73 frames. 
], batch size: 123, lr: 5.93e-03, grad_scale: 32.0 2024-09-23 22:49:40,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=363412.0, ans=0.2 2024-09-23 22:49:51,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=363412.0, ans=0.125 2024-09-23 22:50:05,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=363458.6666666667, ans=0.025 2024-09-23 22:50:08,337 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.324e+02 1.457e+02 1.626e+02 2.908e+02, threshold=2.914e+02, percent-clipped=1.0 2024-09-23 22:50:10,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=363505.3333333333, ans=0.5 2024-09-23 22:50:20,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363505.3333333333, ans=0.1 2024-09-23 22:50:40,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=363598.6666666667, ans=0.0 2024-09-23 22:50:50,822 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-20.pt 2024-09-23 22:51:41,132 INFO [train.py:1198] (0/4) Epoch 21, batch 0, loss[loss=0.177, ctc_loss=0.1137, cr_loss=0.3165, over 17012.00 frames. ], tot_loss[loss=0.177, ctc_loss=0.1137, cr_loss=0.3165, over 17012.00 frames. ], batch size: 39, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:51:41,133 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-23 22:51:57,021 INFO [train.py:1230] (0/4) Epoch 21, validation: loss=0.03907, ctc_loss=0.03907, cr_loss=7.91e-15, over 944034.00 frames. 2024-09-23 22:51:57,021 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-23 22:52:26,679 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:52:41,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363720.0, ans=0.0 2024-09-23 22:52:47,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=363766.6666666667, ans=0.0 2024-09-23 22:53:00,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=363766.6666666667, ans=0.125 2024-09-23 22:53:00,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=363766.6666666667, ans=0.125 2024-09-23 22:53:18,866 INFO [train.py:1198] (0/4) Epoch 21, batch 50, loss[loss=0.2451, ctc_loss=0.1639, cr_loss=0.406, over 16013.00 frames. ], tot_loss[loss=0.2177, ctc_loss=0.1449, cr_loss=0.3638, over 759383.97 frames. 
], batch size: 74, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:53:19,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=363860.0, ans=0.125 2024-09-23 22:53:39,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=363906.6666666667, ans=0.125 2024-09-23 22:53:56,544 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.319e+02 1.470e+02 1.661e+02 2.685e+02, threshold=2.941e+02, percent-clipped=0.0 2024-09-23 22:54:02,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=22.5 2024-09-23 22:54:37,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2024-09-23 22:54:41,603 INFO [train.py:1198] (0/4) Epoch 21, batch 100, loss[loss=0.187, ctc_loss=0.1219, cr_loss=0.325, over 17239.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1421, cr_loss=0.3592, over 1332035.45 frames. ], batch size: 44, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:54:48,424 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:55:00,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=364140.0, ans=0.125 2024-09-23 22:55:03,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=364140.0, ans=0.0 2024-09-23 22:55:21,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-09-23 22:55:23,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=364186.6666666667, ans=0.0 2024-09-23 22:55:54,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=364280.0, ans=0.0 2024-09-23 22:56:03,805 INFO [train.py:1198] (0/4) Epoch 21, batch 150, loss[loss=0.2446, ctc_loss=0.1631, cr_loss=0.4079, over 17046.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1418, cr_loss=0.3586, over 1784583.88 frames. ], batch size: 52, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:56:04,616 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.23 vs. limit=10.0 2024-09-23 22:56:19,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=364373.3333333333, ans=0.125 2024-09-23 22:56:34,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=364420.0, ans=0.0 2024-09-23 22:56:38,963 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.242e+02 1.342e+02 1.442e+02 2.042e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-23 22:56:45,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=364420.0, ans=0.07 2024-09-23 22:57:29,599 INFO [train.py:1198] (0/4) Epoch 21, batch 200, loss[loss=0.2233, ctc_loss=0.1486, cr_loss=0.3734, over 17364.00 frames. 
], tot_loss[loss=0.2144, ctc_loss=0.1425, cr_loss=0.3599, over 2130364.65 frames. ], batch size: 48, lr: 5.78e-03, grad_scale: 32.0 2024-09-23 22:57:43,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=364560.0, ans=0.0 2024-09-23 22:57:54,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=364606.6666666667, ans=0.125 2024-09-23 22:58:03,821 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 22:58:22,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364700.0, ans=0.1 2024-09-23 22:58:38,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=364746.6666666667, ans=0.0 2024-09-23 22:58:52,294 INFO [train.py:1198] (0/4) Epoch 21, batch 250, loss[loss=0.2062, ctc_loss=0.1365, cr_loss=0.3481, over 17219.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1425, cr_loss=0.3605, over 2399802.92 frames. ], batch size: 55, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 22:59:27,571 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.024e+02 1.297e+02 1.364e+02 1.577e+02 2.161e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-23 23:00:01,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=364980.0, ans=0.2 2024-09-23 23:00:12,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=364980.0, ans=0.0 2024-09-23 23:00:15,864 INFO [train.py:1198] (0/4) Epoch 21, batch 300, loss[loss=0.194, ctc_loss=0.1287, cr_loss=0.3265, over 16955.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1424, cr_loss=0.359, over 2617331.91 frames. ], batch size: 42, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 23:00:22,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2024-09-23 23:00:26,350 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0 2024-09-23 23:00:51,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2024-09-23 23:00:57,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=365120.0, ans=0.09899494936611666 2024-09-23 23:00:59,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=365120.0, ans=0.0 2024-09-23 23:01:00,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=365120.0, ans=0.0 2024-09-23 23:01:01,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. 
limit=15.0 2024-09-23 23:01:07,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=365166.6666666667, ans=0.125 2024-09-23 23:01:18,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=365213.3333333333, ans=0.0 2024-09-23 23:01:21,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=365213.3333333333, ans=0.05 2024-09-23 23:01:29,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=365213.3333333333, ans=0.2 2024-09-23 23:01:34,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5 2024-09-23 23:01:35,407 INFO [train.py:1198] (0/4) Epoch 21, batch 350, loss[loss=0.2257, ctc_loss=0.1468, cr_loss=0.3945, over 17002.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.142, cr_loss=0.3587, over 2787745.50 frames. ], batch size: 53, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 23:02:12,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365353.3333333333, ans=0.1 2024-09-23 23:02:16,762 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.263e+02 1.348e+02 1.461e+02 2.184e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-23 23:02:24,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=365353.3333333333, ans=0.125 2024-09-23 23:02:53,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=365446.6666666667, ans=0.125 2024-09-23 23:03:01,498 INFO [train.py:1198] (0/4) Epoch 21, batch 400, loss[loss=0.222, ctc_loss=0.1501, cr_loss=0.3593, over 17027.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1428, cr_loss=0.3597, over 2912943.20 frames. ], batch size: 51, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 23:03:25,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=365540.0, ans=0.125 2024-09-23 23:03:31,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=365540.0, ans=0.0 2024-09-23 23:04:23,585 INFO [train.py:1198] (0/4) Epoch 21, batch 450, loss[loss=0.2103, ctc_loss=0.141, cr_loss=0.3463, over 17222.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.1419, cr_loss=0.3581, over 3019475.52 frames. ], batch size: 55, lr: 5.77e-03, grad_scale: 32.0 2024-09-23 23:04:24,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. 
limit=6.0 2024-09-23 23:04:54,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=365820.0, ans=0.0 2024-09-23 23:04:59,071 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.052e+02 1.246e+02 1.320e+02 1.437e+02 2.140e+02, threshold=2.640e+02, percent-clipped=0.0 2024-09-23 23:05:01,059 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:05:11,611 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:05:24,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=365866.6666666667, ans=0.125 2024-09-23 23:05:46,208 INFO [train.py:1198] (0/4) Epoch 21, batch 500, loss[loss=0.2492, ctc_loss=0.1712, cr_loss=0.3901, over 16025.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1426, cr_loss=0.3588, over 3088073.06 frames. ], batch size: 74, lr: 5.76e-03, grad_scale: 32.0 2024-09-23 23:05:51,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=365960.0, ans=0.0 2024-09-23 23:06:12,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2024-09-23 23:06:17,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-09-23 23:07:11,260 INFO [train.py:1198] (0/4) Epoch 21, batch 550, loss[loss=0.236, ctc_loss=0.1563, cr_loss=0.3987, over 17038.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1424, cr_loss=0.3585, over 3145818.14 frames. ], batch size: 52, lr: 5.76e-03, grad_scale: 32.0 2024-09-23 23:07:13,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=366193.3333333333, ans=0.1 2024-09-23 23:07:32,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=366240.0, ans=0.0 2024-09-23 23:07:37,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=366240.0, ans=0.125 2024-09-23 23:07:46,483 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.227e+02 1.339e+02 1.480e+02 1.923e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-23 23:08:12,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=366333.3333333333, ans=0.125 2024-09-23 23:08:21,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=366380.0, ans=0.125 2024-09-23 23:08:22,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366380.0, ans=0.1 2024-09-23 23:08:27,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=366380.0, ans=0.2 2024-09-23 23:08:33,668 INFO [train.py:1198] (0/4) Epoch 21, batch 600, loss[loss=0.2256, ctc_loss=0.1529, cr_loss=0.3638, over 17174.00 frames. ], tot_loss[loss=0.2145, ctc_loss=0.1427, cr_loss=0.3589, over 3188748.67 frames. 
], batch size: 55, lr: 5.76e-03, grad_scale: 32.0 2024-09-23 23:08:43,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=366426.6666666667, ans=0.0 2024-09-23 23:09:31,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=366566.6666666667, ans=0.125 2024-09-23 23:09:39,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=366613.3333333333, ans=0.125 2024-09-23 23:09:42,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=366613.3333333333, ans=0.125 2024-09-23 23:09:53,536 INFO [train.py:1198] (0/4) Epoch 21, batch 650, loss[loss=0.1889, ctc_loss=0.1231, cr_loss=0.3292, over 17254.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1428, cr_loss=0.359, over 3233133.89 frames. ], batch size: 44, lr: 5.76e-03, grad_scale: 16.0 2024-09-23 23:10:12,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=366706.6666666667, ans=0.125 2024-09-23 23:10:32,740 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.251e+02 1.353e+02 1.422e+02 2.083e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-23 23:10:51,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=366800.0, ans=0.0 2024-09-23 23:11:15,973 INFO [train.py:1198] (0/4) Epoch 21, batch 700, loss[loss=0.2462, ctc_loss=0.1664, cr_loss=0.399, over 16999.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1425, cr_loss=0.3586, over 3256193.94 frames. ], batch size: 53, lr: 5.76e-03, grad_scale: 16.0 2024-09-23 23:11:29,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=366893.3333333333, ans=0.125 2024-09-23 23:11:40,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=366940.0, ans=0.125 2024-09-23 23:11:54,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=366986.6666666667, ans=0.125 2024-09-23 23:11:59,268 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:12:05,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=366986.6666666667, ans=0.125 2024-09-23 23:12:11,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=367033.3333333333, ans=0.95 2024-09-23 23:12:24,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2024-09-23 23:12:33,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=367080.0, ans=0.0 2024-09-23 23:12:38,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=367080.0, ans=0.125 2024-09-23 23:12:41,460 INFO [train.py:1198] (0/4) Epoch 21, batch 750, loss[loss=0.2332, ctc_loss=0.1568, cr_loss=0.382, over 16163.00 frames. 
], tot_loss[loss=0.2145, ctc_loss=0.1427, cr_loss=0.3589, over 3275789.84 frames. ], batch size: 74, lr: 5.76e-03, grad_scale: 16.0 2024-09-23 23:12:41,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=367126.6666666667, ans=0.0 2024-09-23 23:12:45,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5 2024-09-23 23:13:08,879 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2024-09-23 23:13:11,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=367173.3333333333, ans=0.0 2024-09-23 23:13:21,100 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.308e+02 1.415e+02 1.587e+02 2.063e+02, threshold=2.831e+02, percent-clipped=0.0 2024-09-23 23:13:29,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=367220.0, ans=0.025 2024-09-23 23:13:48,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=367313.3333333333, ans=0.5 2024-09-23 23:14:02,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=367313.3333333333, ans=0.025 2024-09-23 23:14:04,987 INFO [train.py:1198] (0/4) Epoch 21, batch 800, loss[loss=0.192, ctc_loss=0.1254, cr_loss=0.3329, over 17105.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1435, cr_loss=0.3602, over 3293810.59 frames. ], batch size: 49, lr: 5.75e-03, grad_scale: 32.0 2024-09-23 23:14:22,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=367406.6666666667, ans=0.0 2024-09-23 23:14:23,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367406.6666666667, ans=0.1 2024-09-23 23:14:38,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=367453.3333333333, ans=0.07 2024-09-23 23:14:38,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=367453.3333333333, ans=0.0 2024-09-23 23:14:46,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=367453.3333333333, ans=0.125 2024-09-23 23:14:50,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2024-09-23 23:14:57,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=367500.0, ans=0.125 2024-09-23 23:15:27,328 INFO [train.py:1198] (0/4) Epoch 21, batch 850, loss[loss=0.2585, ctc_loss=0.1753, cr_loss=0.4159, over 15026.00 frames. ], tot_loss[loss=0.2167, ctc_loss=0.1445, cr_loss=0.3607, over 3289448.77 frames. 
], batch size: 89, lr: 5.75e-03, grad_scale: 32.0 2024-09-23 23:15:30,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=367593.3333333333, ans=0.0 2024-09-23 23:15:40,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=367593.3333333333, ans=0.2 2024-09-23 23:16:05,383 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.235e+02 1.359e+02 1.551e+02 2.196e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-23 23:16:09,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.65 vs. limit=10.0 2024-09-23 23:16:53,154 INFO [train.py:1198] (0/4) Epoch 21, batch 900, loss[loss=0.207, ctc_loss=0.138, cr_loss=0.3451, over 16997.00 frames. ], tot_loss[loss=0.2169, ctc_loss=0.1446, cr_loss=0.3615, over 3300693.21 frames. ], batch size: 53, lr: 5.75e-03, grad_scale: 16.0 2024-09-23 23:17:09,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=367873.3333333333, ans=0.2 2024-09-23 23:17:33,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=367920.0, ans=0.0 2024-09-23 23:18:15,417 INFO [train.py:1198] (0/4) Epoch 21, batch 950, loss[loss=0.2459, ctc_loss=0.1688, cr_loss=0.3856, over 16731.00 frames. ], tot_loss[loss=0.2161, ctc_loss=0.144, cr_loss=0.3604, over 3302504.33 frames. ], batch size: 61, lr: 5.75e-03, grad_scale: 16.0 2024-09-23 23:18:25,641 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-09-23 23:18:41,059 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:18:42,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=368106.6666666667, ans=0.125 2024-09-23 23:18:48,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=368153.3333333333, ans=0.015 2024-09-23 23:18:50,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=368153.3333333333, ans=0.125 2024-09-23 23:18:53,167 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.253e+02 1.356e+02 1.481e+02 2.989e+02, threshold=2.713e+02, percent-clipped=1.0 2024-09-23 23:19:06,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=368200.0, ans=0.125 2024-09-23 23:19:08,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.60 vs. limit=10.0 2024-09-23 23:19:19,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=368246.6666666667, ans=0.125 2024-09-23 23:19:34,879 INFO [train.py:1198] (0/4) Epoch 21, batch 1000, loss[loss=0.2223, ctc_loss=0.1482, cr_loss=0.3703, over 17029.00 frames. ], tot_loss[loss=0.2157, ctc_loss=0.1436, cr_loss=0.3605, over 3317728.88 frames. 
], batch size: 44, lr: 5.75e-03, grad_scale: 16.0 2024-09-23 23:19:57,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=368340.0, ans=0.025 2024-09-23 23:20:02,091 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:20:13,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=368386.6666666667, ans=0.125 2024-09-23 23:20:14,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=368386.6666666667, ans=0.125 2024-09-23 23:20:16,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=368386.6666666667, ans=0.05 2024-09-23 23:20:18,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=368386.6666666667, ans=0.125 2024-09-23 23:20:21,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=368386.6666666667, ans=0.125 2024-09-23 23:20:58,209 INFO [train.py:1198] (0/4) Epoch 21, batch 1050, loss[loss=0.2633, ctc_loss=0.1892, cr_loss=0.3704, over 11309.00 frames. ], tot_loss[loss=0.216, ctc_loss=0.1439, cr_loss=0.3604, over 3312014.92 frames. ], batch size: 123, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:21:13,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368573.3333333333, ans=0.1 2024-09-23 23:21:33,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368620.0, ans=0.1 2024-09-23 23:21:39,785 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.259e+02 1.337e+02 1.472e+02 2.048e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-23 23:21:43,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=368620.0, ans=0.5 2024-09-23 23:22:11,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=368713.3333333333, ans=0.1 2024-09-23 23:22:14,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=368713.3333333333, ans=0.125 2024-09-23 23:22:21,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.35 vs. limit=6.0 2024-09-23 23:22:24,118 INFO [train.py:1198] (0/4) Epoch 21, batch 1100, loss[loss=0.2531, ctc_loss=0.1736, cr_loss=0.3973, over 15063.00 frames. ], tot_loss[loss=0.2158, ctc_loss=0.1437, cr_loss=0.3607, over 3325037.09 frames. ], batch size: 89, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:22:24,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2024-09-23 23:22:25,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.87 vs. 
limit=15.0 2024-09-23 23:22:26,369 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2024-09-23 23:23:46,069 INFO [train.py:1198] (0/4) Epoch 21, batch 1150, loss[loss=0.2067, ctc_loss=0.1356, cr_loss=0.3556, over 17086.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1419, cr_loss=0.3584, over 3341018.78 frames. ], batch size: 43, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:24:08,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=369040.0, ans=0.1 2024-09-23 23:24:24,271 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.242e+02 1.332e+02 1.452e+02 2.606e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-23 23:24:26,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=369086.6666666667, ans=0.0 2024-09-23 23:24:30,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=369086.6666666667, ans=0.0 2024-09-23 23:24:51,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=369180.0, ans=0.0 2024-09-23 23:25:08,570 INFO [train.py:1198] (0/4) Epoch 21, batch 1200, loss[loss=0.2506, ctc_loss=0.168, cr_loss=0.4126, over 17019.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1427, cr_loss=0.3597, over 3341342.44 frames. ], batch size: 56, lr: 5.74e-03, grad_scale: 32.0 2024-09-23 23:25:13,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=369226.6666666667, ans=0.125 2024-09-23 23:26:27,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=369413.3333333333, ans=0.0 2024-09-23 23:26:30,658 INFO [train.py:1198] (0/4) Epoch 21, batch 1250, loss[loss=0.1689, ctc_loss=0.1078, cr_loss=0.3056, over 17093.00 frames. ], tot_loss[loss=0.2151, ctc_loss=0.143, cr_loss=0.3603, over 3344271.82 frames. ], batch size: 43, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:26:46,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=369460.0, ans=0.0 2024-09-23 23:26:51,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=369506.6666666667, ans=0.125 2024-09-23 23:27:13,546 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.996e+01 1.239e+02 1.348e+02 1.443e+02 2.209e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-23 23:27:33,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=369600.0, ans=0.0 2024-09-23 23:27:36,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=369646.6666666667, ans=0.2 2024-09-23 23:27:56,569 INFO [train.py:1198] (0/4) Epoch 21, batch 1300, loss[loss=0.201, ctc_loss=0.1334, cr_loss=0.3378, over 17007.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1425, cr_loss=0.3595, over 3352844.73 frames. 
], batch size: 51, lr: 5.74e-03, grad_scale: 16.0 2024-09-23 23:28:11,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=369740.0, ans=0.125 2024-09-23 23:28:11,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=369740.0, ans=0.125 2024-09-23 23:28:18,396 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.49 vs. limit=15.0 2024-09-23 23:28:49,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=369833.3333333333, ans=0.0 2024-09-23 23:29:01,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=369880.0, ans=0.0 2024-09-23 23:29:02,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=369880.0, ans=0.125 2024-09-23 23:29:16,752 INFO [train.py:1198] (0/4) Epoch 21, batch 1350, loss[loss=0.185, ctc_loss=0.1271, cr_loss=0.2895, over 17093.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1427, cr_loss=0.3594, over 3357261.62 frames. ], batch size: 49, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:29:58,705 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.263e+02 1.337e+02 1.513e+02 2.330e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-23 23:30:38,707 INFO [train.py:1198] (0/4) Epoch 21, batch 1400, loss[loss=0.2544, ctc_loss=0.1712, cr_loss=0.4163, over 16586.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1427, cr_loss=0.3594, over 3359958.40 frames. ], batch size: 66, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:32:03,659 INFO [train.py:1198] (0/4) Epoch 21, batch 1450, loss[loss=0.1899, ctc_loss=0.1235, cr_loss=0.3323, over 16964.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.142, cr_loss=0.3584, over 3364838.58 frames. ], batch size: 42, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:32:43,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=370486.6666666667, ans=0.0 2024-09-23 23:32:46,581 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.255e+02 1.316e+02 1.424e+02 2.117e+02, threshold=2.631e+02, percent-clipped=0.0 2024-09-23 23:32:58,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=370533.3333333333, ans=0.0 2024-09-23 23:33:01,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=370533.3333333333, ans=0.0 2024-09-23 23:33:07,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=370533.3333333333, ans=0.025 2024-09-23 23:33:26,656 INFO [train.py:1198] (0/4) Epoch 21, batch 1500, loss[loss=0.2445, ctc_loss=0.1646, cr_loss=0.3994, over 16987.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.1428, cr_loss=0.3597, over 3356380.36 frames. 
], batch size: 53, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:33:28,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=370626.6666666667, ans=0.125 2024-09-23 23:33:52,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=370673.3333333333, ans=0.125 2024-09-23 23:33:55,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=22.5 2024-09-23 23:34:08,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=370720.0, ans=0.125 2024-09-23 23:34:13,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=370766.6666666667, ans=0.125 2024-09-23 23:34:13,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=370766.6666666667, ans=0.125 2024-09-23 23:34:15,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370766.6666666667, ans=0.1 2024-09-23 23:34:49,372 INFO [train.py:1198] (0/4) Epoch 21, batch 1550, loss[loss=0.2236, ctc_loss=0.1493, cr_loss=0.3717, over 16990.00 frames. ], tot_loss[loss=0.2144, ctc_loss=0.1425, cr_loss=0.3593, over 3361621.13 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 16.0 2024-09-23 23:34:55,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=370860.0, ans=0.0 2024-09-23 23:35:00,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=370860.0, ans=0.125 2024-09-23 23:35:29,478 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.240e+02 1.330e+02 1.451e+02 2.029e+02, threshold=2.659e+02, percent-clipped=0.0 2024-09-23 23:36:01,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=371046.6666666667, ans=0.0 2024-09-23 23:36:12,040 INFO [train.py:1198] (0/4) Epoch 21, batch 1600, loss[loss=0.2566, ctc_loss=0.1748, cr_loss=0.4088, over 16914.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1419, cr_loss=0.3587, over 3370994.51 frames. ], batch size: 58, lr: 5.73e-03, grad_scale: 32.0 2024-09-23 23:36:16,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=371093.3333333333, ans=0.125 2024-09-23 23:36:22,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-09-23 23:37:37,188 INFO [train.py:1198] (0/4) Epoch 21, batch 1650, loss[loss=0.2489, ctc_loss=0.1771, cr_loss=0.359, over 11593.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1414, cr_loss=0.3572, over 3363546.65 frames. ], batch size: 123, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:37:49,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.91 vs. 
limit=15.0 2024-09-23 23:38:03,082 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:38:16,965 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.286e+02 1.382e+02 1.548e+02 2.796e+02, threshold=2.764e+02, percent-clipped=1.0 2024-09-23 23:38:25,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0 2024-09-23 23:38:39,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=371513.3333333333, ans=0.05 2024-09-23 23:38:51,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.80 vs. limit=10.0 2024-09-23 23:38:52,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-09-23 23:38:56,662 INFO [train.py:1198] (0/4) Epoch 21, batch 1700, loss[loss=0.2325, ctc_loss=0.1535, cr_loss=0.3948, over 17079.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1414, cr_loss=0.3569, over 3357953.79 frames. ], batch size: 56, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:39:03,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=22.5 2024-09-23 23:39:16,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.07 vs. limit=12.0 2024-09-23 23:39:22,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=15.0 2024-09-23 23:39:42,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371653.3333333333, ans=0.1 2024-09-23 23:39:54,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=371700.0, ans=0.0 2024-09-23 23:39:55,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.20 vs. limit=15.0 2024-09-23 23:39:57,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2024-09-23 23:39:58,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=371700.0, ans=0.2 2024-09-23 23:39:59,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=371700.0, ans=0.125 2024-09-23 23:40:18,440 INFO [train.py:1198] (0/4) Epoch 21, batch 1750, loss[loss=0.215, ctc_loss=0.1427, cr_loss=0.3615, over 17232.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1424, cr_loss=0.3585, over 3351104.05 frames. 
], batch size: 50, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:40:41,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371840.0, ans=0.1 2024-09-23 23:40:44,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=371840.0, ans=0.0 2024-09-23 23:40:54,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2024-09-23 23:40:58,127 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.218e+02 1.302e+02 1.379e+02 1.745e+02, threshold=2.603e+02, percent-clipped=0.0 2024-09-23 23:41:12,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=371933.3333333333, ans=0.125 2024-09-23 23:41:23,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=371980.0, ans=0.125 2024-09-23 23:41:28,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=371980.0, ans=0.0 2024-09-23 23:41:40,628 INFO [train.py:1198] (0/4) Epoch 21, batch 1800, loss[loss=0.2359, ctc_loss=0.1556, cr_loss=0.4015, over 16119.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1421, cr_loss=0.3576, over 3352972.69 frames. ], batch size: 74, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:42:20,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372120.0, ans=0.1 2024-09-23 23:42:43,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=372166.6666666667, ans=0.125 2024-09-23 23:42:56,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372213.3333333333, ans=0.1 2024-09-23 23:43:05,768 INFO [train.py:1198] (0/4) Epoch 21, batch 1850, loss[loss=0.2398, ctc_loss=0.1625, cr_loss=0.3864, over 16749.00 frames. ], tot_loss[loss=0.2147, ctc_loss=0.1429, cr_loss=0.359, over 3357592.99 frames. ], batch size: 61, lr: 5.72e-03, grad_scale: 32.0 2024-09-23 23:43:07,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=372260.0, ans=0.125 2024-09-23 23:43:25,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=372306.6666666667, ans=0.125 2024-09-23 23:43:34,753 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-23 23:43:45,460 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.299e+02 1.413e+02 1.617e+02 3.494e+02, threshold=2.826e+02, percent-clipped=2.0 2024-09-23 23:43:58,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=372400.0, ans=0.2 2024-09-23 23:43:58,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=22.5 2024-09-23 23:44:28,064 INFO [train.py:1198] (0/4) Epoch 21, batch 1900, loss[loss=0.1941, ctc_loss=0.128, cr_loss=0.3305, over 17095.00 frames. 
], tot_loss[loss=0.2138, ctc_loss=0.1421, cr_loss=0.3581, over 3349868.80 frames. ], batch size: 49, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:44:28,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=372493.3333333333, ans=0.0 2024-09-23 23:44:30,662 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=22.5 2024-09-23 23:44:39,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-09-23 23:44:58,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2024-09-23 23:45:19,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=372633.3333333333, ans=0.125 2024-09-23 23:45:30,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=372680.0, ans=0.2 2024-09-23 23:45:34,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=372680.0, ans=0.2 2024-09-23 23:45:47,538 INFO [train.py:1198] (0/4) Epoch 21, batch 1950, loss[loss=0.2281, ctc_loss=0.1524, cr_loss=0.3787, over 17208.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.142, cr_loss=0.3578, over 3346432.49 frames. ], batch size: 47, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:45:59,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.60 vs. limit=22.5 2024-09-23 23:46:06,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=372773.3333333333, ans=0.0 2024-09-23 23:46:09,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=372773.3333333333, ans=15.0 2024-09-23 23:46:14,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=372773.3333333333, ans=0.125 2024-09-23 23:46:22,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=372820.0, ans=0.0 2024-09-23 23:46:29,779 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.272e+02 1.378e+02 1.476e+02 2.541e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-23 23:46:44,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-09-23 23:46:56,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=372913.3333333333, ans=0.2 2024-09-23 23:47:12,288 INFO [train.py:1198] (0/4) Epoch 21, batch 2000, loss[loss=0.1789, ctc_loss=0.1169, cr_loss=0.3102, over 16958.00 frames. ], tot_loss[loss=0.2127, ctc_loss=0.1414, cr_loss=0.3566, over 3356460.87 frames. 
], batch size: 42, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:47:47,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373053.3333333333, ans=0.1 2024-09-23 23:47:58,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=373053.3333333333, ans=0.125 2024-09-23 23:48:04,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=373100.0, ans=0.0 2024-09-23 23:48:14,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373100.0, ans=0.125 2024-09-23 23:48:23,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=373146.6666666667, ans=0.125 2024-09-23 23:48:23,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373146.6666666667, ans=0.125 2024-09-23 23:48:34,485 INFO [train.py:1198] (0/4) Epoch 21, batch 2050, loss[loss=0.1966, ctc_loss=0.1288, cr_loss=0.3392, over 16713.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1408, cr_loss=0.3563, over 3360731.02 frames. ], batch size: 37, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:48:57,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=373240.0, ans=0.0 2024-09-23 23:49:05,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=373286.6666666667, ans=0.0 2024-09-23 23:49:08,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=373286.6666666667, ans=0.125 2024-09-23 23:49:09,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=373286.6666666667, ans=0.1 2024-09-23 23:49:14,303 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.256e+02 1.387e+02 1.531e+02 2.398e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-23 23:49:20,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=373286.6666666667, ans=0.0 2024-09-23 23:49:23,700 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-80000.pt 2024-09-23 23:49:45,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=373380.0, ans=0.125 2024-09-23 23:49:59,551 INFO [train.py:1198] (0/4) Epoch 21, batch 2100, loss[loss=0.1923, ctc_loss=0.1223, cr_loss=0.3497, over 17053.00 frames. ], tot_loss[loss=0.2112, ctc_loss=0.1402, cr_loss=0.3551, over 3367153.45 frames. ], batch size: 39, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:50:33,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=373520.0, ans=0.1 2024-09-23 23:51:19,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.92 vs. 
limit=12.0 2024-09-23 23:51:21,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373660.0, ans=0.1 2024-09-23 23:51:22,281 INFO [train.py:1198] (0/4) Epoch 21, batch 2150, loss[loss=0.1888, ctc_loss=0.1238, cr_loss=0.325, over 17185.00 frames. ], tot_loss[loss=0.2117, ctc_loss=0.1405, cr_loss=0.3557, over 3360850.18 frames. ], batch size: 45, lr: 5.71e-03, grad_scale: 32.0 2024-09-23 23:51:42,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=373706.6666666667, ans=0.0 2024-09-23 23:51:57,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2024-09-23 23:52:03,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=373753.3333333333, ans=0.125 2024-09-23 23:52:04,668 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.257e+02 1.382e+02 1.555e+02 2.014e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-23 23:52:08,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-09-23 23:52:09,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=373753.3333333333, ans=0.0 2024-09-23 23:52:32,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=373846.6666666667, ans=0.0 2024-09-23 23:52:47,134 INFO [train.py:1198] (0/4) Epoch 21, batch 2200, loss[loss=0.2146, ctc_loss=0.145, cr_loss=0.3481, over 17316.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1409, cr_loss=0.356, over 3352955.63 frames. ], batch size: 51, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:52:58,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=373893.3333333333, ans=0.0 2024-09-23 23:53:09,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=373940.0, ans=0.2 2024-09-23 23:53:14,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=373940.0, ans=0.0 2024-09-23 23:53:21,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=373986.6666666667, ans=0.125 2024-09-23 23:53:35,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=374033.3333333333, ans=0.125 2024-09-23 23:53:40,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.64 vs. limit=10.0 2024-09-23 23:54:06,787 INFO [train.py:1198] (0/4) Epoch 21, batch 2250, loss[loss=0.2186, ctc_loss=0.1466, cr_loss=0.3599, over 17356.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1418, cr_loss=0.357, over 3348619.83 frames. 
], batch size: 48, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:54:07,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=374126.6666666667, ans=0.2 2024-09-23 23:54:10,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=374126.6666666667, ans=0.07 2024-09-23 23:54:13,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374126.6666666667, ans=0.125 2024-09-23 23:54:16,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=374126.6666666667, ans=0.125 2024-09-23 23:54:24,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=12.0 2024-09-23 23:54:27,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=374173.3333333333, ans=0.125 2024-09-23 23:54:35,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=374173.3333333333, ans=0.2 2024-09-23 23:54:36,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=374173.3333333333, ans=0.125 2024-09-23 23:54:43,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=374220.0, ans=0.2 2024-09-23 23:54:49,164 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.304e+02 1.416e+02 1.549e+02 2.189e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-23 23:54:49,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=374220.0, ans=0.125 2024-09-23 23:55:14,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0 2024-09-23 23:55:17,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-09-23 23:55:18,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=374313.3333333333, ans=0.2 2024-09-23 23:55:29,455 INFO [train.py:1198] (0/4) Epoch 21, batch 2300, loss[loss=0.2373, ctc_loss=0.1621, cr_loss=0.3761, over 15072.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.1411, cr_loss=0.3554, over 3329466.28 frames. ], batch size: 89, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:55:55,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.62 vs. 
limit=22.5 2024-09-23 23:56:20,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=374500.0, ans=0.1 2024-09-23 23:56:30,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=374500.0, ans=0.125 2024-09-23 23:56:30,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=374500.0, ans=0.125 2024-09-23 23:56:53,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374593.3333333333, ans=0.1 2024-09-23 23:56:53,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=374593.3333333333, ans=0.125 2024-09-23 23:56:54,584 INFO [train.py:1198] (0/4) Epoch 21, batch 2350, loss[loss=0.2434, ctc_loss=0.1587, cr_loss=0.4239, over 17031.00 frames. ], tot_loss[loss=0.2135, ctc_loss=0.142, cr_loss=0.3576, over 3337088.29 frames. ], batch size: 52, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:57:38,985 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.011e+02 1.274e+02 1.389e+02 1.537e+02 2.355e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-23 23:58:17,353 INFO [train.py:1198] (0/4) Epoch 21, batch 2400, loss[loss=0.1728, ctc_loss=0.1133, cr_loss=0.2977, over 16996.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1424, cr_loss=0.3584, over 3340822.66 frames. ], batch size: 44, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:58:17,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=374826.6666666667, ans=0.125 2024-09-23 23:58:28,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=374826.6666666667, ans=0.125 2024-09-23 23:58:41,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=374873.3333333333, ans=0.2 2024-09-23 23:59:35,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-23 23:59:39,562 INFO [train.py:1198] (0/4) Epoch 21, batch 2450, loss[loss=0.229, ctc_loss=0.149, cr_loss=0.4003, over 16923.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.142, cr_loss=0.3584, over 3346030.04 frames. ], batch size: 58, lr: 5.70e-03, grad_scale: 32.0 2024-09-23 23:59:46,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=375060.0, ans=0.0 2024-09-24 00:00:01,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=375106.6666666667, ans=0.125 2024-09-24 00:00:20,978 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.266e+02 1.364e+02 1.496e+02 2.065e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-24 00:00:24,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375153.3333333333, ans=0.1 2024-09-24 00:00:42,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. 
limit=15.0 2024-09-24 00:01:02,064 INFO [train.py:1198] (0/4) Epoch 21, batch 2500, loss[loss=0.2503, ctc_loss=0.1727, cr_loss=0.3879, over 11959.00 frames. ], tot_loss[loss=0.2143, ctc_loss=0.1424, cr_loss=0.3595, over 3345448.51 frames. ], batch size: 123, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:01:55,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=375433.3333333333, ans=0.1 2024-09-24 00:02:26,981 INFO [train.py:1198] (0/4) Epoch 21, batch 2550, loss[loss=0.2635, ctc_loss=0.1881, cr_loss=0.3767, over 11896.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1421, cr_loss=0.358, over 3335659.61 frames. ], batch size: 123, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:02:27,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.96 vs. limit=15.0 2024-09-24 00:02:33,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=375526.6666666667, ans=0.125 2024-09-24 00:02:39,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=375526.6666666667, ans=0.0 2024-09-24 00:03:08,316 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.019e+02 1.246e+02 1.350e+02 1.498e+02 2.836e+02, threshold=2.701e+02, percent-clipped=1.0 2024-09-24 00:03:29,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=375713.3333333333, ans=0.0 2024-09-24 00:03:46,558 INFO [train.py:1198] (0/4) Epoch 21, batch 2600, loss[loss=0.2159, ctc_loss=0.1425, cr_loss=0.3669, over 17301.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1418, cr_loss=0.358, over 3349232.49 frames. ], batch size: 46, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:03:47,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2024-09-24 00:03:50,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=22.5 2024-09-24 00:05:08,530 INFO [train.py:1198] (0/4) Epoch 21, batch 2650, loss[loss=0.2002, ctc_loss=0.1332, cr_loss=0.3351, over 17139.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1421, cr_loss=0.3582, over 3356996.27 frames. ], batch size: 45, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:05:09,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-09-24 00:05:14,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=22.5 2024-09-24 00:05:16,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=375993.3333333333, ans=0.125 2024-09-24 00:05:23,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.40 vs. 
limit=22.5 2024-09-24 00:05:29,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=376040.0, ans=0.0 2024-09-24 00:05:37,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=376040.0, ans=0.125 2024-09-24 00:05:42,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=376086.6666666667, ans=0.125 2024-09-24 00:05:52,572 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.243e+02 1.351e+02 1.464e+02 2.576e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-24 00:05:58,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=376133.3333333333, ans=0.125 2024-09-24 00:05:58,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=376133.3333333333, ans=0.125 2024-09-24 00:05:59,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2024-09-24 00:06:26,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5 2024-09-24 00:06:33,544 INFO [train.py:1198] (0/4) Epoch 21, batch 2700, loss[loss=0.2491, ctc_loss=0.1655, cr_loss=0.4182, over 17226.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1422, cr_loss=0.3584, over 3360298.35 frames. ], batch size: 55, lr: 5.69e-03, grad_scale: 32.0 2024-09-24 00:07:02,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=22.5 2024-09-24 00:07:09,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2024-09-24 00:07:39,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376413.3333333333, ans=0.1 2024-09-24 00:07:55,724 INFO [train.py:1198] (0/4) Epoch 21, batch 2750, loss[loss=0.1914, ctc_loss=0.1251, cr_loss=0.3314, over 17092.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1415, cr_loss=0.3579, over 3356765.57 frames. ], batch size: 43, lr: 5.68e-03, grad_scale: 32.0 2024-09-24 00:08:00,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=376460.0, ans=0.125 2024-09-24 00:08:07,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=376460.0, ans=0.0 2024-09-24 00:08:18,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=376506.6666666667, ans=0.125 2024-09-24 00:08:22,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.88 vs. 
limit=15.0 2024-09-24 00:08:29,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=376553.3333333333, ans=0.2 2024-09-24 00:08:37,709 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.226e+02 1.342e+02 1.468e+02 2.196e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-24 00:09:10,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=376646.6666666667, ans=0.0 2024-09-24 00:09:10,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=376646.6666666667, ans=0.125 2024-09-24 00:09:18,226 INFO [train.py:1198] (0/4) Epoch 21, batch 2800, loss[loss=0.2156, ctc_loss=0.1455, cr_loss=0.3504, over 16646.00 frames. ], tot_loss[loss=0.2127, ctc_loss=0.1412, cr_loss=0.3576, over 3358426.32 frames. ], batch size: 61, lr: 5.68e-03, grad_scale: 32.0 2024-09-24 00:09:40,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=376740.0, ans=0.125 2024-09-24 00:09:51,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=376786.6666666667, ans=0.125 2024-09-24 00:09:59,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=376786.6666666667, ans=0.125 2024-09-24 00:10:38,125 INFO [train.py:1198] (0/4) Epoch 21, batch 2850, loss[loss=0.1917, ctc_loss=0.1249, cr_loss=0.3344, over 17019.00 frames. ], tot_loss[loss=0.2117, ctc_loss=0.1404, cr_loss=0.3565, over 3365486.77 frames. ], batch size: 39, lr: 5.68e-03, grad_scale: 32.0 2024-09-24 00:11:22,311 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.291e+02 1.370e+02 1.483e+02 1.856e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-24 00:11:22,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=377020.0, ans=0.125 2024-09-24 00:12:03,411 INFO [train.py:1198] (0/4) Epoch 21, batch 2900, loss[loss=0.2283, ctc_loss=0.1506, cr_loss=0.3887, over 16615.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1415, cr_loss=0.3581, over 3370743.87 frames. ], batch size: 66, lr: 5.68e-03, grad_scale: 32.0 2024-09-24 00:12:14,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=377160.0, ans=0.0 2024-09-24 00:12:30,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=377206.6666666667, ans=0.125 2024-09-24 00:12:44,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=377253.3333333333, ans=0.05 2024-09-24 00:13:26,190 INFO [train.py:1198] (0/4) Epoch 21, batch 2950, loss[loss=0.1812, ctc_loss=0.1173, cr_loss=0.3192, over 17100.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1414, cr_loss=0.357, over 3364948.11 frames. 
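A note on how these numbers relate: in every entry above, the reported loss equals ctc_loss + 0.2 * cr_loss (for epoch 21 batch 2950: 0.1173 + 0.2 * 0.3192 ≈ 0.1812), which matches the cr-loss-scale-0.2 tag in the experiment directory named in the checkpoint line further down this log. A minimal sketch of that combination; the function name is illustrative, not icefall's API:

```python
import torch

def total_loss(ctc_loss: torch.Tensor, cr_loss: torch.Tensor,
               cr_loss_scale: float = 0.2) -> torch.Tensor:
    # Reproduces the logged relationship, e.g. for epoch 21 batch 2950:
    # 0.1173 + 0.2 * 0.3192 ~= 0.1812
    return ctc_loss + cr_loss_scale * cr_loss
```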
], batch size: 40, lr: 5.68e-03, grad_scale: 16.0 2024-09-24 00:13:48,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=377440.0, ans=0.0 2024-09-24 00:13:50,780 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.51 vs. limit=10.0 2024-09-24 00:14:11,336 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.234e+02 1.337e+02 1.460e+02 2.034e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-24 00:14:11,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=377486.6666666667, ans=0.125 2024-09-24 00:14:17,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=377533.3333333333, ans=0.125 2024-09-24 00:14:24,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.81 vs. limit=22.5 2024-09-24 00:14:47,421 INFO [train.py:1198] (0/4) Epoch 21, batch 3000, loss[loss=0.2019, ctc_loss=0.1314, cr_loss=0.3527, over 17287.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.141, cr_loss=0.3562, over 3356974.34 frames. ], batch size: 46, lr: 5.68e-03, grad_scale: 16.0 2024-09-24 00:14:47,422 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 00:15:02,280 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.6958, 4.5777, 4.5803, 4.4761], device='cuda:0') 2024-09-24 00:15:02,884 INFO [train.py:1230] (0/4) Epoch 21, validation: loss=0.03893, ctc_loss=0.03893, cr_loss=7.803e-15, over 944034.00 frames. 2024-09-24 00:15:02,884 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 00:15:08,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2024-09-24 00:15:09,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=377626.6666666667, ans=0.125 2024-09-24 00:16:07,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=377813.3333333333, ans=0.125 2024-09-24 00:16:21,773 INFO [train.py:1198] (0/4) Epoch 21, batch 3050, loss[loss=0.2199, ctc_loss=0.1447, cr_loss=0.3762, over 17009.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1412, cr_loss=0.3567, over 3345603.18 frames. ], batch size: 44, lr: 5.67e-03, grad_scale: 16.0 2024-09-24 00:16:34,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377860.0, ans=0.1 2024-09-24 00:16:36,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. 
limit=15.0 2024-09-24 00:17:04,472 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.246e+02 1.316e+02 1.406e+02 3.085e+02, threshold=2.632e+02, percent-clipped=1.0 2024-09-24 00:17:21,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=378000.0, ans=0.2 2024-09-24 00:17:32,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378046.6666666667, ans=0.1 2024-09-24 00:17:34,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2024-09-24 00:17:37,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2024-09-24 00:17:43,200 INFO [train.py:1198] (0/4) Epoch 21, batch 3100, loss[loss=0.2066, ctc_loss=0.1397, cr_loss=0.3342, over 17291.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1417, cr_loss=0.3577, over 3348703.17 frames. ], batch size: 49, lr: 5.67e-03, grad_scale: 16.0 2024-09-24 00:18:27,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=378186.6666666667, ans=0.125 2024-09-24 00:18:39,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=378233.3333333333, ans=0.0 2024-09-24 00:19:04,080 INFO [train.py:1198] (0/4) Epoch 21, batch 3150, loss[loss=0.2073, ctc_loss=0.1351, cr_loss=0.361, over 17044.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.1422, cr_loss=0.3584, over 3346011.79 frames. ], batch size: 46, lr: 5.67e-03, grad_scale: 16.0 2024-09-24 00:19:24,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=378373.3333333333, ans=0.0 2024-09-24 00:19:27,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=378373.3333333333, ans=0.0 2024-09-24 00:19:33,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=378420.0, ans=0.0 2024-09-24 00:19:41,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=378420.0, ans=0.025 2024-09-24 00:19:44,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=378420.0, ans=0.125 2024-09-24 00:19:46,189 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.987e+01 1.255e+02 1.337e+02 1.461e+02 2.046e+02, threshold=2.675e+02, percent-clipped=0.0 2024-09-24 00:19:47,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=378420.0, ans=0.125 2024-09-24 00:19:51,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=378466.6666666667, ans=0.125 2024-09-24 00:20:21,951 INFO [train.py:1198] (0/4) Epoch 21, batch 3200, loss[loss=0.2246, ctc_loss=0.148, cr_loss=0.3833, over 14964.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1419, cr_loss=0.3576, over 3343580.60 frames. 
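The recurring optim.py WARNING lines report five grad-norm statistics (apparently min, 25%, median, 75%, max over a recent window), a clipping threshold, and a percent-clipped figure. In every entry here the threshold equals Clipping_scale times the median (2.0 * 1.316e+02 = 2.632e+02 just above). A sketch consistent with these logs follows; the class name and window size are assumptions, not icefall's implementation:

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip gradients whose norm exceeds clipping_scale * median of recent norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms: deque[float] = deque(maxlen=window)

    def __call__(self, parameters) -> None:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        self.norms.append(float(norm))
        hist = torch.tensor(list(self.norms))
        # The five logged statistics: min, 25%, median, 75%, max.
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * float(q[2])  # 2.0 * median, as logged
        if float(norm) > threshold:
            for p in params:
                p.grad.mul_(threshold / float(norm))
```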
], batch size: 89, lr: 5.67e-03, grad_scale: 32.0 2024-09-24 00:20:22,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=378560.0, ans=0.125 2024-09-24 00:20:25,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=378560.0, ans=0.0 2024-09-24 00:20:30,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=378560.0, ans=0.125 2024-09-24 00:20:41,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=378606.6666666667, ans=0.125 2024-09-24 00:20:50,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. limit=10.0 2024-09-24 00:21:14,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=378700.0, ans=0.07 2024-09-24 00:21:25,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=378746.6666666667, ans=0.125 2024-09-24 00:21:42,247 INFO [train.py:1198] (0/4) Epoch 21, batch 3250, loss[loss=0.1878, ctc_loss=0.1218, cr_loss=0.33, over 17292.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1419, cr_loss=0.3584, over 3351605.80 frames. ], batch size: 42, lr: 5.67e-03, grad_scale: 32.0 2024-09-24 00:21:54,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=378793.3333333333, ans=0.125 2024-09-24 00:21:56,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=378840.0, ans=0.125 2024-09-24 00:22:12,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-09-24 00:22:21,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=378886.6666666667, ans=0.0 2024-09-24 00:22:24,295 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.280e+02 1.348e+02 1.474e+02 1.911e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 00:22:43,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378980.0, ans=0.1 2024-09-24 00:22:43,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=378980.0, ans=0.0 2024-09-24 00:22:49,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=378980.0, ans=0.0 2024-09-24 00:22:54,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=378980.0, ans=0.0 2024-09-24 00:23:00,168 INFO [train.py:1198] (0/4) Epoch 21, batch 3300, loss[loss=0.2079, ctc_loss=0.138, cr_loss=0.3491, over 17376.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1412, cr_loss=0.3572, over 3359713.52 frames. 
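The scaling.py ScheduledFloat entries print the current value ("ans") of a named hyperparameter as a function of batch_count; at this stage the various skip rates read 0.0 and balancer probs read 0.125. One way such batch-count-dependent constants can be realized is piecewise-linear interpolation over batch count, sketched below as an illustrative stand-in (not the recipe's class; the example schedule points are invented):

```python
import bisect

class PiecewiseLinearFloat:
    """A scalar hyperparameter interpolated piecewise-linearly over batch count."""

    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]  # batch counts, ascending
        self.ys = [y for _, y in points]  # values at those batch counts

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1, y0, y1 = self.xs[i - 1], self.xs[i], self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# E.g. a skip rate that starts at 0.5 and is scheduled away early in training:
attention_skip_rate = PiecewiseLinearFloat((0.0, 0.5), (20000.0, 0.0))
print(attention_skip_rate(378560.0))  # 0.0, matching 'ans=0.0' in the entries above
```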
], batch size: 48, lr: 5.67e-03, grad_scale: 32.0 2024-09-24 00:23:02,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=379026.6666666667, ans=0.025 2024-09-24 00:23:17,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=379073.3333333333, ans=0.09899494936611666 2024-09-24 00:23:26,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=379073.3333333333, ans=0.125 2024-09-24 00:24:18,313 INFO [train.py:1198] (0/4) Epoch 21, batch 3350, loss[loss=0.2053, ctc_loss=0.1346, cr_loss=0.3533, over 17222.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1417, cr_loss=0.3583, over 3359402.42 frames. ], batch size: 50, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:24:25,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=379260.0, ans=0.125 2024-09-24 00:24:29,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.86 vs. limit=10.0 2024-09-24 00:24:52,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379353.3333333333, ans=0.1 2024-09-24 00:25:02,489 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.292e+02 1.373e+02 1.509e+02 2.435e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-24 00:25:03,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=12.0 2024-09-24 00:25:11,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-09-24 00:25:16,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=379400.0, ans=0.125 2024-09-24 00:25:24,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=379446.6666666667, ans=0.125 2024-09-24 00:25:26,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=379446.6666666667, ans=0.125 2024-09-24 00:25:38,776 INFO [train.py:1198] (0/4) Epoch 21, batch 3400, loss[loss=0.1827, ctc_loss=0.1207, cr_loss=0.3101, over 17088.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1416, cr_loss=0.3579, over 3357295.65 frames. ], batch size: 43, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:25:46,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=379493.3333333333, ans=0.125 2024-09-24 00:25:52,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=379540.0, ans=0.025 2024-09-24 00:26:56,714 INFO [train.py:1198] (0/4) Epoch 21, batch 3450, loss[loss=0.2395, ctc_loss=0.1613, cr_loss=0.3907, over 16954.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1417, cr_loss=0.358, over 3356417.73 frames. 
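The Whitening lines fire when a module's output covariance looks too anisotropic: a scale-invariant metric is compared against a limit, e.g. "metric=3.86 vs. limit=10.0" above (the limits themselves appear as ScheduledFloat entries elsewhere in this log). The exact statistic is internal to scaling.py; one plausible formulation, given purely as an assumption, is the ratio of the second moment of the covariance eigenvalues to their squared mean, which is 1.0 for perfectly white features and grows as variance concentrates in few directions:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Anisotropy of feature covariance; ~1.0 if white, larger if ill-conditioned.

    x: (num_frames, num_channels), with channels split into num_groups groups
    as in the logged entries (num_groups=1, num_channels=384, ...).
    """
    n, c = x.shape
    g = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (G, N, D)
    g = g - g.mean(dim=1, keepdim=True)
    cov = g.transpose(1, 2) @ g / n        # per-group covariance, (G, D, D)
    eigs = torch.linalg.eigvalsh(cov)      # real eigenvalues, (G, D)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)
```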
], batch size: 58, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:27:03,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=379726.6666666667, ans=0.2 2024-09-24 00:27:14,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2024-09-24 00:27:20,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=379773.3333333333, ans=0.125 2024-09-24 00:27:38,511 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.292e+02 1.385e+02 1.500e+02 2.362e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-24 00:27:48,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=379866.6666666667, ans=0.2 2024-09-24 00:28:16,469 INFO [train.py:1198] (0/4) Epoch 21, batch 3500, loss[loss=0.1977, ctc_loss=0.1294, cr_loss=0.3411, over 17249.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1418, cr_loss=0.3575, over 3342135.63 frames. ], batch size: 44, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:28:16,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=379960.0, ans=0.2 2024-09-24 00:28:28,097 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0 2024-09-24 00:28:31,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-09-24 00:29:14,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=380100.0, ans=0.0 2024-09-24 00:29:17,260 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-24 00:29:29,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=380146.6666666667, ans=0.025 2024-09-24 00:29:36,656 INFO [train.py:1198] (0/4) Epoch 21, batch 3550, loss[loss=0.2468, ctc_loss=0.17, cr_loss=0.3843, over 16611.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.142, cr_loss=0.3587, over 3341365.07 frames. ], batch size: 66, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:30:01,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=380240.0, ans=0.2 2024-09-24 00:30:08,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=12.0 2024-09-24 00:30:19,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=380286.6666666667, ans=0.2 2024-09-24 00:30:20,801 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.243e+02 1.324e+02 1.443e+02 2.322e+02, threshold=2.648e+02, percent-clipped=0.0 2024-09-24 00:30:56,887 INFO [train.py:1198] (0/4) Epoch 21, batch 3600, loss[loss=0.2203, ctc_loss=0.1446, cr_loss=0.3784, over 17009.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1423, cr_loss=0.3591, over 3348738.77 frames. 
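Each train.py progress entry pairs the current batch's loss ("over ~17,000 frames") with a tot_loss over several million frames. The fractional frame totals (e.g. 3,356,417.73 above) suggest a frame-weighted running average with exponential decay rather than a hard window; a sketch of that bookkeeping, with assumed names and decay constant:

```python
class RunningLoss:
    """Frame-weighted, exponentially decayed running loss, as in 'tot_loss[...]'."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0  # decayed frame count; fractional, as in the log

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.weighted_loss / self.frames  # reported as tot_loss
```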
], batch size: 53, lr: 5.66e-03, grad_scale: 32.0 2024-09-24 00:31:13,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380473.3333333333, ans=0.1 2024-09-24 00:31:29,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=380520.0, ans=0.2 2024-09-24 00:31:40,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=380520.0, ans=0.0 2024-09-24 00:32:14,928 INFO [train.py:1198] (0/4) Epoch 21, batch 3650, loss[loss=0.2141, ctc_loss=0.1456, cr_loss=0.3429, over 16612.00 frames. ], tot_loss[loss=0.2146, ctc_loss=0.1427, cr_loss=0.3594, over 3344874.63 frames. ], batch size: 66, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:32:42,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380706.6666666667, ans=0.1 2024-09-24 00:32:50,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-09-24 00:32:57,524 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.280e+02 1.369e+02 1.514e+02 2.640e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-24 00:33:07,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=380800.0, ans=0.0 2024-09-24 00:33:18,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=380846.6666666667, ans=0.125 2024-09-24 00:33:18,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=380846.6666666667, ans=0.0 2024-09-24 00:33:21,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=380846.6666666667, ans=0.125 2024-09-24 00:33:27,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=380846.6666666667, ans=0.05 2024-09-24 00:33:35,128 INFO [train.py:1198] (0/4) Epoch 21, batch 3700, loss[loss=0.2075, ctc_loss=0.137, cr_loss=0.3528, over 17201.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1419, cr_loss=0.3583, over 3348648.21 frames. ], batch size: 47, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:33:43,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=380893.3333333333, ans=0.125 2024-09-24 00:33:52,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=380940.0, ans=0.0 2024-09-24 00:33:52,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=380940.0, ans=0.0 2024-09-24 00:34:07,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.95 vs. 
limit=15.0 2024-09-24 00:34:17,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=380986.6666666667, ans=0.125 2024-09-24 00:34:36,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=381080.0, ans=0.1 2024-09-24 00:34:53,989 INFO [train.py:1198] (0/4) Epoch 21, batch 3750, loss[loss=0.2274, ctc_loss=0.15, cr_loss=0.387, over 16923.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.142, cr_loss=0.3578, over 3334271.55 frames. ], batch size: 58, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:34:57,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=381126.6666666667, ans=0.0 2024-09-24 00:35:16,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=381173.3333333333, ans=0.125 2024-09-24 00:35:22,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=381173.3333333333, ans=0.125 2024-09-24 00:35:27,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381220.0, ans=0.1 2024-09-24 00:35:33,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=381220.0, ans=0.0 2024-09-24 00:35:36,265 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.271e+02 1.353e+02 1.516e+02 2.351e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 00:35:38,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=381220.0, ans=0.0 2024-09-24 00:35:40,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2024-09-24 00:35:50,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=381266.6666666667, ans=0.0 2024-09-24 00:36:08,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=381313.3333333333, ans=0.0 2024-09-24 00:36:12,542 INFO [train.py:1198] (0/4) Epoch 21, batch 3800, loss[loss=0.2617, ctc_loss=0.1759, cr_loss=0.4293, over 15265.00 frames. ], tot_loss[loss=0.2148, ctc_loss=0.143, cr_loss=0.3586, over 3305142.17 frames. ], batch size: 89, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:36:28,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=381406.6666666667, ans=0.125 2024-09-24 00:36:36,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=381406.6666666667, ans=0.1 2024-09-24 00:36:53,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. 
limit=15.0 2024-09-24 00:36:58,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=381500.0, ans=0.125 2024-09-24 00:37:09,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=381500.0, ans=0.04949747468305833 2024-09-24 00:37:12,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=381500.0, ans=0.0 2024-09-24 00:37:14,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=381546.6666666667, ans=0.0 2024-09-24 00:37:14,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381546.6666666667, ans=0.1 2024-09-24 00:37:23,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=381546.6666666667, ans=0.025 2024-09-24 00:37:31,012 INFO [train.py:1198] (0/4) Epoch 21, batch 3850, loss[loss=0.2629, ctc_loss=0.1866, cr_loss=0.3813, over 11755.00 frames. ], tot_loss[loss=0.2155, ctc_loss=0.1438, cr_loss=0.3587, over 3283418.65 frames. ], batch size: 123, lr: 5.65e-03, grad_scale: 32.0 2024-09-24 00:37:48,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=381640.0, ans=0.125 2024-09-24 00:37:55,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=381640.0, ans=0.0 2024-09-24 00:38:14,456 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.386e+02 1.498e+02 1.641e+02 2.451e+02, threshold=2.996e+02, percent-clipped=0.0 2024-09-24 00:38:29,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=381733.3333333333, ans=0.0 2024-09-24 00:38:43,852 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-21.pt 2024-09-24 00:39:39,374 INFO [train.py:1198] (0/4) Epoch 22, batch 0, loss[loss=0.2315, ctc_loss=0.1553, cr_loss=0.3809, over 16889.00 frames. ], tot_loss[loss=0.2315, ctc_loss=0.1553, cr_loss=0.3809, over 16889.00 frames. ], batch size: 58, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:39:39,375 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 00:39:54,642 INFO [train.py:1230] (0/4) Epoch 22, validation: loss=0.03827, ctc_loss=0.03827, cr_loss=8.092e-15, over 944034.00 frames. 2024-09-24 00:39:54,643 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 00:40:24,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381854.6666666667, ans=0.125 2024-09-24 00:40:38,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. 
limit=10.0 2024-09-24 00:40:39,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=381901.3333333333, ans=0.125 2024-09-24 00:40:39,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381901.3333333333, ans=0.1 2024-09-24 00:40:45,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2024-09-24 00:41:11,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=381994.6666666667, ans=0.0 2024-09-24 00:41:17,587 INFO [train.py:1198] (0/4) Epoch 22, batch 50, loss[loss=0.1972, ctc_loss=0.1312, cr_loss=0.3302, over 17008.00 frames. ], tot_loss[loss=0.219, ctc_loss=0.1461, cr_loss=0.3645, over 747787.57 frames. ], batch size: 51, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:41:32,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2024-09-24 00:41:42,132 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=22.5 2024-09-24 00:41:47,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=382134.6666666667, ans=0.0 2024-09-24 00:42:06,807 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.269e+02 1.377e+02 1.581e+02 4.736e+02, threshold=2.753e+02, percent-clipped=1.0 2024-09-24 00:42:23,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=382228.0, ans=0.125 2024-09-24 00:42:24,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=382228.0, ans=0.125 2024-09-24 00:42:28,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=382228.0, ans=0.125 2024-09-24 00:42:40,201 INFO [train.py:1198] (0/4) Epoch 22, batch 100, loss[loss=0.2246, ctc_loss=0.1513, cr_loss=0.3666, over 17226.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1446, cr_loss=0.3627, over 1324905.83 frames. ], batch size: 50, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:42:45,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=382274.6666666667, ans=0.05 2024-09-24 00:43:04,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=382321.3333333333, ans=0.0 2024-09-24 00:43:05,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=382321.3333333333, ans=0.1 2024-09-24 00:43:13,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382368.0, ans=0.1 2024-09-24 00:43:59,822 INFO [train.py:1198] (0/4) Epoch 22, batch 150, loss[loss=0.201, ctc_loss=0.1317, cr_loss=0.3468, over 17009.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1411, cr_loss=0.3566, over 1781950.40 frames. 
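Both validation summaries above show the consistency term vanishing to floating-point noise (epoch 21: cr_loss=7.803e-15; epoch 22: cr_loss=8.092e-15) while ctc_loss carries the entire validation loss. That is what one would expect if cr_loss compares CTC posteriors from two differently masked copies of each utterance and validation applies no masking, making the copies identical. A hedged sketch of such a symmetric-KL consistency term (not necessarily the exact CR-CTC formulation):

```python
import torch.nn.functional as F
from torch import Tensor

def consistency_loss(log_probs_a: Tensor, log_probs_b: Tensor) -> Tensor:
    """Symmetric KL between two views' CTC output log-probs, shape (T, N, V).

    If the two views coincide (no masking, as at validation time), this is
    zero up to rounding -- matching the cr_loss ~ 8e-15 entries above.
    """
    kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)
```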
], batch size: 51, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:44:32,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=382554.6666666667, ans=15.0 2024-09-24 00:44:55,451 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.262e+02 1.352e+02 1.515e+02 2.166e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-24 00:44:57,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=382648.0, ans=10.0 2024-09-24 00:45:09,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=382694.6666666667, ans=0.0 2024-09-24 00:45:22,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.87 vs. limit=10.0 2024-09-24 00:45:26,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=382694.6666666667, ans=0.04949747468305833 2024-09-24 00:45:29,262 INFO [train.py:1198] (0/4) Epoch 22, batch 200, loss[loss=0.2112, ctc_loss=0.1412, cr_loss=0.35, over 16889.00 frames. ], tot_loss[loss=0.2139, ctc_loss=0.142, cr_loss=0.3594, over 2130047.75 frames. ], batch size: 58, lr: 5.51e-03, grad_scale: 32.0 2024-09-24 00:46:11,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-09-24 00:46:36,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=382928.0, ans=0.025 2024-09-24 00:46:48,689 INFO [train.py:1198] (0/4) Epoch 22, batch 250, loss[loss=0.1979, ctc_loss=0.1286, cr_loss=0.3467, over 17312.00 frames. ], tot_loss[loss=0.2133, ctc_loss=0.1416, cr_loss=0.3584, over 2396843.55 frames. ], batch size: 46, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:47:01,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=382974.6666666667, ans=0.125 2024-09-24 00:47:08,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2024-09-24 00:47:10,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=383021.3333333333, ans=0.125 2024-09-24 00:47:41,221 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.260e+02 1.348e+02 1.458e+02 2.828e+02, threshold=2.696e+02, percent-clipped=1.0 2024-09-24 00:47:41,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=383114.6666666667, ans=0.0 2024-09-24 00:47:48,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383114.6666666667, ans=0.125 2024-09-24 00:47:57,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.33 vs. 
limit=15.0 2024-09-24 00:48:08,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=383161.3333333333, ans=0.1 2024-09-24 00:48:11,509 INFO [train.py:1198] (0/4) Epoch 22, batch 300, loss[loss=0.2315, ctc_loss=0.1535, cr_loss=0.3899, over 16878.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1419, cr_loss=0.359, over 2610725.51 frames. ], batch size: 58, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:48:13,384 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:48:26,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=383254.6666666667, ans=0.2 2024-09-24 00:48:34,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=383254.6666666667, ans=0.125 2024-09-24 00:48:37,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383254.6666666667, ans=0.1 2024-09-24 00:49:15,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=383348.0, ans=0.2 2024-09-24 00:49:27,935 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:49:29,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-09-24 00:49:37,076 INFO [train.py:1198] (0/4) Epoch 22, batch 350, loss[loss=0.2204, ctc_loss=0.1418, cr_loss=0.393, over 17032.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1408, cr_loss=0.3578, over 2777164.27 frames. ], batch size: 52, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:49:39,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0 2024-09-24 00:49:40,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=383441.3333333333, ans=0.0 2024-09-24 00:50:28,763 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.255e+02 1.333e+02 1.486e+02 2.174e+02, threshold=2.666e+02, percent-clipped=0.0 2024-09-24 00:50:29,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2024-09-24 00:50:32,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=383581.3333333333, ans=0.0 2024-09-24 00:50:34,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2024-09-24 00:50:50,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=383628.0, ans=0.025 2024-09-24 00:50:59,532 INFO [train.py:1198] (0/4) Epoch 22, batch 400, loss[loss=0.1948, ctc_loss=0.1261, cr_loss=0.3438, over 16353.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1405, cr_loss=0.3564, over 2913533.58 frames. 
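At the epoch boundary above, checkpoint.py wrote epoch-21.pt into the experiment directory before epoch 22 began. The fields icefall actually serializes are not visible in the log; a generic sketch of per-epoch checkpointing in PyTorch would be:

```python
import torch

def save_epoch_checkpoint(path: str, model, optimizer, scheduler, epoch: int) -> None:
    # Persist everything needed to resume training from this epoch boundary.
    torch.save(
        {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
        },
        path,  # e.g. ".../epoch-21.pt", as in the checkpoint.py line above
    )
```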
], batch size: 36, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:51:07,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=383674.6666666667, ans=0.125 2024-09-24 00:51:14,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=383721.3333333333, ans=0.125 2024-09-24 00:51:20,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=383721.3333333333, ans=0.0 2024-09-24 00:51:28,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=383721.3333333333, ans=0.5 2024-09-24 00:51:28,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=22.5 2024-09-24 00:51:55,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=383814.6666666667, ans=0.1 2024-09-24 00:52:02,342 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 00:52:08,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=383861.3333333333, ans=0.125 2024-09-24 00:52:14,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=383861.3333333333, ans=0.025 2024-09-24 00:52:19,400 INFO [train.py:1198] (0/4) Epoch 22, batch 450, loss[loss=0.1955, ctc_loss=0.128, cr_loss=0.3372, over 17102.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1396, cr_loss=0.3554, over 3020155.78 frames. ], batch size: 40, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:52:49,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383954.6666666667, ans=0.125 2024-09-24 00:53:09,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. limit=6.0 2024-09-24 00:53:11,774 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.289e+02 1.376e+02 1.526e+02 3.562e+02, threshold=2.753e+02, percent-clipped=1.0 2024-09-24 00:53:41,935 INFO [train.py:1198] (0/4) Epoch 22, batch 500, loss[loss=0.1938, ctc_loss=0.1244, cr_loss=0.347, over 17210.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.1396, cr_loss=0.3553, over 3096615.86 frames. ], batch size: 47, lr: 5.50e-03, grad_scale: 32.0 2024-09-24 00:54:33,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=384281.3333333333, ans=0.0 2024-09-24 00:54:34,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. 
limit=15.0 2024-09-24 00:54:38,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=384281.3333333333, ans=0.125 2024-09-24 00:54:40,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=384281.3333333333, ans=0.125 2024-09-24 00:54:40,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=384281.3333333333, ans=0.0 2024-09-24 00:54:51,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=384328.0, ans=0.0 2024-09-24 00:55:07,118 INFO [train.py:1198] (0/4) Epoch 22, batch 550, loss[loss=0.2067, ctc_loss=0.1362, cr_loss=0.3528, over 17316.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1395, cr_loss=0.3555, over 3156165.12 frames. ], batch size: 51, lr: 5.49e-03, grad_scale: 32.0 2024-09-24 00:55:14,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.76 vs. limit=10.0 2024-09-24 00:55:20,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.12 vs. limit=6.0 2024-09-24 00:55:36,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2024-09-24 00:55:59,357 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.258e+02 1.357e+02 1.512e+02 2.238e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-24 00:56:16,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=384561.3333333333, ans=0.0 2024-09-24 00:56:20,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=384561.3333333333, ans=0.125 2024-09-24 00:56:30,182 INFO [train.py:1198] (0/4) Epoch 22, batch 600, loss[loss=0.1759, ctc_loss=0.1151, cr_loss=0.3041, over 17186.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1398, cr_loss=0.3554, over 3207235.78 frames. ], batch size: 41, lr: 5.49e-03, grad_scale: 32.0 2024-09-24 00:56:57,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=384654.6666666667, ans=0.1 2024-09-24 00:57:00,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=384701.3333333333, ans=0.125 2024-09-24 00:57:01,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2024-09-24 00:57:41,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=384794.6666666667, ans=0.125 2024-09-24 00:57:47,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=12.0 2024-09-24 00:57:52,483 INFO [train.py:1198] (0/4) Epoch 22, batch 650, loss[loss=0.2125, ctc_loss=0.1412, cr_loss=0.3565, over 17295.00 frames. ], tot_loss[loss=0.2123, ctc_loss=0.1409, cr_loss=0.3572, over 3234620.10 frames. 
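During the epoch 21 batch-3000 validation pass, zipformer.py also printed attn_weights_entropy = tensor([3.6958, 4.5777, 4.5803, 4.4761]) for one self-attention module, plausibly one value per attention head. Attention entropy is a standard collapse check: 0 nats means each query attends to a single key, log(num_keys) means uniform attention. A sketch under that per-head assumption:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """Mean entropy (nats) per head; attn: (num_heads, num_queries, num_keys)."""
    attn = attn.clamp_min(eps)              # guard log(0)
    ent = -(attn * attn.log()).sum(dim=-1)  # entropy of each query's attention row
    return ent.mean(dim=-1)                 # one diagnostic value per head
```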
], batch size: 46, lr: 5.49e-03, grad_scale: 16.0 2024-09-24 00:58:43,615 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.285e+02 1.401e+02 1.551e+02 2.497e+02, threshold=2.802e+02, percent-clipped=0.0 2024-09-24 00:58:43,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=384981.3333333333, ans=0.1 2024-09-24 00:58:54,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.23 vs. limit=5.0 2024-09-24 00:59:00,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=385028.0, ans=0.125 2024-09-24 00:59:15,117 INFO [train.py:1198] (0/4) Epoch 22, batch 700, loss[loss=0.1838, ctc_loss=0.1193, cr_loss=0.3222, over 16940.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1416, cr_loss=0.3586, over 3259208.50 frames. ], batch size: 42, lr: 5.49e-03, grad_scale: 16.0 2024-09-24 00:59:42,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=385121.3333333333, ans=0.125 2024-09-24 00:59:43,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=385121.3333333333, ans=0.2 2024-09-24 01:00:01,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=385168.0, ans=0.0 2024-09-24 01:00:10,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.46 vs. limit=10.0 2024-09-24 01:00:32,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2024-09-24 01:00:33,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2024-09-24 01:00:40,682 INFO [train.py:1198] (0/4) Epoch 22, batch 750, loss[loss=0.1813, ctc_loss=0.1182, cr_loss=0.3156, over 17085.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.1408, cr_loss=0.3569, over 3286896.09 frames. ], batch size: 43, lr: 5.49e-03, grad_scale: 8.0 2024-09-24 01:00:47,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=385308.0, ans=0.0 2024-09-24 01:00:48,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=385308.0, ans=0.125 2024-09-24 01:01:06,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.48 vs. 
limit=15.0 2024-09-24 01:01:14,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=385401.3333333333, ans=0.125 2024-09-24 01:01:23,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=385401.3333333333, ans=0.125 2024-09-24 01:01:33,050 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.232e+02 1.335e+02 1.516e+02 2.427e+02, threshold=2.671e+02, percent-clipped=0.0 2024-09-24 01:01:35,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=385448.0, ans=0.0 2024-09-24 01:01:39,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=385448.0, ans=0.0 2024-09-24 01:01:45,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.43 vs. limit=22.5 2024-09-24 01:01:47,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=385494.6666666667, ans=0.0 2024-09-24 01:01:57,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=385494.6666666667, ans=0.125 2024-09-24 01:02:00,330 INFO [train.py:1198] (0/4) Epoch 22, batch 800, loss[loss=0.1839, ctc_loss=0.1209, cr_loss=0.3151, over 17261.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1398, cr_loss=0.3552, over 3303848.66 frames. ], batch size: 44, lr: 5.49e-03, grad_scale: 16.0 2024-09-24 01:02:16,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=22.5 2024-09-24 01:02:49,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=385681.3333333333, ans=0.125 2024-09-24 01:03:09,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=385728.0, ans=0.125 2024-09-24 01:03:15,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=385728.0, ans=0.125 2024-09-24 01:03:23,190 INFO [train.py:1198] (0/4) Epoch 22, batch 850, loss[loss=0.1912, ctc_loss=0.1255, cr_loss=0.3288, over 17215.00 frames. ], tot_loss[loss=0.2115, ctc_loss=0.1402, cr_loss=0.3566, over 3322071.24 frames. 
], batch size: 41, lr: 5.48e-03, grad_scale: 16.0 2024-09-24 01:03:41,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=385821.3333333333, ans=0.2 2024-09-24 01:03:47,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=385821.3333333333, ans=0.125 2024-09-24 01:03:55,892 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:04:18,870 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.270e+02 1.386e+02 1.514e+02 2.172e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-24 01:04:25,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=385914.6666666667, ans=0.125 2024-09-24 01:04:33,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=385961.3333333333, ans=0.0 2024-09-24 01:04:48,969 INFO [train.py:1198] (0/4) Epoch 22, batch 900, loss[loss=0.177, ctc_loss=0.1122, cr_loss=0.324, over 17113.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1406, cr_loss=0.3578, over 3329081.14 frames. ], batch size: 40, lr: 5.48e-03, grad_scale: 16.0 2024-09-24 01:04:56,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=15.0 2024-09-24 01:05:00,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=386008.0, ans=0.0 2024-09-24 01:05:10,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=386054.6666666667, ans=15.0 2024-09-24 01:05:14,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=386054.6666666667, ans=0.2 2024-09-24 01:05:18,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=386054.6666666667, ans=0.125 2024-09-24 01:05:34,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2024-09-24 01:05:40,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=386148.0, ans=0.125 2024-09-24 01:05:46,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=386148.0, ans=0.125 2024-09-24 01:06:02,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=22.5 2024-09-24 01:06:11,838 INFO [train.py:1198] (0/4) Epoch 22, batch 950, loss[loss=0.2411, ctc_loss=0.1686, cr_loss=0.3629, over 11485.00 frames. ], tot_loss[loss=0.212, ctc_loss=0.1404, cr_loss=0.358, over 3340441.22 frames. ], batch size: 123, lr: 5.48e-03, grad_scale: 16.0 2024-09-24 01:07:03,757 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.224e+02 1.313e+02 1.395e+02 1.890e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-24 01:07:31,055 INFO [train.py:1198] (0/4) Epoch 22, batch 1000, loss[loss=0.2497, ctc_loss=0.1691, cr_loss=0.403, over 16028.00 frames. 
], tot_loss[loss=0.2115, ctc_loss=0.1401, cr_loss=0.3574, over 3348265.54 frames. ], batch size: 74, lr: 5.48e-03, grad_scale: 16.0 2024-09-24 01:07:37,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=386474.6666666667, ans=0.2 2024-09-24 01:08:08,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=386568.0, ans=0.125 2024-09-24 01:08:33,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=12.0 2024-09-24 01:08:53,200 INFO [train.py:1198] (0/4) Epoch 22, batch 1050, loss[loss=0.261, ctc_loss=0.1868, cr_loss=0.3709, over 11679.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1408, cr_loss=0.3579, over 3337825.82 frames. ], batch size: 123, lr: 5.48e-03, grad_scale: 16.0 2024-09-24 01:09:35,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-09-24 01:09:50,909 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.253e+02 1.357e+02 1.506e+02 3.378e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-24 01:09:57,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386848.0, ans=0.1 2024-09-24 01:09:58,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=15.0 2024-09-24 01:10:20,570 INFO [train.py:1198] (0/4) Epoch 22, batch 1100, loss[loss=0.2371, ctc_loss=0.1586, cr_loss=0.3923, over 17096.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1412, cr_loss=0.3582, over 3341210.34 frames. ], batch size: 49, lr: 5.48e-03, grad_scale: 16.0 2024-09-24 01:10:35,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=386988.0, ans=0.0 2024-09-24 01:10:40,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=386988.0, ans=0.125 2024-09-24 01:10:40,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=386988.0, ans=0.125 2024-09-24 01:10:45,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-09-24 01:10:54,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387034.6666666667, ans=0.1 2024-09-24 01:11:32,672 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:11:40,358 INFO [train.py:1198] (0/4) Epoch 22, batch 1150, loss[loss=0.2071, ctc_loss=0.1371, cr_loss=0.3502, over 17359.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1415, cr_loss=0.3583, over 3336034.66 frames. 
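A pattern worth flagging in these entries: every batch logged with batch size 123 (many short cuts, ~11,000-12,000 frames) comes in well above the running average, e.g. 0.2411, 0.261, 0.2699 and 0.27 against tot_loss ≈ 0.21. To check that systematically, a quick-and-dirty parser over this log format (the regex is written for the exact "loss[...] ... batch size: N" layout used here, and is not part of icefall):

```python
import re
from collections import defaultdict

ENTRY = re.compile(
    r"loss\[loss=([\d.]+), ctc_loss=([\d.]+), cr_loss=([\d.e+-]+), "
    r"over ([\d.]+) frames\.\s*\].*?batch size: (\d+)",
    re.DOTALL,
)

def mean_loss_by_batch_size(log_text: str) -> dict[int, float]:
    losses: dict[int, list[float]] = defaultdict(list)
    for m in ENTRY.finditer(log_text):
        loss, _ctc, _cr, _frames, batch_size = m.groups()
        losses[int(batch_size)].append(float(loss))
    return {bs: sum(v) / len(v) for bs, v in sorted(losses.items())}
```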
2024-09-24 01:11:43,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=387174.6666666667, ans=0.125 2024-09-24 01:11:50,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=387174.6666666667, ans=0.0 2024-09-24 01:12:35,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2024-09-24 01:12:36,145 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.223e+02 1.330e+02 1.443e+02 2.592e+02, threshold=2.661e+02, percent-clipped=0.0 2024-09-24 01:13:03,660 INFO [train.py:1198] (0/4) Epoch 22, batch 1200, loss[loss=0.2699, ctc_loss=0.1944, cr_loss=0.3775, over 11418.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.1402, cr_loss=0.3559, over 3336967.48 frames. ], batch size: 123, lr: 5.47e-03, grad_scale: 32.0 2024-09-24 01:13:13,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=387408.0, ans=0.035 2024-09-24 01:13:48,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=387501.3333333333, ans=0.09899494936611666 2024-09-24 01:14:01,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=387548.0, ans=0.0 2024-09-24 01:14:18,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=387594.6666666667, ans=0.0 2024-09-24 01:14:25,965 INFO [train.py:1198] (0/4) Epoch 22, batch 1250, loss[loss=0.2529, ctc_loss=0.1723, cr_loss=0.4029, over 14865.00 frames. ], tot_loss[loss=0.211, ctc_loss=0.1399, cr_loss=0.3558, over 3336641.49 frames. ], batch size: 89, lr: 5.47e-03, grad_scale: 32.0 2024-09-24 01:14:30,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=387641.3333333333, ans=0.025 2024-09-24 01:14:38,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=387641.3333333333, ans=0.0 2024-09-24 01:14:46,897 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2024-09-24 01:14:48,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=387688.0, ans=0.125 2024-09-24 01:14:59,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=387734.6666666667, ans=0.125 2024-09-24 01:15:16,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=387734.6666666667, ans=0.2 2024-09-24 01:15:23,793 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.292e+02 1.416e+02 1.548e+02 3.016e+02, threshold=2.831e+02, percent-clipped=1.0 2024-09-24 01:15:46,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=387828.0, ans=0.09899494936611666 2024-09-24 01:15:50,986 INFO [train.py:1198] (0/4) Epoch 22, batch 1300, loss[loss=0.27, ctc_loss=0.1898, cr_loss=0.401, over 12111.00 frames.
], tot_loss[loss=0.2113, ctc_loss=0.1401, cr_loss=0.3559, over 3343087.94 frames. ], batch size: 123, lr: 5.47e-03, grad_scale: 32.0 2024-09-24 01:16:18,468 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:16:23,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2024-09-24 01:16:56,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388061.3333333333, ans=0.1 2024-09-24 01:16:57,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=388061.3333333333, ans=0.125 2024-09-24 01:16:57,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=388061.3333333333, ans=0.2 2024-09-24 01:17:10,178 INFO [train.py:1198] (0/4) Epoch 22, batch 1350, loss[loss=0.2053, ctc_loss=0.1375, cr_loss=0.3393, over 17286.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.1394, cr_loss=0.354, over 3353972.63 frames. ], batch size: 49, lr: 5.47e-03, grad_scale: 32.0 2024-09-24 01:17:26,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=388108.0, ans=0.125 2024-09-24 01:18:04,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2024-09-24 01:18:05,574 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.287e+02 1.390e+02 1.521e+02 2.749e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-24 01:18:32,819 INFO [train.py:1198] (0/4) Epoch 22, batch 1400, loss[loss=0.1991, ctc_loss=0.1321, cr_loss=0.3354, over 17098.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1397, cr_loss=0.3545, over 3353007.03 frames. ], batch size: 43, lr: 5.47e-03, grad_scale: 32.0 2024-09-24 01:18:36,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=388341.3333333333, ans=0.0 2024-09-24 01:18:37,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=388341.3333333333, ans=0.0 2024-09-24 01:19:32,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=388481.3333333333, ans=0.2 2024-09-24 01:19:38,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=388481.3333333333, ans=0.125 2024-09-24 01:19:54,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388528.0, ans=0.1 2024-09-24 01:19:57,777 INFO [train.py:1198] (0/4) Epoch 22, batch 1450, loss[loss=0.215, ctc_loss=0.141, cr_loss=0.37, over 17225.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1395, cr_loss=0.3548, over 3355147.94 frames. ], batch size: 50, lr: 5.46e-03, grad_scale: 32.0 2024-09-24 01:20:12,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.23 vs. 
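limit=15.0

Across these records the grad_scale field moves in powers of two: it rose from 16.0 to 32.0 at batch 1200 above and is back at 16.0 by batch 1550 below. That pattern is consistent with dynamic loss scaling for float16 training, where the scale grows after a run of overflow-free steps and is halved when a step overflows. A schematic of that mechanism using PyTorch's stock GradScaler; whether train.py uses this exact class is not visible from the log:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

# Schematic training-step shape (compute_loss and optimizer are placeholders):
#   with torch.cuda.amp.autocast(dtype=torch.float16):
#       loss = compute_loss(batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()  # doubles the scale after a clean run, halves on overflow

if torch.cuda.is_available():
    print(scaler.get_scale())  # -> 16.0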
2024-09-24 01:20:31,214 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2024-09-24 01:20:41,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=388668.0, ans=0.025 2024-09-24 01:20:48,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=388714.6666666667, ans=0.0 2024-09-24 01:20:51,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=388714.6666666667, ans=0.95 2024-09-24 01:20:52,838 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.236e+02 1.340e+02 1.484e+02 2.143e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-24 01:21:05,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2024-09-24 01:21:20,286 INFO [train.py:1198] (0/4) Epoch 22, batch 1500, loss[loss=0.2128, ctc_loss=0.1388, cr_loss=0.3698, over 17224.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1398, cr_loss=0.3549, over 3363663.92 frames. ], batch size: 47, lr: 5.46e-03, grad_scale: 32.0 2024-09-24 01:21:57,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5 2024-09-24 01:22:42,365 INFO [train.py:1198] (0/4) Epoch 22, batch 1550, loss[loss=0.1666, ctc_loss=0.1071, cr_loss=0.2977, over 16948.00 frames. ], tot_loss[loss=0.2115, ctc_loss=0.1403, cr_loss=0.3562, over 3369872.88 frames. ], batch size: 42, lr: 5.46e-03, grad_scale: 16.0 2024-09-24 01:22:42,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=389041.3333333333, ans=0.125 2024-09-24 01:22:53,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=389041.3333333333, ans=0.0 2024-09-24 01:23:09,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389088.0, ans=0.1 2024-09-24 01:23:16,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=389134.6666666667, ans=0.125 2024-09-24 01:23:29,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=389181.3333333333, ans=0.125 2024-09-24 01:23:37,031 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.256e+02 1.352e+02 1.468e+02 3.649e+02, threshold=2.704e+02, percent-clipped=1.0 2024-09-24 01:23:37,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=389181.3333333333, ans=0.0 2024-09-24 01:24:02,776 INFO [train.py:1198] (0/4) Epoch 22, batch 1600, loss[loss=0.2626, ctc_loss=0.175, cr_loss=0.4381, over 17220.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1412, cr_loss=0.3581, over 3368715.97 frames.
], batch size: 55, lr: 5.46e-03, grad_scale: 32.0 2024-09-24 01:24:10,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389274.6666666667, ans=0.1 2024-09-24 01:24:48,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389368.0, ans=0.1 2024-09-24 01:24:51,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=389368.0, ans=0.125 2024-09-24 01:24:59,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=389414.6666666667, ans=0.0 2024-09-24 01:25:09,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=389414.6666666667, ans=0.125 2024-09-24 01:25:29,443 INFO [train.py:1198] (0/4) Epoch 22, batch 1650, loss[loss=0.2245, ctc_loss=0.152, cr_loss=0.3624, over 17241.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.1412, cr_loss=0.3578, over 3369741.08 frames. ], batch size: 55, lr: 5.46e-03, grad_scale: 32.0 2024-09-24 01:26:01,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=389601.3333333333, ans=0.05 2024-09-24 01:26:11,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=389601.3333333333, ans=0.125 2024-09-24 01:26:23,522 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.218e+02 1.295e+02 1.408e+02 1.987e+02, threshold=2.589e+02, percent-clipped=0.0 2024-09-24 01:26:42,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2024-09-24 01:26:44,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=389694.6666666667, ans=0.0 2024-09-24 01:26:49,450 INFO [train.py:1198] (0/4) Epoch 22, batch 1700, loss[loss=0.2402, ctc_loss=0.1609, cr_loss=0.3965, over 17100.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1408, cr_loss=0.3569, over 3369828.73 frames. ], batch size: 49, lr: 5.46e-03, grad_scale: 32.0 2024-09-24 01:27:19,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=389788.0, ans=0.5 2024-09-24 01:27:57,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.95 vs. limit=6.0 2024-09-24 01:28:01,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=389928.0, ans=0.0 2024-09-24 01:28:12,340 INFO [train.py:1198] (0/4) Epoch 22, batch 1750, loss[loss=0.2147, ctc_loss=0.1415, cr_loss=0.366, over 17011.00 frames. ], tot_loss[loss=0.2136, ctc_loss=0.1418, cr_loss=0.359, over 3359606.15 frames. ], batch size: 53, lr: 5.46e-03, grad_scale: 32.0 2024-09-24 01:28:32,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. 
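limit=6.0

The ScheduledFloat records from scaling.py:214, which make up most of this log, print the current value (ans=...) of a hyperparameter that is scheduled against the global batch count; skip rates, balancer probabilities and dropout values all ramp or decay this way. A minimal sketch of batch-count-keyed piecewise-linear scheduling, assuming that shape (the real ScheduledFloat class in the zipformer recipe carries more machinery, and the breakpoints below are invented):

from typing import List, Tuple

def scheduled_float(batch_count: float,
                    points: List[Tuple[float, float]]) -> float:
    # Piecewise-linear interpolation over (batch_count, value) breakpoints.
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint, hold the final value

# A skip rate decaying from 0.5 to 0.0 over the first 4000 batches has
# long since reached its floor at the batch counts seen in this log:
print(scheduled_float(390021.33, [(0.0, 0.5), (4000.0, 0.0)]))  # -> 0.0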
2024-09-24 01:28:39,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=390021.3333333333, ans=0.09899494936611666 2024-09-24 01:28:50,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=390068.0, ans=0.0 2024-09-24 01:29:09,553 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.027e+02 1.289e+02 1.384e+02 1.529e+02 2.458e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-24 01:29:30,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2024-09-24 01:29:36,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=390208.0, ans=0.2 2024-09-24 01:29:37,593 INFO [train.py:1198] (0/4) Epoch 22, batch 1800, loss[loss=0.1975, ctc_loss=0.1297, cr_loss=0.339, over 17259.00 frames. ], tot_loss[loss=0.2137, ctc_loss=0.1419, cr_loss=0.3592, over 3348455.07 frames. ], batch size: 44, lr: 5.45e-03, grad_scale: 32.0 2024-09-24 01:29:41,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=390208.0, ans=0.5 2024-09-24 01:29:41,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=390208.0, ans=0.0 2024-09-24 01:29:42,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=390208.0, ans=0.2 2024-09-24 01:30:00,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=390254.6666666667, ans=0.0 2024-09-24 01:30:28,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=390348.0, ans=0.125 2024-09-24 01:30:36,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=390348.0, ans=0.1 2024-09-24 01:30:37,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0 2024-09-24 01:31:00,216 INFO [train.py:1198] (0/4) Epoch 22, batch 1850, loss[loss=0.2485, ctc_loss=0.1647, cr_loss=0.4191, over 17022.00 frames. ], tot_loss[loss=0.2141, ctc_loss=0.1421, cr_loss=0.3597, over 3346410.53 frames.
], batch size: 52, lr: 5.45e-03, grad_scale: 32.0 2024-09-24 01:31:05,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=390441.3333333333, ans=0.125 2024-09-24 01:31:13,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=390441.3333333333, ans=0.125 2024-09-24 01:31:25,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=390488.0, ans=0.125 2024-09-24 01:31:29,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=390488.0, ans=0.2 2024-09-24 01:31:35,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=390534.6666666667, ans=0.0 2024-09-24 01:31:54,069 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.256e+02 1.338e+02 1.460e+02 2.391e+02, threshold=2.675e+02, percent-clipped=0.0 2024-09-24 01:32:16,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0 2024-09-24 01:32:21,842 INFO [train.py:1198] (0/4) Epoch 22, batch 1900, loss[loss=0.2101, ctc_loss=0.1433, cr_loss=0.3338, over 17301.00 frames. ], tot_loss[loss=0.213, ctc_loss=0.1415, cr_loss=0.3577, over 3338873.54 frames. ], batch size: 51, lr: 5.45e-03, grad_scale: 32.0 2024-09-24 01:32:31,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=390674.6666666667, ans=0.125 2024-09-24 01:32:54,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=390768.0, ans=0.125 2024-09-24 01:32:54,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=390768.0, ans=0.09899494936611666 2024-09-24 01:33:08,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=390814.6666666667, ans=0.025 2024-09-24 01:33:08,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=390814.6666666667, ans=0.125 2024-09-24 01:33:12,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=22.5 2024-09-24 01:33:16,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=390814.6666666667, ans=0.0 2024-09-24 01:33:22,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=390814.6666666667, ans=0.125 2024-09-24 01:33:41,708 INFO [train.py:1198] (0/4) Epoch 22, batch 1950, loss[loss=0.2431, ctc_loss=0.1617, cr_loss=0.4072, over 16989.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.141, cr_loss=0.3575, over 3347507.27 frames. 
], batch size: 53, lr: 5.45e-03, grad_scale: 32.0 2024-09-24 01:34:41,243 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.293e+02 1.356e+02 1.523e+02 3.316e+02, threshold=2.712e+02, percent-clipped=1.0 2024-09-24 01:34:49,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=391094.6666666667, ans=0.125 2024-09-24 01:34:50,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2024-09-24 01:34:53,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=391094.6666666667, ans=0.125 2024-09-24 01:35:09,058 INFO [train.py:1198] (0/4) Epoch 22, batch 2000, loss[loss=0.2112, ctc_loss=0.1389, cr_loss=0.3615, over 17311.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1409, cr_loss=0.3574, over 3342873.57 frames. ], batch size: 49, lr: 5.45e-03, grad_scale: 32.0 2024-09-24 01:35:15,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=391141.3333333333, ans=0.125 2024-09-24 01:35:31,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=391188.0, ans=0.0 2024-09-24 01:35:53,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=391234.6666666667, ans=0.2 2024-09-24 01:36:22,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391328.0, ans=0.1 2024-09-24 01:36:28,617 INFO [train.py:1198] (0/4) Epoch 22, batch 2050, loss[loss=0.1731, ctc_loss=0.1135, cr_loss=0.298, over 16664.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1407, cr_loss=0.3571, over 3337742.18 frames. ], batch size: 37, lr: 5.45e-03, grad_scale: 16.0 2024-09-24 01:37:27,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.255e+02 1.364e+02 1.443e+02 3.272e+02, threshold=2.728e+02, percent-clipped=1.0 2024-09-24 01:37:33,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=391561.3333333333, ans=0.125 2024-09-24 01:37:40,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=391561.3333333333, ans=0.125 2024-09-24 01:37:51,064 INFO [train.py:1198] (0/4) Epoch 22, batch 2100, loss[loss=0.2517, ctc_loss=0.1775, cr_loss=0.3707, over 12069.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1406, cr_loss=0.3574, over 3335858.60 frames. 
], batch size: 123, lr: 5.44e-03, grad_scale: 16.0 2024-09-24 01:38:39,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391748.0, ans=0.1 2024-09-24 01:38:46,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=391748.0, ans=0.0 2024-09-24 01:38:50,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=391748.0, ans=0.0 2024-09-24 01:39:01,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=391794.6666666667, ans=10.0 2024-09-24 01:39:16,045 INFO [train.py:1198] (0/4) Epoch 22, batch 2150, loss[loss=0.1832, ctc_loss=0.1173, cr_loss=0.3292, over 17053.00 frames. ], tot_loss[loss=0.2117, ctc_loss=0.1403, cr_loss=0.357, over 3346032.37 frames. ], batch size: 39, lr: 5.44e-03, grad_scale: 16.0 2024-09-24 01:39:16,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.77 vs. limit=12.0 2024-09-24 01:39:17,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.40 vs. limit=10.0 2024-09-24 01:39:49,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=391934.6666666667, ans=0.125 2024-09-24 01:40:11,627 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-84000.pt 2024-09-24 01:40:16,908 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.257e+02 1.348e+02 1.506e+02 2.274e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 01:40:26,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=392028.0, ans=0.0 2024-09-24 01:40:34,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=392028.0, ans=0.0 2024-09-24 01:40:38,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0 2024-09-24 01:40:40,621 INFO [train.py:1198] (0/4) Epoch 22, batch 2200, loss[loss=0.2221, ctc_loss=0.1494, cr_loss=0.3635, over 16909.00 frames. ], tot_loss[loss=0.2131, ctc_loss=0.1413, cr_loss=0.3587, over 3342507.72 frames. ], batch size: 58, lr: 5.44e-03, grad_scale: 16.0 2024-09-24 01:40:46,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.27 vs. limit=15.0 2024-09-24 01:40:49,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. 
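limit=6.0

The checkpoint.py record above writes checkpoint-84000.pt into the experiment directory, with the file keyed to the global batch index rather than the epoch, which suggests a checkpoint is saved every fixed number of training batches in addition to end-of-epoch saves. A sketch of that pattern follows; the interval and the saved fields are assumptions, not read from checkpoint.py:

import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: str, save_every_n: int = 4000) -> None:
    # Write a batch-indexed checkpoint such as checkpoint-84000.pt.
    if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
        )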
2024-09-24 01:41:04,806 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:41:09,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=392121.3333333333, ans=0.0 2024-09-24 01:41:32,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=392214.6666666667, ans=0.0 2024-09-24 01:42:03,260 INFO [train.py:1198] (0/4) Epoch 22, batch 2250, loss[loss=0.2254, ctc_loss=0.15, cr_loss=0.3773, over 16708.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1402, cr_loss=0.3571, over 3351917.63 frames. ], batch size: 61, lr: 5.44e-03, grad_scale: 16.0 2024-09-24 01:42:04,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2024-09-24 01:42:22,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=392354.6666666667, ans=0.0 2024-09-24 01:42:32,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392354.6666666667, ans=0.1 2024-09-24 01:42:59,349 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.230e+02 1.321e+02 1.423e+02 2.235e+02, threshold=2.642e+02, percent-clipped=0.0 2024-09-24 01:43:01,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=392448.0, ans=0.0 2024-09-24 01:43:03,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=392448.0, ans=0.125 2024-09-24 01:43:23,764 INFO [train.py:1198] (0/4) Epoch 22, batch 2300, loss[loss=0.2045, ctc_loss=0.1364, cr_loss=0.3404, over 17154.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1404, cr_loss=0.3569, over 3352486.18 frames. ], batch size: 48, lr: 5.44e-03, grad_scale: 16.0 2024-09-24 01:43:24,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2024-09-24 01:44:31,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=392728.0, ans=0.95 2024-09-24 01:44:33,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=392728.0, ans=0.0 2024-09-24 01:44:47,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=392774.6666666667, ans=0.0 2024-09-24 01:44:47,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392774.6666666667, ans=0.1 2024-09-24 01:44:51,576 INFO [train.py:1198] (0/4) Epoch 22, batch 2350, loss[loss=0.25, ctc_loss=0.1726, cr_loss=0.3866, over 11715.00 frames.
], batch size: 123, lr: 5.44e-03, grad_scale: 16.0 2024-09-24 01:44:51,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=392774.6666666667, ans=0.125 2024-09-24 01:45:20,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=392821.3333333333, ans=0.125 2024-09-24 01:45:22,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.79 vs. limit=15.0 2024-09-24 01:45:24,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=392868.0, ans=0.125 2024-09-24 01:45:37,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=392914.6666666667, ans=0.125 2024-09-24 01:45:47,291 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.237e+02 1.319e+02 1.405e+02 2.078e+02, threshold=2.638e+02, percent-clipped=0.0 2024-09-24 01:45:50,877 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:46:11,758 INFO [train.py:1198] (0/4) Epoch 22, batch 2400, loss[loss=0.227, ctc_loss=0.1504, cr_loss=0.3833, over 17216.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1403, cr_loss=0.3565, over 3353994.60 frames. ], batch size: 50, lr: 5.43e-03, grad_scale: 32.0 2024-09-24 01:46:18,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=393008.0, ans=0.125 2024-09-24 01:46:29,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=393054.6666666667, ans=0.125 2024-09-24 01:46:47,318 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:46:52,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=393101.3333333333, ans=0.0 2024-09-24 01:47:11,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=393148.0, ans=0.125 2024-09-24 01:47:14,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=393148.0, ans=0.025 2024-09-24 01:47:15,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=393148.0, ans=0.5 2024-09-24 01:47:34,555 INFO [train.py:1198] (0/4) Epoch 22, batch 2450, loss[loss=0.1947, ctc_loss=0.1272, cr_loss=0.3377, over 17298.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.1401, cr_loss=0.3562, over 3357506.20 frames. 
], batch size: 46, lr: 5.43e-03, grad_scale: 32.0 2024-09-24 01:47:36,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393241.3333333333, ans=0.1 2024-09-24 01:48:02,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=393288.0, ans=0.125 2024-09-24 01:48:08,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393334.6666666667, ans=0.1 2024-09-24 01:48:23,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=393381.3333333333, ans=15.0 2024-09-24 01:48:30,720 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.273e+02 1.360e+02 1.558e+02 2.301e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-24 01:48:43,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=393428.0, ans=0.125 2024-09-24 01:48:57,401 INFO [train.py:1198] (0/4) Epoch 22, batch 2500, loss[loss=0.2085, ctc_loss=0.1364, cr_loss=0.3605, over 16960.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1403, cr_loss=0.3567, over 3356257.33 frames. ], batch size: 58, lr: 5.43e-03, grad_scale: 32.0 2024-09-24 01:49:17,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=393521.3333333333, ans=0.025 2024-09-24 01:49:28,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=393521.3333333333, ans=0.1 2024-09-24 01:49:38,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-09-24 01:50:02,992 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:50:20,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=393708.0, ans=10.0 2024-09-24 01:50:22,190 INFO [train.py:1198] (0/4) Epoch 22, batch 2550, loss[loss=0.2302, ctc_loss=0.1547, cr_loss=0.3779, over 17219.00 frames. ], tot_loss[loss=0.2129, ctc_loss=0.1413, cr_loss=0.358, over 3350383.65 frames. ], batch size: 47, lr: 5.43e-03, grad_scale: 32.0 2024-09-24 01:50:35,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=393708.0, ans=0.125 2024-09-24 01:50:59,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=393801.3333333333, ans=0.2 2024-09-24 01:51:04,393 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:51:18,854 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.258e+02 1.341e+02 1.475e+02 2.313e+02, threshold=2.682e+02, percent-clipped=0.0 2024-09-24 01:51:23,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393848.0, ans=0.1 2024-09-24 01:51:24,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. 
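limit=15.0

The Whitening records from scaling.py:1024 compare a per-module statistic against a limit, as in metric=7.83 vs. limit=15.0 just above; the whiten modules in the zipformer recipe appear to push activation covariances toward isotropy and intervene only once the metric crosses the limit. One plausible form of such a metric is the ratio between the mean squared eigenvalue of the feature covariance and the squared mean eigenvalue, which is 1.0 for perfectly white features and grows with anisotropy; this is a guess at the formula, shown for intuition only:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); returns a scalar >= 1.0.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues of the covariance
    return (eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2

print(whitening_metric(torch.randn(1000, 256)))  # close to 1 for white input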
2024-09-24 01:51:43,208 INFO [train.py:1198] (0/4) Epoch 22, batch 2600, loss[loss=0.1454, ctc_loss=0.09116, cr_loss=0.2714, over 17278.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.1408, cr_loss=0.3569, over 3350659.61 frames. ], batch size: 42, lr: 5.43e-03, grad_scale: 32.0 2024-09-24 01:51:49,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=393941.3333333333, ans=0.0 2024-09-24 01:51:50,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=393941.3333333333, ans=0.1 2024-09-24 01:51:52,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=393941.3333333333, ans=22.5 2024-09-24 01:51:54,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=393941.3333333333, ans=0.0 2024-09-24 01:52:13,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=393988.0, ans=0.0 2024-09-24 01:52:15,141 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:52:20,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=394034.6666666667, ans=0.125 2024-09-24 01:52:40,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=394081.3333333333, ans=0.2 2024-09-24 01:53:06,135 INFO [train.py:1198] (0/4) Epoch 22, batch 2650, loss[loss=0.2443, ctc_loss=0.1632, cr_loss=0.4057, over 17025.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.1402, cr_loss=0.3562, over 3356274.86 frames. ], batch size: 53, lr: 5.43e-03, grad_scale: 32.0 2024-09-24 01:53:28,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394221.3333333333, ans=0.1 2024-09-24 01:54:07,322 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.255e+02 1.320e+02 1.434e+02 1.908e+02, threshold=2.640e+02, percent-clipped=0.0 2024-09-24 01:54:12,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=394314.6666666667, ans=0.2 2024-09-24 01:54:31,191 INFO [train.py:1198] (0/4) Epoch 22, batch 2700, loss[loss=0.21, ctc_loss=0.14, cr_loss=0.35, over 17358.00 frames. ], tot_loss[loss=0.2125, ctc_loss=0.1409, cr_loss=0.3578, over 3361651.59 frames.
], batch size: 48, lr: 5.42e-03, grad_scale: 32.0 2024-09-24 01:54:37,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=394408.0, ans=0.125 2024-09-24 01:54:41,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=394408.0, ans=10.0 2024-09-24 01:54:42,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=394408.0, ans=0.1 2024-09-24 01:55:05,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=394501.3333333333, ans=0.2 2024-09-24 01:55:06,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=22.5 2024-09-24 01:55:21,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=394548.0, ans=0.125 2024-09-24 01:55:23,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=394548.0, ans=0.0 2024-09-24 01:55:34,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=394548.0, ans=0.125 2024-09-24 01:55:53,380 INFO [train.py:1198] (0/4) Epoch 22, batch 2750, loss[loss=0.2358, ctc_loss=0.1591, cr_loss=0.3835, over 16990.00 frames. ], tot_loss[loss=0.2124, ctc_loss=0.1409, cr_loss=0.3574, over 3351660.00 frames. ], batch size: 53, lr: 5.42e-03, grad_scale: 32.0 2024-09-24 01:55:54,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2024-09-24 01:56:01,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=394641.3333333333, ans=0.125 2024-09-24 01:56:03,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=394641.3333333333, ans=0.0 2024-09-24 01:56:05,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.64 vs. limit=15.0 2024-09-24 01:56:09,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=394688.0, ans=0.2 2024-09-24 01:56:22,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=394688.0, ans=0.025 2024-09-24 01:56:25,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=394734.6666666667, ans=0.2 2024-09-24 01:56:52,095 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.297e+02 1.428e+02 1.577e+02 2.482e+02, threshold=2.855e+02, percent-clipped=0.0 2024-09-24 01:57:11,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=15.0 2024-09-24 01:57:16,459 INFO [train.py:1198] (0/4) Epoch 22, batch 2800, loss[loss=0.1956, ctc_loss=0.1283, cr_loss=0.3368, over 16940.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.14, cr_loss=0.3567, over 3361508.68 frames. 
], batch size: 42, lr: 5.42e-03, grad_scale: 32.0 2024-09-24 01:57:23,573 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=22.5 2024-09-24 01:57:31,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=12.0 2024-09-24 01:58:07,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=395014.6666666667, ans=0.04949747468305833 2024-09-24 01:58:38,325 INFO [train.py:1198] (0/4) Epoch 22, batch 2850, loss[loss=0.1866, ctc_loss=0.1236, cr_loss=0.3147, over 17089.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1393, cr_loss=0.3558, over 3367168.78 frames. ], batch size: 43, lr: 5.42e-03, grad_scale: 32.0 2024-09-24 01:58:40,336 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 01:58:59,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=395154.6666666667, ans=0.09899494936611666 2024-09-24 01:59:24,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=395201.3333333333, ans=0.125 2024-09-24 01:59:36,607 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.264e+02 1.351e+02 1.437e+02 2.135e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-24 01:59:37,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.47 vs. limit=10.0 2024-09-24 02:00:03,240 INFO [train.py:1198] (0/4) Epoch 22, batch 2900, loss[loss=0.2802, ctc_loss=0.1958, cr_loss=0.4222, over 11604.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1409, cr_loss=0.3585, over 3354800.98 frames. ], batch size: 123, lr: 5.42e-03, grad_scale: 32.0 2024-09-24 02:00:06,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=395341.3333333333, ans=0.0 2024-09-24 02:00:12,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=395341.3333333333, ans=10.0 2024-09-24 02:00:17,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=395388.0, ans=0.125 2024-09-24 02:00:52,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=395481.3333333333, ans=0.125 2024-09-24 02:01:07,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=395528.0, ans=0.2 2024-09-24 02:01:23,079 INFO [train.py:1198] (0/4) Epoch 22, batch 2950, loss[loss=0.1944, ctc_loss=0.1276, cr_loss=0.3343, over 16955.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1414, cr_loss=0.3591, over 3349524.76 frames. 
], batch size: 42, lr: 5.42e-03, grad_scale: 32.0 2024-09-24 02:01:36,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395574.6666666667, ans=0.1 2024-09-24 02:01:49,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=395621.3333333333, ans=0.0 2024-09-24 02:01:51,395 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:02:20,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=395714.6666666667, ans=0.2 2024-09-24 02:02:21,884 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.266e+02 1.346e+02 1.444e+02 1.756e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-24 02:02:39,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=395761.3333333333, ans=0.0 2024-09-24 02:02:43,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=395808.0, ans=0.125 2024-09-24 02:02:45,310 INFO [train.py:1198] (0/4) Epoch 22, batch 3000, loss[loss=0.2042, ctc_loss=0.1365, cr_loss=0.3382, over 17169.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1406, cr_loss=0.3578, over 3351172.98 frames. ], batch size: 45, lr: 5.42e-03, grad_scale: 32.0 2024-09-24 02:02:45,311 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 02:03:00,740 INFO [train.py:1230] (0/4) Epoch 22, validation: loss=0.03869, ctc_loss=0.03869, cr_loss=8.188e-15, over 944034.00 frames. 2024-09-24 02:03:00,741 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 02:03:05,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=395808.0, ans=10.0 2024-09-24 02:03:05,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=395808.0, ans=0.04949747468305833 2024-09-24 02:03:21,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2024-09-24 02:03:23,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.51 vs. limit=10.0 2024-09-24 02:03:24,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-09-24 02:03:27,235 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:03:34,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. 
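limit=15.0

In the validation record above, cr_loss collapses to 8.188e-15 while ctc_loss carries the entire 0.03869. That is what one would expect if the consistency-regularization term compares the model's outputs on two differently time-masked copies of each utterance: with masking disabled at validation time the two copies coincide and the divergence vanishes to rounding error. A hedged sketch of such a term; the actual CR-CTC loss is defined in code this log does not show:

import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    # Symmetric KL between frame-level output distributions of two views.
    kl_ab = F.kl_div(log_probs_a, log_probs_b.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

x = torch.randn(4, 100, 500).log_softmax(dim=-1)
print(cr_loss(x, x))  # identical views give ~0, as in the validation record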
2024-09-24 02:03:41,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=395901.3333333333, ans=0.125 2024-09-24 02:03:45,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=395948.0, ans=0.125 2024-09-24 02:03:50,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=395948.0, ans=0.125 2024-09-24 02:03:56,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=395948.0, ans=0.2 2024-09-24 02:03:56,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395948.0, ans=0.1 2024-09-24 02:03:56,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=395948.0, ans=0.125 2024-09-24 02:03:58,539 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2024-09-24 02:04:06,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=395994.6666666667, ans=0.1 2024-09-24 02:04:14,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2024-09-24 02:04:15,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=395994.6666666667, ans=0.2 2024-09-24 02:04:16,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395994.6666666667, ans=0.1 2024-09-24 02:04:16,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0 2024-09-24 02:04:19,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=396041.3333333333, ans=0.125 2024-09-24 02:04:21,131 INFO [train.py:1198] (0/4) Epoch 22, batch 3050, loss[loss=0.2449, ctc_loss=0.165, cr_loss=0.3992, over 17030.00 frames. ], tot_loss[loss=0.211, ctc_loss=0.1397, cr_loss=0.3561, over 3358777.54 frames. ], batch size: 56, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:04:22,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2024-09-24 02:04:29,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=396041.3333333333, ans=0.0 2024-09-24 02:04:35,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=396088.0, ans=0.125 2024-09-24 02:04:37,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2024-09-24 02:05:13,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.14 vs.
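limit=15.0

The optim.py WARNING records, such as the one immediately below, summarize recently observed gradient norms as five quantiles (min, 25%, 50%, 75%, max) and report the active clipping threshold; the threshold tracks Clipping_scale times the median, e.g. 2.0 * 1.330e+02 is approximately the logged threshold=2.661e+02. A sketch of that bookkeeping under those assumptions, not the actual icefall optimizer code:

import torch

def clip_with_median_threshold(parameters, recent_norms: torch.Tensor,
                               clipping_scale: float = 2.0):
    # Quantiles of recent gradient norms, matching the five logged numbers.
    qs = [torch.quantile(recent_norms, q).item()
          for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * qs[2]  # scale times the median
    total_norm = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
    # The fraction of steps where total_norm exceeds the threshold is what
    # the log reports as percent-clipped.
    return qs, threshold, float(total_norm) > threshold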
2024-09-24 02:05:15,411 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.242e+02 1.330e+02 1.474e+02 2.506e+02, threshold=2.661e+02, percent-clipped=0.0 2024-09-24 02:05:35,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=396228.0, ans=0.025 2024-09-24 02:05:41,006 INFO [train.py:1198] (0/4) Epoch 22, batch 3100, loss[loss=0.2151, ctc_loss=0.1434, cr_loss=0.3584, over 17305.00 frames. ], tot_loss[loss=0.2134, ctc_loss=0.1415, cr_loss=0.3594, over 3348779.87 frames. ], batch size: 51, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:05:45,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=15.0 2024-09-24 02:05:49,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=396274.6666666667, ans=0.125 2024-09-24 02:06:00,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=396321.3333333333, ans=0.125 2024-09-24 02:06:03,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0 2024-09-24 02:06:31,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.87 vs. limit=12.0 2024-09-24 02:06:59,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=396508.0, ans=0.125 2024-09-24 02:07:01,223 INFO [train.py:1198] (0/4) Epoch 22, batch 3150, loss[loss=0.1673, ctc_loss=0.1071, cr_loss=0.3008, over 17077.00 frames. ], tot_loss[loss=0.2128, ctc_loss=0.141, cr_loss=0.3586, over 3347770.15 frames. ], batch size: 40, lr: 5.41e-03, grad_scale: 16.0 2024-09-24 02:07:31,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=396601.3333333333, ans=0.0 2024-09-24 02:07:46,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396648.0, ans=0.1 2024-09-24 02:07:57,789 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.290e+02 1.399e+02 1.555e+02 2.844e+02, threshold=2.797e+02, percent-clipped=1.0 2024-09-24 02:08:13,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=396694.6666666667, ans=0.0 2024-09-24 02:08:18,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=396741.3333333333, ans=0.07 2024-09-24 02:08:19,587 INFO [train.py:1198] (0/4) Epoch 22, batch 3200, loss[loss=0.1854, ctc_loss=0.122, cr_loss=0.3171, over 16960.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.141, cr_loss=0.3582, over 3339426.47 frames.
], batch size: 42, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:08:37,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=396788.0, ans=0.125 2024-09-24 02:08:41,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396788.0, ans=0.1 2024-09-24 02:08:41,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396788.0, ans=0.1 2024-09-24 02:09:21,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=396928.0, ans=0.0 2024-09-24 02:09:35,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2024-09-24 02:09:37,798 INFO [train.py:1198] (0/4) Epoch 22, batch 3250, loss[loss=0.2154, ctc_loss=0.1467, cr_loss=0.3436, over 17080.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1395, cr_loss=0.3563, over 3348482.49 frames. ], batch size: 46, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:09:42,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=396974.6666666667, ans=0.0 2024-09-24 02:10:09,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=397068.0, ans=0.04949747468305833 2024-09-24 02:10:15,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=397068.0, ans=0.125 2024-09-24 02:10:16,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=397068.0, ans=0.125 2024-09-24 02:10:18,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=397068.0, ans=0.025 2024-09-24 02:10:23,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=397114.6666666667, ans=0.2 2024-09-24 02:10:33,680 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.276e+02 1.353e+02 1.572e+02 3.957e+02, threshold=2.706e+02, percent-clipped=1.0 2024-09-24 02:10:35,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2024-09-24 02:10:41,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=397161.3333333333, ans=0.2 2024-09-24 02:10:44,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=397161.3333333333, ans=0.2 2024-09-24 02:10:54,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=397208.0, ans=0.1 2024-09-24 02:10:55,244 INFO [train.py:1198] (0/4) Epoch 22, batch 3300, loss[loss=0.2388, ctc_loss=0.16, cr_loss=0.3943, over 17348.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.14, cr_loss=0.3568, over 3349929.70 frames. ], batch size: 48, lr: 5.41e-03, grad_scale: 32.0 2024-09-24 02:10:57,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.43 vs. 
limit=15.0 2024-09-24 02:11:03,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=397208.0, ans=0.0 2024-09-24 02:11:09,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=397208.0, ans=0.0 2024-09-24 02:11:36,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=397301.3333333333, ans=0.125 2024-09-24 02:11:40,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=397301.3333333333, ans=0.5 2024-09-24 02:12:09,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2024-09-24 02:12:12,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=397394.6666666667, ans=0.95 2024-09-24 02:12:15,301 INFO [train.py:1198] (0/4) Epoch 22, batch 3350, loss[loss=0.2388, ctc_loss=0.1586, cr_loss=0.4011, over 17201.00 frames. ], tot_loss[loss=0.2118, ctc_loss=0.1403, cr_loss=0.3573, over 3354677.20 frames. ], batch size: 55, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:12:29,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=397488.0, ans=0.125 2024-09-24 02:12:40,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397488.0, ans=0.1 2024-09-24 02:12:54,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=397534.6666666667, ans=0.07 2024-09-24 02:12:56,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=397534.6666666667, ans=0.125 2024-09-24 02:12:56,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=397534.6666666667, ans=0.125 2024-09-24 02:12:56,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=397534.6666666667, ans=0.0 2024-09-24 02:13:00,758 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:13:02,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=397581.3333333333, ans=0.0 2024-09-24 02:13:11,463 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.283e+02 1.436e+02 1.658e+02 2.229e+02, threshold=2.872e+02, percent-clipped=0.0 2024-09-24 02:13:31,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=397674.6666666667, ans=0.025 2024-09-24 02:13:33,109 INFO [train.py:1198] (0/4) Epoch 22, batch 3400, loss[loss=0.2256, ctc_loss=0.1534, cr_loss=0.3612, over 16721.00 frames. ], tot_loss[loss=0.211, ctc_loss=0.1397, cr_loss=0.3563, over 3350667.40 frames. 
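The recurring WARNING lines from optim.py report the quartiles of recent gradient norms together with a clipping threshold; in these records the threshold sits at Clipping_scale (2.0) times the logged median (e.g. 2 x 1.378e+02 = 2.756e+02 just above), which suggests a running-median rule. A simplified sketch under that assumption; the optimizer's actual statistics are richer than a plain history buffer.

```python
import torch

def clip_by_running_median(params, history, clipping_scale=2.0, max_hist=1000):
    """params: list of parameters with .grad set; history: a mutable list of
    past gradient norms kept by the caller. Returns (norm, threshold)."""
    norm = torch.norm(torch.stack([p.grad.norm() for p in params if p.grad is not None]))
    history.append(norm.item())
    del history[:-max_hist]                                   # bounded window
    threshold = clipping_scale * sorted(history)[len(history) // 2]  # ~median
    if norm > threshold:                    # such steps show up as "percent-clipped"
        for p in params:
            if p.grad is not None:
                p.grad.mul_(threshold / norm)
    return norm.item(), threshold
```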
], batch size: 61, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:13:39,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=397674.6666666667, ans=0.125 2024-09-24 02:13:40,616 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=22.5 2024-09-24 02:14:12,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=397768.0, ans=0.0 2024-09-24 02:14:53,442 INFO [train.py:1198] (0/4) Epoch 22, batch 3450, loss[loss=0.1953, ctc_loss=0.128, cr_loss=0.3366, over 17160.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1393, cr_loss=0.3563, over 3360042.49 frames. ], batch size: 48, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:15:11,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=22.5 2024-09-24 02:15:15,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=397954.6666666667, ans=0.04949747468305833 2024-09-24 02:15:17,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=397954.6666666667, ans=0.1 2024-09-24 02:15:43,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=8.0 2024-09-24 02:15:51,970 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.275e+02 1.378e+02 1.514e+02 2.011e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 02:15:53,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=398048.0, ans=0.125 2024-09-24 02:16:04,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=398094.6666666667, ans=10.0 2024-09-24 02:16:04,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=398094.6666666667, ans=0.125 2024-09-24 02:16:13,998 INFO [train.py:1198] (0/4) Epoch 22, batch 3500, loss[loss=0.1824, ctc_loss=0.1175, cr_loss=0.3242, over 17091.00 frames. ], tot_loss[loss=0.2101, ctc_loss=0.139, cr_loss=0.3557, over 3362391.07 frames. ], batch size: 43, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:16:17,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=398141.3333333333, ans=0.0 2024-09-24 02:16:26,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=398141.3333333333, ans=0.125 2024-09-24 02:16:34,730 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:16:58,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=398234.6666666667, ans=0.125 2024-09-24 02:17:11,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. 
limit=15.0 2024-09-24 02:17:17,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=398328.0, ans=0.125 2024-09-24 02:17:21,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=398328.0, ans=0.0 2024-09-24 02:17:23,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=398328.0, ans=0.125 2024-09-24 02:17:34,215 INFO [train.py:1198] (0/4) Epoch 22, batch 3550, loss[loss=0.2505, ctc_loss=0.1714, cr_loss=0.3954, over 11565.00 frames. ], tot_loss[loss=0.2098, ctc_loss=0.1389, cr_loss=0.3546, over 3360555.56 frames. ], batch size: 125, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:17:36,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=12.0 2024-09-24 02:17:37,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=398374.6666666667, ans=0.125 2024-09-24 02:18:31,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=398514.6666666667, ans=0.2 2024-09-24 02:18:32,111 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.257e+02 1.363e+02 1.494e+02 1.950e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-24 02:18:51,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2024-09-24 02:18:52,335 INFO [train.py:1198] (0/4) Epoch 22, batch 3600, loss[loss=0.2268, ctc_loss=0.1467, cr_loss=0.4004, over 17014.00 frames. ], tot_loss[loss=0.2088, ctc_loss=0.1381, cr_loss=0.3532, over 3368141.01 frames. ], batch size: 52, lr: 5.40e-03, grad_scale: 32.0 2024-09-24 02:18:52,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=398608.0, ans=0.05 2024-09-24 02:18:58,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=398608.0, ans=0.0 2024-09-24 02:19:06,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=398654.6666666667, ans=0.125 2024-09-24 02:19:07,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398654.6666666667, ans=0.1 2024-09-24 02:19:26,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=398701.3333333333, ans=0.035 2024-09-24 02:19:28,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=398701.3333333333, ans=10.0 2024-09-24 02:19:53,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=398794.6666666667, ans=0.95 2024-09-24 02:19:54,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=398794.6666666667, ans=0.0 2024-09-24 02:20:10,106 INFO [train.py:1198] (0/4) Epoch 22, batch 3650, loss[loss=0.2261, ctc_loss=0.1532, cr_loss=0.3642, over 16517.00 frames. ], tot_loss[loss=0.2098, ctc_loss=0.1389, cr_loss=0.3546, over 3370505.23 frames. 
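Most of the INFO traffic above comes from ScheduledFloat objects: module hyperparameters (dropout probabilities, skip rates, whitening limits) that are functions of batch_count rather than constants, with "ans" reporting the current value. A minimal sketch of such a schedule using piecewise-linear interpolation between breakpoints; the breakpoints below are illustrative, not the ones used in this run.

```python
class ScheduledValue:
    """Piecewise-linear function of batch_count, given sorted (x, y) breakpoints."""
    def __init__(self, *points):
        self.points = list(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]  # past the last breakpoint, hold the final value

dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(398654.0))  # -> 0.1, the plateau value seen in the records above
```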
], batch size: 66, lr: 5.39e-03, grad_scale: 32.0 2024-09-24 02:20:12,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0 2024-09-24 02:20:44,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0 2024-09-24 02:21:03,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=398981.3333333333, ans=0.125 2024-09-24 02:21:09,558 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.259e+02 1.359e+02 1.456e+02 2.035e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-24 02:21:11,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2024-09-24 02:21:16,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=399028.0, ans=0.0 2024-09-24 02:21:20,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-09-24 02:21:21,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=399028.0, ans=0.0 2024-09-24 02:21:24,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=399028.0, ans=0.125 2024-09-24 02:21:27,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=399028.0, ans=0.125 2024-09-24 02:21:30,492 INFO [train.py:1198] (0/4) Epoch 22, batch 3700, loss[loss=0.1841, ctc_loss=0.1173, cr_loss=0.334, over 17173.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.1392, cr_loss=0.355, over 3371073.51 frames. ], batch size: 41, lr: 5.39e-03, grad_scale: 32.0 2024-09-24 02:21:32,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=399074.6666666667, ans=0.025 2024-09-24 02:21:39,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.52 vs. limit=15.0 2024-09-24 02:21:49,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=399121.3333333333, ans=0.0 2024-09-24 02:21:57,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2024-09-24 02:22:17,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=399214.6666666667, ans=0.125 2024-09-24 02:22:17,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.45 vs. 
limit=15.0 2024-09-24 02:22:29,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=399214.6666666667, ans=0.035 2024-09-24 02:22:37,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=399261.3333333333, ans=0.0 2024-09-24 02:22:48,076 INFO [train.py:1198] (0/4) Epoch 22, batch 3750, loss[loss=0.2351, ctc_loss=0.1547, cr_loss=0.402, over 17337.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.1399, cr_loss=0.3568, over 3377487.91 frames. ], batch size: 51, lr: 5.39e-03, grad_scale: 32.0 2024-09-24 02:23:08,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=399354.6666666667, ans=0.125 2024-09-24 02:23:21,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=399401.3333333333, ans=0.07 2024-09-24 02:23:21,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0 2024-09-24 02:23:30,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=399401.3333333333, ans=0.0 2024-09-24 02:23:46,154 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.265e+02 1.370e+02 1.488e+02 1.870e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-24 02:24:07,238 INFO [train.py:1198] (0/4) Epoch 22, batch 3800, loss[loss=0.2312, ctc_loss=0.1549, cr_loss=0.3818, over 16892.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1415, cr_loss=0.3586, over 3359422.03 frames. ], batch size: 58, lr: 5.39e-03, grad_scale: 32.0 2024-09-24 02:24:09,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=399541.3333333333, ans=0.2 2024-09-24 02:24:12,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=399541.3333333333, ans=0.125 2024-09-24 02:24:14,329 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-09-24 02:24:15,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=399541.3333333333, ans=6.0 2024-09-24 02:24:30,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=22.5 2024-09-24 02:24:48,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=399634.6666666667, ans=0.0 2024-09-24 02:25:06,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399728.0, ans=0.1 2024-09-24 02:25:17,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=399728.0, ans=15.0 2024-09-24 02:25:17,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. 
limit=6.0 2024-09-24 02:25:18,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399728.0, ans=0.1 2024-09-24 02:25:23,718 INFO [train.py:1198] (0/4) Epoch 22, batch 3850, loss[loss=0.2753, ctc_loss=0.1972, cr_loss=0.3905, over 11688.00 frames. ], tot_loss[loss=0.2165, ctc_loss=0.1442, cr_loss=0.3616, over 3300698.84 frames. ], batch size: 123, lr: 5.39e-03, grad_scale: 16.0 2024-09-24 02:25:34,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=399774.6666666667, ans=0.1 2024-09-24 02:25:41,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=399821.3333333333, ans=0.025 2024-09-24 02:25:42,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=399821.3333333333, ans=0.125 2024-09-24 02:26:17,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=399914.6666666667, ans=0.125 2024-09-24 02:26:20,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=399914.6666666667, ans=0.125 2024-09-24 02:26:22,073 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.394e+02 1.521e+02 1.653e+02 2.855e+02, threshold=3.042e+02, percent-clipped=1.0 2024-09-24 02:26:25,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=399961.3333333333, ans=0.2 2024-09-24 02:26:31,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=399961.3333333333, ans=0.125 2024-09-24 02:26:35,509 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-22.pt 2024-09-24 02:27:26,203 INFO [train.py:1198] (0/4) Epoch 23, batch 0, loss[loss=0.1868, ctc_loss=0.1214, cr_loss=0.3269, over 16607.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1214, cr_loss=0.3269, over 16607.00 frames. ], batch size: 37, lr: 5.27e-03, grad_scale: 32.0 2024-09-24 02:27:26,204 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 02:27:41,790 INFO [train.py:1230] (0/4) Epoch 23, validation: loss=0.03754, ctc_loss=0.03754, cr_loss=8.311e-15, over 944034.00 frames. 2024-09-24 02:27:41,791 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 02:28:22,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=400087.3333333333, ans=0.0 2024-09-24 02:28:56,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=400180.6666666667, ans=0.04949747468305833 2024-09-24 02:29:03,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=400180.6666666667, ans=0.125 2024-09-24 02:29:04,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=400227.3333333333, ans=0.0 2024-09-24 02:29:06,171 INFO [train.py:1198] (0/4) Epoch 23, batch 50, loss[loss=0.24, ctc_loss=0.162, cr_loss=0.3897, over 14912.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.1395, cr_loss=0.3533, over 751270.88 frames. 
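This stretch of the log is an epoch boundary: a checkpoint is written to epoch-22.pt, and epoch 23 opens with a validation pass over the fixed dev set (every validation record covers the same 944034 frames) before training resumes. Roughly, the control flow looks like the sketch below; the function names are placeholders, not the trainer's actual API.

```python
import torch

def on_epoch_boundary(model, optimizer, epoch, exp_dir, valid_loader, compute_loss):
    # Save the epoch checkpoint, e.g. .../epoch-22.pt as in the record above.
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "epoch": epoch},
        f"{exp_dir}/epoch-{epoch}.pt",
    )
    # Validation: frame-weighted average loss over the whole dev set, which
    # is why the frame count (944034) is identical in every validation record.
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            loss_sum += loss.item() * num_frames
            frame_sum += num_frames
    model.train()
    return loss_sum / frame_sum
```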
], batch size: 89, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:29:08,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=400227.3333333333, ans=0.125 2024-09-24 02:29:08,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2024-09-24 02:29:10,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.54 vs. limit=22.5 2024-09-24 02:29:11,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=400227.3333333333, ans=0.2 2024-09-24 02:29:14,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=400227.3333333333, ans=0.0 2024-09-24 02:29:18,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2024-09-24 02:29:47,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=400320.6666666667, ans=0.04949747468305833 2024-09-24 02:29:56,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.35 vs. limit=15.0 2024-09-24 02:30:10,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=400414.0, ans=0.125 2024-09-24 02:30:11,721 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.262e+02 1.335e+02 1.476e+02 2.366e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 02:30:26,017 INFO [train.py:1198] (0/4) Epoch 23, batch 100, loss[loss=0.2509, ctc_loss=0.1693, cr_loss=0.4081, over 17212.00 frames. ], tot_loss[loss=0.211, ctc_loss=0.1394, cr_loss=0.358, over 1336130.55 frames. ], batch size: 55, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:30:59,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=400554.0, ans=0.125 2024-09-24 02:31:17,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=400600.6666666667, ans=0.0 2024-09-24 02:31:50,461 INFO [train.py:1198] (0/4) Epoch 23, batch 150, loss[loss=0.197, ctc_loss=0.1298, cr_loss=0.3358, over 17304.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.1394, cr_loss=0.3566, over 1787289.75 frames. ], batch size: 51, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:31:54,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=400694.0, ans=0.1 2024-09-24 02:32:05,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. 
limit=15.0 2024-09-24 02:32:13,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=400740.6666666667, ans=0.125 2024-09-24 02:32:38,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=400834.0, ans=0.0 2024-09-24 02:32:55,955 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.238e+02 1.330e+02 1.442e+02 1.852e+02, threshold=2.660e+02, percent-clipped=0.0 2024-09-24 02:33:00,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=400880.6666666667, ans=0.025 2024-09-24 02:33:12,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0 2024-09-24 02:33:13,118 INFO [train.py:1198] (0/4) Epoch 23, batch 200, loss[loss=0.2003, ctc_loss=0.1333, cr_loss=0.3347, over 17030.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1405, cr_loss=0.3579, over 2129471.71 frames. ], batch size: 44, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:33:16,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=400927.3333333333, ans=0.2 2024-09-24 02:33:27,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=400974.0, ans=0.125 2024-09-24 02:33:32,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=400974.0, ans=0.125 2024-09-24 02:34:35,731 INFO [train.py:1198] (0/4) Epoch 23, batch 250, loss[loss=0.2057, ctc_loss=0.1335, cr_loss=0.3609, over 17018.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1405, cr_loss=0.358, over 2405150.79 frames. ], batch size: 44, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:35:09,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=401254.0, ans=0.0 2024-09-24 02:35:41,289 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.253e+02 1.336e+02 1.468e+02 3.243e+02, threshold=2.673e+02, percent-clipped=1.0 2024-09-24 02:35:51,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=401347.3333333333, ans=0.0 2024-09-24 02:35:55,854 INFO [train.py:1198] (0/4) Epoch 23, batch 300, loss[loss=0.1662, ctc_loss=0.1071, cr_loss=0.2957, over 16982.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.14, cr_loss=0.3569, over 2620545.24 frames. 
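The learning rate in these records decays smoothly with batch count inside an epoch (5.41e-03 down to 5.39e-03 earlier, 5.26e-03 here) and takes a larger step down at each epoch boundary. That two-factor behaviour matches an Eden-style schedule; a sketch of its shape follows, with the constants treated as illustrative, since the exact step counting of this run is not reproduced here.

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # One inverse-quartic-root factor driven by batches, one by epochs:
    # nearly flat early on, then a slow polynomial decay in each variable.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```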
], batch size: 42, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:36:05,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=401394.0, ans=0.0 2024-09-24 02:36:10,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=401394.0, ans=0.04949747468305833 2024-09-24 02:36:29,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=401440.6666666667, ans=0.0 2024-09-24 02:36:50,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=401534.0, ans=0.125 2024-09-24 02:37:11,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=401580.6666666667, ans=0.125 2024-09-24 02:37:12,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=401580.6666666667, ans=0.0 2024-09-24 02:37:20,387 INFO [train.py:1198] (0/4) Epoch 23, batch 350, loss[loss=0.1973, ctc_loss=0.1298, cr_loss=0.3371, over 17299.00 frames. ], tot_loss[loss=0.2104, ctc_loss=0.1391, cr_loss=0.3562, over 2792101.35 frames. ], batch size: 51, lr: 5.26e-03, grad_scale: 32.0 2024-09-24 02:37:38,023 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 02:38:20,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=401767.3333333333, ans=0.125 2024-09-24 02:38:28,351 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.260e+02 1.355e+02 1.507e+02 3.262e+02, threshold=2.709e+02, percent-clipped=1.0 2024-09-24 02:38:29,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2024-09-24 02:38:39,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=401814.0, ans=0.2 2024-09-24 02:38:42,886 INFO [train.py:1198] (0/4) Epoch 23, batch 400, loss[loss=0.2306, ctc_loss=0.1519, cr_loss=0.3935, over 17223.00 frames. ], tot_loss[loss=0.2104, ctc_loss=0.1391, cr_loss=0.3567, over 2916285.25 frames. ], batch size: 50, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:38:58,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=401860.6666666667, ans=0.125 2024-09-24 02:39:49,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=402047.3333333333, ans=0.04949747468305833 2024-09-24 02:39:54,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402047.3333333333, ans=0.1 2024-09-24 02:39:56,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-09-24 02:39:58,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=402047.3333333333, ans=0.0 2024-09-24 02:40:04,970 INFO [train.py:1198] (0/4) Epoch 23, batch 450, loss[loss=0.2532, ctc_loss=0.1795, cr_loss=0.3683, over 11699.00 frames. ], tot_loss[loss=0.21, ctc_loss=0.1388, cr_loss=0.3561, over 3004093.07 frames. 
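The "tot_loss ... over N frames" figures are a running, frame-weighted average: within an epoch the frame count climbs (751270 at batch 50, 1336130 at batch 100, 2405150 by batch 250) and then levels off around 3.3M to 3.4M. Both the fractional frame counts and the plateau are what a weighted sum with gentle exponential forgetting would produce; a sketch under that assumption, with the decay constant chosen for illustration.

```python
class RunningLoss:
    """Frame-weighted running average with exponential forgetting."""
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.loss_sum = 0.0
        self.frames = 0.0
        self.decay = decay

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames  # the logged tot_loss

# The frame count saturates near batch_frames / (1 - decay): with ~17k frames
# per batch and decay = 1 - 1/200 that is ~3.4M, consistent with the plateau
# seen in the records above.
```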
], batch size: 123, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:40:19,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=402140.6666666667, ans=0.0 2024-09-24 02:40:35,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=402187.3333333333, ans=15.0 2024-09-24 02:40:41,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402187.3333333333, ans=0.125 2024-09-24 02:40:49,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=402187.3333333333, ans=0.125 2024-09-24 02:41:12,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=402280.6666666667, ans=0.0 2024-09-24 02:41:13,371 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.257e+02 1.349e+02 1.505e+02 2.195e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 02:41:18,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=402280.6666666667, ans=0.125 2024-09-24 02:41:20,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=402280.6666666667, ans=0.025 2024-09-24 02:41:24,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402280.6666666667, ans=0.1 2024-09-24 02:41:27,648 INFO [train.py:1198] (0/4) Epoch 23, batch 500, loss[loss=0.21, ctc_loss=0.1364, cr_loss=0.3679, over 17146.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1401, cr_loss=0.3579, over 3073679.37 frames. ], batch size: 45, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:41:32,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402327.3333333333, ans=0.1 2024-09-24 02:41:34,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=402327.3333333333, ans=0.125 2024-09-24 02:41:45,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=402374.0, ans=0.2 2024-09-24 02:42:25,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.88 vs. limit=10.0 2024-09-24 02:42:33,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=402514.0, ans=0.125 2024-09-24 02:42:50,032 INFO [train.py:1198] (0/4) Epoch 23, batch 550, loss[loss=0.2063, ctc_loss=0.1365, cr_loss=0.3488, over 17148.00 frames. ], tot_loss[loss=0.2121, ctc_loss=0.1405, cr_loss=0.3583, over 3132864.11 frames. ], batch size: 48, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:42:50,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.24 vs. 
limit=15.0 2024-09-24 02:42:51,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=402560.6666666667, ans=0.125 2024-09-24 02:43:17,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=402607.3333333333, ans=0.125 2024-09-24 02:43:57,896 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.240e+02 1.347e+02 1.438e+02 1.839e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-24 02:44:12,301 INFO [train.py:1198] (0/4) Epoch 23, batch 600, loss[loss=0.2412, ctc_loss=0.1636, cr_loss=0.3879, over 17009.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.14, cr_loss=0.3579, over 3188755.54 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:44:20,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=402794.0, ans=0.125 2024-09-24 02:45:13,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=402934.0, ans=0.125 2024-09-24 02:45:13,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=402934.0, ans=0.125 2024-09-24 02:45:24,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=402980.6666666667, ans=0.1 2024-09-24 02:45:32,503 INFO [train.py:1198] (0/4) Epoch 23, batch 650, loss[loss=0.2048, ctc_loss=0.133, cr_loss=0.3588, over 17009.00 frames. ], tot_loss[loss=0.2119, ctc_loss=0.1403, cr_loss=0.358, over 3225582.36 frames. ], batch size: 51, lr: 5.25e-03, grad_scale: 32.0 2024-09-24 02:45:57,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. limit=8.0 2024-09-24 02:46:07,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=403074.0, ans=0.125 2024-09-24 02:46:08,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403120.6666666667, ans=0.1 2024-09-24 02:46:22,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=15.0 2024-09-24 02:46:43,831 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.212e+02 1.314e+02 1.389e+02 1.753e+02, threshold=2.627e+02, percent-clipped=0.0 2024-09-24 02:46:53,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=403214.0, ans=0.025 2024-09-24 02:46:57,956 INFO [train.py:1198] (0/4) Epoch 23, batch 700, loss[loss=0.2037, ctc_loss=0.1341, cr_loss=0.3479, over 17079.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.1388, cr_loss=0.3554, over 3256422.76 frames. ], batch size: 43, lr: 5.24e-03, grad_scale: 32.0 2024-09-24 02:47:19,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. 
limit=15.0 2024-09-24 02:47:35,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=403354.0, ans=0.0 2024-09-24 02:47:35,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2024-09-24 02:47:54,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403400.6666666667, ans=0.1 2024-09-24 02:48:20,980 INFO [train.py:1198] (0/4) Epoch 23, batch 750, loss[loss=0.1731, ctc_loss=0.1109, cr_loss=0.3113, over 16982.00 frames. ], tot_loss[loss=0.21, ctc_loss=0.139, cr_loss=0.3548, over 3273909.45 frames. ], batch size: 39, lr: 5.24e-03, grad_scale: 32.0 2024-09-24 02:48:25,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=403494.0, ans=0.125 2024-09-24 02:48:31,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.17 vs. limit=5.0 2024-09-24 02:48:41,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=403540.6666666667, ans=0.0 2024-09-24 02:48:43,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403540.6666666667, ans=0.1 2024-09-24 02:49:02,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=403587.3333333333, ans=0.125 2024-09-24 02:49:06,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=403587.3333333333, ans=0.125 2024-09-24 02:49:18,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.00 vs. limit=6.0 2024-09-24 02:49:24,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.04 vs. limit=22.5 2024-09-24 02:49:30,262 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.305e+02 1.385e+02 1.543e+02 2.308e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-24 02:49:30,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=403680.6666666667, ans=0.125 2024-09-24 02:49:43,266 INFO [train.py:1198] (0/4) Epoch 23, batch 800, loss[loss=0.2039, ctc_loss=0.1346, cr_loss=0.3466, over 17298.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1396, cr_loss=0.3561, over 3290605.67 frames. ], batch size: 51, lr: 5.24e-03, grad_scale: 32.0 2024-09-24 02:50:08,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2024-09-24 02:51:08,894 INFO [train.py:1198] (0/4) Epoch 23, batch 850, loss[loss=0.1759, ctc_loss=0.1109, cr_loss=0.325, over 17304.00 frames. ], tot_loss[loss=0.2114, ctc_loss=0.14, cr_loss=0.3566, over 3306718.53 frames. 
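The grad_scale field in the batch records moves between 32.0 and 16.0; with fp16 autocast this is ordinary dynamic loss scaling, which backs off when a scaled gradient overflows and grows again after a long run of clean steps. A minimal sketch of the policy only; PyTorch's torch.cuda.amp.GradScaler implements the production version, with a 0.5 backoff factor and a growth interval of 2000 steps by default.

```python
class DynamicLossScale:
    def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:
            self.scale *= 0.5       # overflow: halve the scale, skip the step
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.growth_interval:
                self.scale *= 2.0   # long clean run: try a larger scale again
                self.good_steps = 0
        return self.scale
```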
], batch size: 46, lr: 5.24e-03, grad_scale: 16.0 2024-09-24 02:51:39,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=404054.0, ans=0.0 2024-09-24 02:52:17,547 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.256e+02 1.343e+02 1.446e+02 2.174e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-24 02:52:28,723 INFO [train.py:1198] (0/4) Epoch 23, batch 900, loss[loss=0.1628, ctc_loss=0.1048, cr_loss=0.2898, over 17084.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.1389, cr_loss=0.3551, over 3322100.29 frames. ], batch size: 40, lr: 5.24e-03, grad_scale: 16.0 2024-09-24 02:52:28,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=404194.0, ans=0.125 2024-09-24 02:52:57,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404240.6666666667, ans=0.1 2024-09-24 02:53:02,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-09-24 02:53:05,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=404287.3333333333, ans=0.125 2024-09-24 02:53:05,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404287.3333333333, ans=0.1 2024-09-24 02:53:14,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=404287.3333333333, ans=0.0 2024-09-24 02:53:16,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=404287.3333333333, ans=0.125 2024-09-24 02:53:27,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=12.0 2024-09-24 02:53:39,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404380.6666666667, ans=0.1 2024-09-24 02:53:53,645 INFO [train.py:1198] (0/4) Epoch 23, batch 950, loss[loss=0.2095, ctc_loss=0.1341, cr_loss=0.3769, over 16988.00 frames. ], tot_loss[loss=0.21, ctc_loss=0.1389, cr_loss=0.3555, over 3330406.42 frames. ], batch size: 53, lr: 5.24e-03, grad_scale: 16.0 2024-09-24 02:54:08,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404474.0, ans=0.1 2024-09-24 02:54:21,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=404474.0, ans=0.0 2024-09-24 02:54:27,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=404520.6666666667, ans=0.0 2024-09-24 02:54:31,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.94 vs. 
limit=10.0 2024-09-24 02:54:32,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=404520.6666666667, ans=0.2 2024-09-24 02:54:34,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=404520.6666666667, ans=0.2 2024-09-24 02:54:37,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=404520.6666666667, ans=0.125 2024-09-24 02:54:54,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=404567.3333333333, ans=0.125 2024-09-24 02:55:02,711 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.281e+02 1.400e+02 1.572e+02 2.113e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-24 02:55:13,697 INFO [train.py:1198] (0/4) Epoch 23, batch 1000, loss[loss=0.1788, ctc_loss=0.1146, cr_loss=0.3212, over 17099.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1395, cr_loss=0.3562, over 3332907.29 frames. ], batch size: 40, lr: 5.24e-03, grad_scale: 16.0 2024-09-24 02:55:15,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=404660.6666666667, ans=0.125 2024-09-24 02:56:14,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=404800.6666666667, ans=0.2 2024-09-24 02:56:26,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.10 vs. limit=15.0 2024-09-24 02:56:30,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=404847.3333333333, ans=0.0 2024-09-24 02:56:36,221 INFO [train.py:1198] (0/4) Epoch 23, batch 1050, loss[loss=0.2062, ctc_loss=0.1365, cr_loss=0.3482, over 17083.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1393, cr_loss=0.3561, over 3345311.24 frames. ], batch size: 46, lr: 5.23e-03, grad_scale: 16.0 2024-09-24 02:56:54,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=404940.6666666667, ans=0.125 2024-09-24 02:56:57,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=404940.6666666667, ans=0.125 2024-09-24 02:57:04,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=404940.6666666667, ans=0.125 2024-09-24 02:57:06,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=404987.3333333333, ans=0.125 2024-09-24 02:57:35,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=405034.0, ans=0.125 2024-09-24 02:57:45,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=405080.6666666667, ans=0.125 2024-09-24 02:57:46,811 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.300e+02 1.378e+02 1.529e+02 2.270e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 02:57:57,826 INFO [train.py:1198] (0/4) Epoch 23, batch 1100, loss[loss=0.1983, ctc_loss=0.1318, cr_loss=0.3325, over 16067.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1394, cr_loss=0.3554, over 3351289.08 frames. 
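The "Whitening: ... metric=X vs. limit=Y" lines are diagnostics from modules that watch how isotropic the activation covariance is; a message is emitted when the metric crosses its limit. Below is a hedged reconstruction of such a metric, normalized so that perfectly white features score about 1 and strongly correlated features score much higher; it follows the shape of the statistic, not the module's exact code.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations."""
    x = x - x.mean(dim=0)
    covar = x.t() @ x / x.shape[0]
    mean_diag = covar.diagonal().mean()
    # ~1 if covar is proportional to the identity; grows with anisotropy.
    return ((covar ** 2).sum() / (x.shape[1] * mean_diag ** 2 + 1e-20)).item()

print(whitening_metric(torch.randn(4000, 192)))               # ~1 (white noise)
print(whitening_metric(torch.randn(4000, 1).repeat(1, 192)))  # >> 1 (rank-1)
```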
], batch size: 74, lr: 5.23e-03, grad_scale: 16.0 2024-09-24 02:58:32,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=405220.6666666667, ans=0.1 2024-09-24 02:59:07,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=405314.0, ans=0.125 2024-09-24 02:59:20,147 INFO [train.py:1198] (0/4) Epoch 23, batch 1150, loss[loss=0.2344, ctc_loss=0.1563, cr_loss=0.3907, over 17018.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1395, cr_loss=0.3558, over 3352862.99 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 16.0 2024-09-24 02:59:23,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=405360.6666666667, ans=0.125 2024-09-24 02:59:41,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=405407.3333333333, ans=0.0 2024-09-24 02:59:41,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405407.3333333333, ans=0.1 2024-09-24 02:59:46,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=405407.3333333333, ans=0.015 2024-09-24 02:59:51,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=405454.0, ans=0.95 2024-09-24 03:00:03,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=405454.0, ans=0.05 2024-09-24 03:00:14,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=22.5 2024-09-24 03:00:29,098 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.259e+02 1.339e+02 1.439e+02 1.652e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 03:00:30,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2024-09-24 03:00:31,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=405547.3333333333, ans=0.125 2024-09-24 03:00:40,375 INFO [train.py:1198] (0/4) Epoch 23, batch 1200, loss[loss=0.1979, ctc_loss=0.1286, cr_loss=0.3463, over 17269.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1393, cr_loss=0.3562, over 3354935.65 frames. ], batch size: 44, lr: 5.23e-03, grad_scale: 32.0 2024-09-24 03:01:03,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=405640.6666666667, ans=0.125 2024-09-24 03:01:06,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=405640.6666666667, ans=10.0 2024-09-24 03:01:17,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=405687.3333333333, ans=0.125 2024-09-24 03:01:18,484 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.68 vs. 
limit=15.0 2024-09-24 03:01:19,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=405687.3333333333, ans=0.0 2024-09-24 03:01:27,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-09-24 03:01:41,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=405734.0, ans=0.125 2024-09-24 03:01:42,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=405734.0, ans=22.5 2024-09-24 03:01:57,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=405780.6666666667, ans=0.0 2024-09-24 03:02:00,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.65 vs. limit=10.0 2024-09-24 03:02:05,574 INFO [train.py:1198] (0/4) Epoch 23, batch 1250, loss[loss=0.2186, ctc_loss=0.144, cr_loss=0.3729, over 17078.00 frames. ], tot_loss[loss=0.2111, ctc_loss=0.1396, cr_loss=0.3575, over 3355538.99 frames. ], batch size: 46, lr: 5.23e-03, grad_scale: 32.0 2024-09-24 03:02:05,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=405827.3333333333, ans=0.0 2024-09-24 03:02:23,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405874.0, ans=0.1 2024-09-24 03:02:25,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=405874.0, ans=0.0 2024-09-24 03:02:29,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=405874.0, ans=0.035 2024-09-24 03:02:53,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=405920.6666666667, ans=0.0 2024-09-24 03:03:14,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=406014.0, ans=0.125 2024-09-24 03:03:18,869 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.257e+02 1.354e+02 1.461e+02 2.818e+02, threshold=2.708e+02, percent-clipped=1.0 2024-09-24 03:03:28,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=406014.0, ans=0.1 2024-09-24 03:03:30,975 INFO [train.py:1198] (0/4) Epoch 23, batch 1300, loss[loss=0.2109, ctc_loss=0.136, cr_loss=0.3741, over 16986.00 frames. ], tot_loss[loss=0.2101, ctc_loss=0.1389, cr_loss=0.356, over 3363115.32 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 16.0 2024-09-24 03:03:36,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.56 vs. 
limit=15.0 2024-09-24 03:03:48,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=406107.3333333333, ans=0.0 2024-09-24 03:03:50,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=406107.3333333333, ans=0.2 2024-09-24 03:03:53,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=406107.3333333333, ans=0.125 2024-09-24 03:03:57,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.75 vs. limit=5.0 2024-09-24 03:04:20,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=406200.6666666667, ans=0.035 2024-09-24 03:04:33,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=406247.3333333333, ans=0.025 2024-09-24 03:04:47,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=406247.3333333333, ans=0.0 2024-09-24 03:04:50,625 INFO [train.py:1198] (0/4) Epoch 23, batch 1350, loss[loss=0.2425, ctc_loss=0.1616, cr_loss=0.4043, over 17190.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1391, cr_loss=0.3569, over 3369792.59 frames. ], batch size: 47, lr: 5.23e-03, grad_scale: 8.0 2024-09-24 03:05:48,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=406434.0, ans=0.125 2024-09-24 03:06:07,397 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.244e+02 1.338e+02 1.449e+02 2.733e+02, threshold=2.676e+02, percent-clipped=1.0 2024-09-24 03:06:15,516 INFO [train.py:1198] (0/4) Epoch 23, batch 1400, loss[loss=0.1816, ctc_loss=0.115, cr_loss=0.3331, over 16334.00 frames. ], tot_loss[loss=0.2104, ctc_loss=0.139, cr_loss=0.3567, over 3372217.58 frames. ], batch size: 36, lr: 5.22e-03, grad_scale: 8.0 2024-09-24 03:06:19,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=406527.3333333333, ans=0.125 2024-09-24 03:06:26,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=406527.3333333333, ans=0.1 2024-09-24 03:06:30,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=406574.0, ans=0.015 2024-09-24 03:06:37,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.17 vs. limit=10.0 2024-09-24 03:06:40,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. 
limit=12.0 2024-09-24 03:06:46,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=406620.6666666667, ans=0.125 2024-09-24 03:07:03,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=406667.3333333333, ans=0.5 2024-09-24 03:07:11,357 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:07:11,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=406667.3333333333, ans=0.125 2024-09-24 03:07:35,431 INFO [train.py:1198] (0/4) Epoch 23, batch 1450, loss[loss=0.2129, ctc_loss=0.1418, cr_loss=0.3557, over 17324.00 frames. ], tot_loss[loss=0.2101, ctc_loss=0.1389, cr_loss=0.356, over 3370367.24 frames. ], batch size: 51, lr: 5.22e-03, grad_scale: 8.0 2024-09-24 03:07:35,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=406760.6666666667, ans=0.125 2024-09-24 03:07:48,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5 2024-09-24 03:07:51,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=22.5 2024-09-24 03:07:54,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=406807.3333333333, ans=0.125 2024-09-24 03:08:27,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=406900.6666666667, ans=0.0 2024-09-24 03:08:39,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=406900.6666666667, ans=0.025 2024-09-24 03:08:52,574 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.227e+02 1.307e+02 1.393e+02 1.809e+02, threshold=2.613e+02, percent-clipped=0.0 2024-09-24 03:09:00,495 INFO [train.py:1198] (0/4) Epoch 23, batch 1500, loss[loss=0.2125, ctc_loss=0.1447, cr_loss=0.339, over 17155.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.1391, cr_loss=0.3555, over 3361843.87 frames. ], batch size: 45, lr: 5.22e-03, grad_scale: 8.0 2024-09-24 03:09:02,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2024-09-24 03:09:20,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0 2024-09-24 03:09:23,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. 
limit=15.0 2024-09-24 03:09:48,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=407134.0, ans=0.125 2024-09-24 03:09:54,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=407134.0, ans=0.2 2024-09-24 03:10:04,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=407180.6666666667, ans=0.025 2024-09-24 03:10:20,387 INFO [train.py:1198] (0/4) Epoch 23, batch 1550, loss[loss=0.2152, ctc_loss=0.1446, cr_loss=0.3527, over 17038.00 frames. ], tot_loss[loss=0.2101, ctc_loss=0.139, cr_loss=0.3552, over 3363425.41 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 8.0 2024-09-24 03:10:41,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=407274.0, ans=0.125 2024-09-24 03:11:21,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=407367.3333333333, ans=0.0 2024-09-24 03:11:37,267 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.292e+02 1.390e+02 1.535e+02 2.050e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-24 03:11:45,286 INFO [train.py:1198] (0/4) Epoch 23, batch 1600, loss[loss=0.2074, ctc_loss=0.1377, cr_loss=0.3484, over 17210.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1386, cr_loss=0.3545, over 3367383.95 frames. ], batch size: 47, lr: 5.22e-03, grad_scale: 16.0 2024-09-24 03:11:50,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0 2024-09-24 03:12:11,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=407507.3333333333, ans=0.1 2024-09-24 03:12:40,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=407600.6666666667, ans=0.025 2024-09-24 03:12:55,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=12.0 2024-09-24 03:13:08,033 INFO [train.py:1198] (0/4) Epoch 23, batch 1650, loss[loss=0.2285, ctc_loss=0.1504, cr_loss=0.3903, over 16922.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1385, cr_loss=0.3549, over 3372270.90 frames. ], batch size: 58, lr: 5.22e-03, grad_scale: 16.0 2024-09-24 03:13:17,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=407694.0, ans=0.2 2024-09-24 03:13:25,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=12.0
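Annotation: the scaling.py:1024 "Whitening" lines compare a per-module metric against a limit (metric=11.70 vs. limit=12.0 above). A metric with exactly this behaviour can be computed as the dispersion of the eigenvalues of the feature covariance, E[lambda^2]/E[lambda]^2, which is 1.0 for a perfectly white (isotropic) feature and grows as a few directions dominate; modules whose metric exceeds their limit get pushed back toward white activations. The sketch below is an assumption about the metric's form, not the exact scaling.py code.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """E[lambda^2] / E[lambda]^2 over the eigenvalues of the per-group
    feature covariance: 1.0 when x is perfectly white, larger otherwise."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)                # center per group
    cov = x.transpose(1, 2) @ x / n                    # (groups, d, d)
    d = cov.shape[-1]
    mean_eig = torch.diagonal(cov, dim1=1, dim2=2).sum(-1) / d  # trace(C)/d
    mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / d               # trace(C^2)/d
    return (mean_eig_sq / (mean_eig ** 2 + 1e-20)).mean()

print(whitening_metric(torch.randn(1000, 384)))  # close to 1.0 for white noise

Note that trace(C^2)/d equals the mean squared eigenvalue for a symmetric covariance C, so no eigendecomposition is needed; that keeps the check cheap enough to run inside forward hooks.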
2024-09-24 03:13:26,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=407740.6666666667, ans=0.2 2024-09-24 03:13:34,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407740.6666666667, ans=0.1 2024-09-24 03:14:21,925 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.251e+02 1.320e+02 1.451e+02 2.604e+02, threshold=2.640e+02, percent-clipped=0.0 2024-09-24 03:14:29,900 INFO [train.py:1198] (0/4) Epoch 23, batch 1700, loss[loss=0.2333, ctc_loss=0.1555, cr_loss=0.389, over 17002.00 frames. ], tot_loss[loss=0.2091, ctc_loss=0.1382, cr_loss=0.3548, over 3376007.39 frames. ], batch size: 53, lr: 5.22e-03, grad_scale: 16.0 2024-09-24 03:14:52,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=407974.0, ans=0.07 2024-09-24 03:14:57,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=407974.0, ans=0.125 2024-09-24 03:15:21,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=408067.3333333333, ans=0.0 2024-09-24 03:15:34,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=408114.0, ans=0.025 2024-09-24 03:15:38,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=408114.0, ans=0.125 2024-09-24 03:15:52,377 INFO [train.py:1198] (0/4) Epoch 23, batch 1750, loss[loss=0.1933, ctc_loss=0.1257, cr_loss=0.3378, over 17026.00 frames. ], tot_loss[loss=0.2093, ctc_loss=0.1383, cr_loss=0.3548, over 3375974.70 frames. ], batch size: 44, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:16:08,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=408160.6666666667, ans=0.0 2024-09-24 03:16:14,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=408207.3333333333, ans=0.0 2024-09-24 03:16:21,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=15.0 2024-09-24 03:16:50,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2024-09-24 03:16:56,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=408300.6666666667, ans=0.125 2024-09-24 03:17:06,658 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.263e+02 1.353e+02 1.472e+02 2.567e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 03:17:14,534 INFO [train.py:1198] (0/4) Epoch 23, batch 1800, loss[loss=0.1921, ctc_loss=0.1235, cr_loss=0.343, over 17248.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.1389, cr_loss=0.3551, over 3355553.58 frames.
], batch size: 44, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:17:19,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=408394.0, ans=0.0 2024-09-24 03:17:22,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=408394.0, ans=0.125 2024-09-24 03:17:34,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=408440.6666666667, ans=0.07 2024-09-24 03:18:14,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=408534.0, ans=0.125 2024-09-24 03:18:16,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=408534.0, ans=0.125 2024-09-24 03:18:39,949 INFO [train.py:1198] (0/4) Epoch 23, batch 1850, loss[loss=0.2229, ctc_loss=0.1504, cr_loss=0.3624, over 17020.00 frames. ], tot_loss[loss=0.2098, ctc_loss=0.1389, cr_loss=0.3545, over 3350901.81 frames. ], batch size: 56, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:19:02,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=408674.0, ans=0.125 2024-09-24 03:19:09,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.07 vs. limit=12.0 2024-09-24 03:19:13,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=408720.6666666667, ans=0.125 2024-09-24 03:19:39,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=408767.3333333333, ans=0.0 2024-09-24 03:19:42,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=408814.0, ans=0.1 2024-09-24 03:19:52,002 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.253e+02 1.335e+02 1.430e+02 2.025e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 03:20:00,002 INFO [train.py:1198] (0/4) Epoch 23, batch 1900, loss[loss=0.2006, ctc_loss=0.1307, cr_loss=0.3494, over 17084.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.1396, cr_loss=0.3566, over 3353773.96 frames. 
], batch size: 43, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:20:14,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=408907.3333333333, ans=0.0 2024-09-24 03:20:19,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=408907.3333333333, ans=22.5 2024-09-24 03:20:25,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=408907.3333333333, ans=0.125 2024-09-24 03:20:29,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=408907.3333333333, ans=0.0 2024-09-24 03:20:33,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=408954.0, ans=0.125 2024-09-24 03:21:16,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=409047.3333333333, ans=0.0 2024-09-24 03:21:25,581 INFO [train.py:1198] (0/4) Epoch 23, batch 1950, loss[loss=0.2421, ctc_loss=0.1608, cr_loss=0.4063, over 17251.00 frames. ], tot_loss[loss=0.2096, ctc_loss=0.1385, cr_loss=0.355, over 3351952.60 frames. ], batch size: 55, lr: 5.21e-03, grad_scale: 16.0 2024-09-24 03:21:27,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=409094.0, ans=0.125 2024-09-24 03:21:40,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=409140.6666666667, ans=0.125 2024-09-24 03:21:50,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-09-24 03:22:09,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=409187.3333333333, ans=0.125 2024-09-24 03:22:40,199 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.272e+02 1.356e+02 1.502e+02 2.746e+02, threshold=2.712e+02, percent-clipped=1.0 2024-09-24 03:22:42,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=409280.6666666667, ans=0.125 2024-09-24 03:22:48,042 INFO [train.py:1198] (0/4) Epoch 23, batch 2000, loss[loss=0.187, ctc_loss=0.1209, cr_loss=0.3306, over 16966.00 frames. ], tot_loss[loss=0.209, ctc_loss=0.1382, cr_loss=0.3543, over 3358763.49 frames. ], batch size: 42, lr: 5.21e-03, grad_scale: 32.0 2024-09-24 03:23:25,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5 2024-09-24 03:23:34,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=409420.6666666667, ans=0.1 2024-09-24 03:23:42,116 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:24:10,502 INFO [train.py:1198] (0/4) Epoch 23, batch 2050, loss[loss=0.228, ctc_loss=0.1556, cr_loss=0.362, over 17095.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1383, cr_loss=0.3544, over 3359706.77 frames. ], batch size: 49, lr: 5.20e-03, grad_scale: 16.0
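Annotation: the per-batch train.py:1198 entries are internally consistent with tot = ctc + 0.2 * cr; for the batch-2050 block just above, 0.1556 + 0.2 * 0.362 = 0.228 reproduces the per-batch loss and 0.1383 + 0.2 * 0.3544 = 0.2092 reproduces the logged tot_loss. The bracketed tot_loss[...] figures behave as frame-weighted running averages over the epoch so far (the "over N frames" count keeps growing). A sketch under those two assumptions, with hypothetical helper names:

class RunningLoss:
    """Frame-weighted running average of the combined loss (a sketch)."""

    def __init__(self, cr_loss_scale: float = 0.2):
        self.cr_loss_scale = cr_loss_scale
        self.sums = {"loss": 0.0, "ctc_loss": 0.0, "cr_loss": 0.0}
        self.frames = 0.0

    def update(self, ctc_loss: float, cr_loss: float, num_frames: float) -> dict:
        # tot = ctc + 0.2 * cr, the relation the logged numbers obey
        loss = ctc_loss + self.cr_loss_scale * cr_loss
        for key, value in zip(self.sums, (loss, ctc_loss, cr_loss)):
            self.sums[key] += value * num_frames
        self.frames += num_frames
        return {k: s / self.frames for k, s in self.sums.items()}

tot = RunningLoss()
print(tot.update(ctc_loss=0.1556, cr_loss=0.362, num_frames=17095.0))
# -> loss = 0.228, matching the batch-2050 entry above

Weighting by frames rather than by batch makes the running average insensitive to the bucketing sampler's variable batch sizes.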
2024-09-24 03:25:09,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=409700.6666666667, ans=0.0 2024-09-24 03:25:09,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=409700.6666666667, ans=0.0 2024-09-24 03:25:10,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=409700.6666666667, ans=0.07 2024-09-24 03:25:12,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=409700.6666666667, ans=0.125 2024-09-24 03:25:15,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=409747.3333333333, ans=0.0 2024-09-24 03:25:24,758 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.284e+02 1.371e+02 1.476e+02 1.863e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-24 03:25:31,153 INFO [train.py:1198] (0/4) Epoch 23, batch 2100, loss[loss=0.2561, ctc_loss=0.1773, cr_loss=0.3938, over 11995.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1384, cr_loss=0.3544, over 3351169.28 frames. ], batch size: 124, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:25:47,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=12.0 2024-09-24 03:26:07,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.05 vs. limit=15.0 2024-09-24 03:26:38,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=409980.6666666667, ans=0.0 2024-09-24 03:26:45,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=409980.6666666667, ans=0.125 2024-09-24 03:26:49,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=409980.6666666667, ans=0.0 2024-09-24 03:26:56,010 INFO [train.py:1198] (0/4) Epoch 23, batch 2150, loss[loss=0.2259, ctc_loss=0.1532, cr_loss=0.3638, over 16729.00 frames. ], tot_loss[loss=0.2087, ctc_loss=0.138, cr_loss=0.3536, over 3352604.20 frames.
], batch size: 61, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:26:57,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=410027.3333333333, ans=0.025 2024-09-24 03:27:20,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410074.0, ans=0.1 2024-09-24 03:27:20,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=410074.0, ans=0.125 2024-09-24 03:27:20,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=410074.0, ans=0.5 2024-09-24 03:28:09,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=410214.0, ans=0.025 2024-09-24 03:28:15,213 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.254e+02 1.321e+02 1.428e+02 2.805e+02, threshold=2.642e+02, percent-clipped=1.0 2024-09-24 03:28:21,745 INFO [train.py:1198] (0/4) Epoch 23, batch 2200, loss[loss=0.1937, ctc_loss=0.1286, cr_loss=0.3255, over 17026.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1376, cr_loss=0.353, over 3359575.45 frames. ], batch size: 56, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:28:57,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410354.0, ans=0.1 2024-09-24 03:29:19,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=410400.6666666667, ans=0.025 2024-09-24 03:29:37,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=410447.3333333333, ans=0.125 2024-09-24 03:29:41,637 INFO [train.py:1198] (0/4) Epoch 23, batch 2250, loss[loss=0.1993, ctc_loss=0.1304, cr_loss=0.3445, over 17061.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.1382, cr_loss=0.3535, over 3350840.73 frames. ], batch size: 39, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:29:52,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=22.5 2024-09-24 03:30:05,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=410540.6666666667, ans=0.2 2024-09-24 03:30:06,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=410540.6666666667, ans=0.0 2024-09-24 03:30:10,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=410540.6666666667, ans=0.125 2024-09-24 03:30:16,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2024-09-24 03:30:42,053 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-88000.pt 2024-09-24 03:30:53,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.65 vs. limit=22.5
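Annotation: the checkpoint.py:75 entry above writes checkpoint-88000.pt mid-epoch, so these checkpoints are keyed to the global training-batch counter and written every fixed number of batches, independently of the per-epoch epoch-N.pt files (one of which appears later in this log). A minimal sketch of that logic; the save_every_n value and the rank-0 guard are assumptions:

import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: str, save_every_n: int = 4000,
                          rank: int = 0) -> None:
    # Write <exp_dir>/checkpoint-<global batch>.pt every save_every_n
    # batches, from rank 0 only (interval and guard are assumptions).
    if rank != 0 or batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",  # e.g. checkpoint-88000.pt
    )

Keying the filename to the batch counter lets batch-level checkpoints be averaged or resumed from without any ambiguity about where inside an epoch they were taken.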
2024-09-24 03:31:02,825 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.239e+02 1.305e+02 1.382e+02 2.270e+02, threshold=2.609e+02, percent-clipped=0.0 2024-09-24 03:31:09,444 INFO [train.py:1198] (0/4) Epoch 23, batch 2300, loss[loss=0.2045, ctc_loss=0.1326, cr_loss=0.3595, over 17019.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1375, cr_loss=0.3525, over 3360249.90 frames. ], batch size: 52, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:31:11,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=410727.3333333333, ans=0.035 2024-09-24 03:31:11,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=410727.3333333333, ans=0.2 2024-09-24 03:31:11,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2024-09-24 03:31:32,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=410774.0, ans=0.2 2024-09-24 03:31:43,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=410820.6666666667, ans=0.05 2024-09-24 03:31:51,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=410820.6666666667, ans=0.0 2024-09-24 03:31:52,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=410820.6666666667, ans=0.2 2024-09-24 03:32:02,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410867.3333333333, ans=0.1 2024-09-24 03:32:31,673 INFO [train.py:1198] (0/4) Epoch 23, batch 2350, loss[loss=0.1671, ctc_loss=0.1062, cr_loss=0.3045, over 17085.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1367, cr_loss=0.3509, over 3366244.34 frames. ], batch size: 40, lr: 5.20e-03, grad_scale: 16.0 2024-09-24 03:32:32,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=410960.6666666667, ans=0.0 2024-09-24 03:32:43,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=410960.6666666667, ans=0.0 2024-09-24 03:32:48,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=411007.3333333333, ans=0.2 2024-09-24 03:33:35,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=411100.6666666667, ans=0.025 2024-09-24 03:33:41,905 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:33:47,831 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.259e+02 1.351e+02 1.489e+02 2.005e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-24 03:33:54,329 INFO [train.py:1198] (0/4) Epoch 23, batch 2400, loss[loss=0.2306, ctc_loss=0.1538, cr_loss=0.3841, over 17100.00 frames. ], tot_loss[loss=0.2076, ctc_loss=0.1372, cr_loss=0.352, over 3358173.53 frames.
], batch size: 49, lr: 5.19e-03, grad_scale: 32.0 2024-09-24 03:34:05,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411194.0, ans=0.1 2024-09-24 03:34:06,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=15.0 2024-09-24 03:34:23,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411240.6666666667, ans=0.1 2024-09-24 03:34:34,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=411287.3333333333, ans=0.125 2024-09-24 03:34:58,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=411380.6666666667, ans=0.125 2024-09-24 03:35:06,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=411380.6666666667, ans=0.0 2024-09-24 03:35:14,351 INFO [train.py:1198] (0/4) Epoch 23, batch 2450, loss[loss=0.2149, ctc_loss=0.1415, cr_loss=0.3668, over 17267.00 frames. ], tot_loss[loss=0.2084, ctc_loss=0.1377, cr_loss=0.3533, over 3366316.58 frames. ], batch size: 44, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:35:33,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=411474.0, ans=0.2 2024-09-24 03:35:52,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=411520.6666666667, ans=0.125 2024-09-24 03:36:05,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411567.3333333333, ans=0.1 2024-09-24 03:36:27,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411614.0, ans=0.1 2024-09-24 03:36:35,086 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.268e+02 1.337e+02 1.491e+02 2.179e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-24 03:36:35,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2024-09-24 03:36:39,854 INFO [train.py:1198] (0/4) Epoch 23, batch 2500, loss[loss=0.221, ctc_loss=0.1454, cr_loss=0.3783, over 17148.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.1381, cr_loss=0.3537, over 3354517.76 frames. ], batch size: 48, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:38:02,886 INFO [train.py:1198] (0/4) Epoch 23, batch 2550, loss[loss=0.2309, ctc_loss=0.1548, cr_loss=0.3805, over 16976.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.138, cr_loss=0.3542, over 3360650.09 frames. 
], batch size: 42, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:38:21,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411940.6666666667, ans=0.1 2024-09-24 03:38:53,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=412034.0, ans=0.125 2024-09-24 03:39:15,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412080.6666666667, ans=0.1 2024-09-24 03:39:19,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5 2024-09-24 03:39:20,141 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.256e+02 1.367e+02 1.488e+02 2.148e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-24 03:39:24,811 INFO [train.py:1198] (0/4) Epoch 23, batch 2600, loss[loss=0.2323, ctc_loss=0.1527, cr_loss=0.3978, over 17052.00 frames. ], tot_loss[loss=0.2105, ctc_loss=0.1393, cr_loss=0.3559, over 3355660.24 frames. ], batch size: 56, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:39:40,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=412174.0, ans=0.2 2024-09-24 03:39:45,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=412174.0, ans=0.125 2024-09-24 03:39:56,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=412220.6666666667, ans=0.0 2024-09-24 03:40:08,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=412220.6666666667, ans=0.1 2024-09-24 03:40:32,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=412314.0, ans=0.2 2024-09-24 03:40:42,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=412314.0, ans=0.125 2024-09-24 03:40:43,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=22.5 2024-09-24 03:40:49,868 INFO [train.py:1198] (0/4) Epoch 23, batch 2650, loss[loss=0.1997, ctc_loss=0.1296, cr_loss=0.3504, over 16677.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.1395, cr_loss=0.356, over 3347139.55 frames. ], batch size: 37, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:40:54,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-09-24 03:40:56,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.46 vs. limit=10.0
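Annotation: the scaling.py:214 entries resolve ScheduledFloat hyper-parameters (dropout rates, skip rates, balancer probabilities, whitening limits) to concrete values as a function of batch_count, so regularization strength can be annealed over training without touching the optimizer. A piecewise-linear schedule reproduces this behaviour; the breakpoints in the sketch below are illustrative only, not this recipe's actual schedule.

import numpy as np

class ScheduledFloat:
    """A float that interpolates linearly between (batch_count, value)
    breakpoints and stays constant outside them (a sketch of the idea)."""

    def __init__(self, *points):
        self.x = np.array([p[0] for p in points], dtype=float)
        self.y = np.array([p[1] for p in points], dtype=float)
        self.batch_count = 0.0

    def __float__(self) -> float:
        return float(np.interp(self.batch_count, self.x, self.y))

# Illustrative breakpoints only, not the recipe's actual schedule:
skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
for bc in (0.0, 4000.0, 406574.0):
    skip_rate.batch_count = bc
    print(bc, float(skip_rate))  # 0.5, then 0.05, then 0.0 past the last point

This is why most skip rates in the log read 0.0 by batch_count ~4.1e5: the schedules have long since reached their final values and the entries merely confirm it.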
2024-09-24 03:41:16,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=412407.3333333333, ans=0.125 2024-09-24 03:41:17,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=412407.3333333333, ans=0.09899494936611666 2024-09-24 03:41:20,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=412454.0, ans=0.125 2024-09-24 03:41:27,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=412454.0, ans=0.2 2024-09-24 03:41:30,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=412454.0, ans=0.0 2024-09-24 03:41:40,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.79 vs. limit=10.0 2024-09-24 03:41:45,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=412500.6666666667, ans=0.0 2024-09-24 03:41:56,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=412547.3333333333, ans=0.1 2024-09-24 03:42:05,849 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.276e+02 1.370e+02 1.490e+02 2.063e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-24 03:42:06,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=412547.3333333333, ans=0.125 2024-09-24 03:42:10,851 INFO [train.py:1198] (0/4) Epoch 23, batch 2700, loss[loss=0.2476, ctc_loss=0.1658, cr_loss=0.4091, over 17011.00 frames. ], tot_loss[loss=0.2113, ctc_loss=0.1398, cr_loss=0.3576, over 3350475.32 frames. ], batch size: 53, lr: 5.19e-03, grad_scale: 16.0 2024-09-24 03:42:19,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=412594.0, ans=0.0 2024-09-24 03:42:41,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=22.5 2024-09-24 03:42:49,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2024-09-24 03:42:50,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412687.3333333333, ans=0.1 2024-09-24 03:43:11,299 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:43:32,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=412780.6666666667, ans=0.125 2024-09-24 03:43:32,375 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:43:37,011 INFO [train.py:1198] (0/4) Epoch 23, batch 2750, loss[loss=0.163, ctc_loss=0.1041, cr_loss=0.2944, over 17043.00 frames. ], tot_loss[loss=0.2097, ctc_loss=0.1386, cr_loss=0.3555, over 3348509.13 frames.
], batch size: 39, lr: 5.18e-03, grad_scale: 16.0 2024-09-24 03:43:51,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=412874.0, ans=0.125 2024-09-24 03:44:30,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=412967.3333333333, ans=0.125 2024-09-24 03:44:44,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=413014.0, ans=0.2 2024-09-24 03:44:44,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=12.0 2024-09-24 03:44:52,047 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.288e+02 1.407e+02 1.537e+02 4.593e+02, threshold=2.814e+02, percent-clipped=2.0 2024-09-24 03:44:56,676 INFO [train.py:1198] (0/4) Epoch 23, batch 2800, loss[loss=0.1819, ctc_loss=0.1205, cr_loss=0.3074, over 17043.00 frames. ], tot_loss[loss=0.2108, ctc_loss=0.1394, cr_loss=0.3569, over 3349161.65 frames. ], batch size: 39, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:45:03,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-09-24 03:45:04,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=413060.6666666667, ans=0.125 2024-09-24 03:45:21,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413107.3333333333, ans=0.1 2024-09-24 03:45:23,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=413107.3333333333, ans=0.125 2024-09-24 03:45:24,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413107.3333333333, ans=0.1 2024-09-24 03:46:21,928 INFO [train.py:1198] (0/4) Epoch 23, batch 2850, loss[loss=0.2325, ctc_loss=0.1492, cr_loss=0.4162, over 16929.00 frames. ], tot_loss[loss=0.2119, ctc_loss=0.1403, cr_loss=0.3584, over 3336884.51 frames. ], batch size: 58, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:46:25,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=413294.0, ans=0.2 2024-09-24 03:46:48,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2024-09-24 03:47:25,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=413434.0, ans=0.0 2024-09-24 03:47:25,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=413434.0, ans=0.125 2024-09-24 03:47:29,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=413480.6666666667, ans=0.0 2024-09-24 03:47:31,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.67 vs. 
limit=15.0 2024-09-24 03:47:39,673 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.282e+02 1.396e+02 1.534e+02 2.289e+02, threshold=2.792e+02, percent-clipped=0.0 2024-09-24 03:47:41,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=413480.6666666667, ans=0.05 2024-09-24 03:47:44,652 INFO [train.py:1198] (0/4) Epoch 23, batch 2900, loss[loss=0.2015, ctc_loss=0.1319, cr_loss=0.3478, over 16653.00 frames. ], tot_loss[loss=0.2122, ctc_loss=0.1405, cr_loss=0.3585, over 3338997.00 frames. ], batch size: 37, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:48:15,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-09-24 03:48:23,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=413620.6666666667, ans=0.025 2024-09-24 03:48:40,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=413667.3333333333, ans=0.1 2024-09-24 03:48:42,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=413667.3333333333, ans=0.025 2024-09-24 03:49:01,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=413714.0, ans=0.125 2024-09-24 03:49:07,439 INFO [train.py:1198] (0/4) Epoch 23, batch 2950, loss[loss=0.1926, ctc_loss=0.1268, cr_loss=0.3292, over 17013.00 frames. ], tot_loss[loss=0.2126, ctc_loss=0.1407, cr_loss=0.3594, over 3347215.79 frames. ], batch size: 44, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:50:00,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=413900.6666666667, ans=0.125 2024-09-24 03:50:12,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413947.3333333333, ans=0.1 2024-09-24 03:50:17,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=413947.3333333333, ans=0.0 2024-09-24 03:50:22,355 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.232e+02 1.296e+02 1.402e+02 3.065e+02, threshold=2.592e+02, percent-clipped=1.0 2024-09-24 03:50:27,208 INFO [train.py:1198] (0/4) Epoch 23, batch 3000, loss[loss=0.2189, ctc_loss=0.1453, cr_loss=0.3681, over 16884.00 frames. ], tot_loss[loss=0.2112, ctc_loss=0.1397, cr_loss=0.3577, over 3351673.82 frames. ], batch size: 58, lr: 5.18e-03, grad_scale: 32.0 2024-09-24 03:50:27,210 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 03:50:42,635 INFO [train.py:1230] (0/4) Epoch 23, validation: loss=0.03816, ctc_loss=0.03816, cr_loss=8.083e-15, over 944034.00 frames. 2024-09-24 03:50:42,636 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 03:50:46,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=12.0 2024-09-24 03:51:15,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.15 vs. 
limit=12.0 2024-09-24 03:51:43,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=414134.0, ans=0.125 2024-09-24 03:52:03,782 INFO [train.py:1198] (0/4) Epoch 23, batch 3050, loss[loss=0.1851, ctc_loss=0.1208, cr_loss=0.322, over 17069.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1384, cr_loss=0.3553, over 3362750.82 frames. ], batch size: 39, lr: 5.18e-03, grad_scale: 16.0 2024-09-24 03:52:16,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=414227.3333333333, ans=0.025 2024-09-24 03:52:56,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=22.5 2024-09-24 03:52:57,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=414367.3333333333, ans=0.0 2024-09-24 03:53:01,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.30 vs. limit=10.0 2024-09-24 03:53:05,617 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=12.0 2024-09-24 03:53:11,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=414414.0, ans=0.125 2024-09-24 03:53:19,033 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.284e+02 1.405e+02 1.521e+02 2.229e+02, threshold=2.810e+02, percent-clipped=0.0 2024-09-24 03:53:22,206 INFO [train.py:1198] (0/4) Epoch 23, batch 3100, loss[loss=0.2001, ctc_loss=0.1299, cr_loss=0.351, over 17296.00 frames. ], tot_loss[loss=0.2103, ctc_loss=0.139, cr_loss=0.3563, over 3355257.43 frames. ], batch size: 51, lr: 5.17e-03, grad_scale: 16.0 2024-09-24 03:53:55,262 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:54:34,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=12.0 2024-09-24 03:54:39,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=414647.3333333333, ans=0.125 2024-09-24 03:54:42,869 INFO [train.py:1198] (0/4) Epoch 23, batch 3150, loss[loss=0.1921, ctc_loss=0.1254, cr_loss=0.3336, over 17189.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1382, cr_loss=0.3547, over 3364312.62 frames. ], batch size: 41, lr: 5.17e-03, grad_scale: 16.0 2024-09-24 03:54:44,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=414694.0, ans=0.2 2024-09-24 03:55:23,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=414787.3333333333, ans=0.125 2024-09-24 03:55:57,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=414880.6666666667, ans=0.0 2024-09-24 03:55:59,846 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.246e+02 1.369e+02 1.500e+02 3.916e+02, threshold=2.738e+02, percent-clipped=1.0 2024-09-24 03:56:03,018 INFO [train.py:1198] (0/4) Epoch 23, batch 3200, loss[loss=0.196, ctc_loss=0.1284, cr_loss=0.3381, over 17159.00 frames. 
], tot_loss[loss=0.2094, ctc_loss=0.1385, cr_loss=0.3542, over 3367529.10 frames. ], batch size: 48, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 03:56:08,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=414927.3333333333, ans=0.2 2024-09-24 03:56:11,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.86 vs. limit=22.5 2024-09-24 03:56:31,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=414974.0, ans=0.0 2024-09-24 03:56:37,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415020.6666666667, ans=0.1 2024-09-24 03:57:07,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=415114.0, ans=0.125 2024-09-24 03:57:07,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=415114.0, ans=0.125 2024-09-24 03:57:10,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=415114.0, ans=0.125 2024-09-24 03:57:13,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=415114.0, ans=0.025 2024-09-24 03:57:21,131 INFO [train.py:1198] (0/4) Epoch 23, batch 3250, loss[loss=0.2273, ctc_loss=0.1523, cr_loss=0.3748, over 17018.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1387, cr_loss=0.3543, over 3366552.57 frames. ], batch size: 52, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 03:57:24,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415160.6666666667, ans=0.1 2024-09-24 03:57:35,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=415207.3333333333, ans=0.2 2024-09-24 03:57:43,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=415207.3333333333, ans=0.125 2024-09-24 03:57:44,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=415207.3333333333, ans=0.125 2024-09-24 03:58:20,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=415300.6666666667, ans=0.125 2024-09-24 03:58:22,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415347.3333333333, ans=0.1 2024-09-24 03:58:23,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=415347.3333333333, ans=0.125 2024-09-24 03:58:36,029 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.250e+02 1.334e+02 1.506e+02 2.556e+02, threshold=2.668e+02, percent-clipped=0.0 2024-09-24 03:58:39,221 INFO [train.py:1198] (0/4) Epoch 23, batch 3300, loss[loss=0.2338, ctc_loss=0.1581, cr_loss=0.3784, over 16788.00 frames. ], tot_loss[loss=0.209, ctc_loss=0.1383, cr_loss=0.3537, over 3359658.94 frames. 
], batch size: 61, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 03:58:42,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=415394.0, ans=0.125 2024-09-24 03:58:49,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=415394.0, ans=0.125 2024-09-24 03:58:49,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.63 vs. limit=10.0 2024-09-24 03:58:55,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415440.6666666667, ans=0.1 2024-09-24 03:58:55,408 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:59:04,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=415440.6666666667, ans=0.07 2024-09-24 03:59:39,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=415534.0, ans=0.125 2024-09-24 03:59:40,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=415580.6666666667, ans=0.2 2024-09-24 03:59:42,140 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 03:59:54,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=415580.6666666667, ans=0.1 2024-09-24 03:59:57,318 INFO [train.py:1198] (0/4) Epoch 23, batch 3350, loss[loss=0.2178, ctc_loss=0.1421, cr_loss=0.3786, over 17037.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.1394, cr_loss=0.3562, over 3356847.40 frames. ], batch size: 52, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 04:00:14,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=415674.0, ans=0.125 2024-09-24 04:00:22,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=415674.0, ans=0.0 2024-09-24 04:00:30,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2024-09-24 04:00:40,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415720.6666666667, ans=0.1 2024-09-24 04:00:47,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=415767.3333333333, ans=0.0 2024-09-24 04:00:56,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=415767.3333333333, ans=0.025 2024-09-24 04:01:04,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.41 vs. limit=12.0 2024-09-24 04:01:14,621 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.296e+02 1.415e+02 1.543e+02 2.020e+02, threshold=2.829e+02, percent-clipped=0.0 2024-09-24 04:01:17,789 INFO [train.py:1198] (0/4) Epoch 23, batch 3400, loss[loss=0.1763, ctc_loss=0.1162, cr_loss=0.3002, over 17103.00 frames. 
], tot_loss[loss=0.2111, ctc_loss=0.1397, cr_loss=0.3571, over 3347860.15 frames. ], batch size: 40, lr: 5.17e-03, grad_scale: 32.0 2024-09-24 04:01:25,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=415860.6666666667, ans=0.025 2024-09-24 04:01:39,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=415907.3333333333, ans=0.125 2024-09-24 04:01:52,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=415954.0, ans=0.125 2024-09-24 04:02:12,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=416000.6666666667, ans=0.5 2024-09-24 04:02:19,307 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:02:32,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=416047.3333333333, ans=0.0 2024-09-24 04:02:35,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=22.5 2024-09-24 04:02:38,114 INFO [train.py:1198] (0/4) Epoch 23, batch 3450, loss[loss=0.2209, ctc_loss=0.1462, cr_loss=0.3737, over 17219.00 frames. ], tot_loss[loss=0.211, ctc_loss=0.1396, cr_loss=0.357, over 3348666.05 frames. ], batch size: 55, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:02:40,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=416094.0, ans=0.125 2024-09-24 04:02:55,768 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:03:29,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2024-09-24 04:03:53,801 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.237e+02 1.333e+02 1.427e+02 1.642e+02, threshold=2.667e+02, percent-clipped=0.0 2024-09-24 04:03:57,048 INFO [train.py:1198] (0/4) Epoch 23, batch 3500, loss[loss=0.1806, ctc_loss=0.1167, cr_loss=0.3194, over 17050.00 frames. ], tot_loss[loss=0.2102, ctc_loss=0.139, cr_loss=0.3559, over 3357466.29 frames. ], batch size: 39, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:04:15,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=416374.0, ans=0.025 2024-09-24 04:04:33,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=416420.6666666667, ans=0.125 2024-09-24 04:04:46,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=416467.3333333333, ans=0.125 2024-09-24 04:05:03,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416514.0, ans=0.1 2024-09-24 04:05:03,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=416514.0, ans=0.2 2024-09-24 04:05:17,347 INFO [train.py:1198] (0/4) Epoch 23, batch 3550, loss[loss=0.1774, ctc_loss=0.1136, cr_loss=0.3194, over 17019.00 frames. 
], tot_loss[loss=0.21, ctc_loss=0.1388, cr_loss=0.356, over 3360831.55 frames. ], batch size: 39, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:05:28,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=416560.6666666667, ans=0.0 2024-09-24 04:05:36,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=416607.3333333333, ans=0.2 2024-09-24 04:05:42,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=416607.3333333333, ans=0.0 2024-09-24 04:05:49,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=416654.0, ans=0.0 2024-09-24 04:05:53,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=416654.0, ans=0.0 2024-09-24 04:06:24,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2024-09-24 04:06:34,282 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.260e+02 1.366e+02 1.469e+02 1.975e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 04:06:37,349 INFO [train.py:1198] (0/4) Epoch 23, batch 3600, loss[loss=0.2335, ctc_loss=0.1552, cr_loss=0.3915, over 17039.00 frames. ], tot_loss[loss=0.2094, ctc_loss=0.1384, cr_loss=0.3551, over 3368406.12 frames. ], batch size: 52, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:07:02,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=416840.6666666667, ans=0.2 2024-09-24 04:07:40,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=416980.6666666667, ans=22.5 2024-09-24 04:07:55,466 INFO [train.py:1198] (0/4) Epoch 23, batch 3650, loss[loss=0.2314, ctc_loss=0.1531, cr_loss=0.3914, over 17284.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1377, cr_loss=0.354, over 3369243.97 frames. ], batch size: 46, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:08:16,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0 2024-09-24 04:08:19,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=417074.0, ans=0.0 2024-09-24 04:08:31,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=417120.6666666667, ans=0.025 2024-09-24 04:08:33,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=417120.6666666667, ans=0.025 2024-09-24 04:09:11,670 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.268e+02 1.365e+02 1.526e+02 2.043e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-24 04:09:14,882 INFO [train.py:1198] (0/4) Epoch 23, batch 3700, loss[loss=0.1947, ctc_loss=0.1269, cr_loss=0.3391, over 17021.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1385, cr_loss=0.3551, over 3366980.11 frames. 
], batch size: 44, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:09:18,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-24 04:09:32,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=417307.3333333333, ans=0.125 2024-09-24 04:09:33,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=417307.3333333333, ans=0.0 2024-09-24 04:09:41,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=417307.3333333333, ans=0.125 2024-09-24 04:10:12,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=417400.6666666667, ans=0.0 2024-09-24 04:10:24,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2024-09-24 04:10:33,640 INFO [train.py:1198] (0/4) Epoch 23, batch 3750, loss[loss=0.1774, ctc_loss=0.1173, cr_loss=0.3009, over 17294.00 frames. ], tot_loss[loss=0.2094, ctc_loss=0.1383, cr_loss=0.3553, over 3366695.30 frames. ], batch size: 46, lr: 5.16e-03, grad_scale: 32.0 2024-09-24 04:10:36,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=417494.0, ans=0.125 2024-09-24 04:11:03,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=417587.3333333333, ans=0.09899494936611666 2024-09-24 04:11:05,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=417587.3333333333, ans=0.0 2024-09-24 04:11:22,283 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:11:27,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=417634.0, ans=0.5 2024-09-24 04:11:33,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=417634.0, ans=0.2 2024-09-24 04:11:50,428 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.304e+02 1.374e+02 1.461e+02 1.993e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-24 04:11:53,566 INFO [train.py:1198] (0/4) Epoch 23, batch 3800, loss[loss=0.2416, ctc_loss=0.1633, cr_loss=0.3915, over 15376.00 frames. ], tot_loss[loss=0.21, ctc_loss=0.1388, cr_loss=0.3563, over 3360448.76 frames. 
], batch size: 89, lr: 5.15e-03, grad_scale: 32.0 2024-09-24 04:11:58,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417727.3333333333, ans=0.1 2024-09-24 04:12:12,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=417774.0, ans=0.125 2024-09-24 04:12:25,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=417820.6666666667, ans=0.05 2024-09-24 04:12:25,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=417820.6666666667, ans=0.125 2024-09-24 04:12:38,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=417820.6666666667, ans=0.0 2024-09-24 04:12:38,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2024-09-24 04:13:12,143 INFO [train.py:1198] (0/4) Epoch 23, batch 3850, loss[loss=0.2162, ctc_loss=0.1439, cr_loss=0.3612, over 17011.00 frames. ], tot_loss[loss=0.2116, ctc_loss=0.1401, cr_loss=0.3574, over 3337563.52 frames. ], batch size: 51, lr: 5.15e-03, grad_scale: 32.0 2024-09-24 04:13:18,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=417960.6666666667, ans=0.125 2024-09-24 04:13:21,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=417960.6666666667, ans=0.09899494936611666 2024-09-24 04:14:05,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=418100.6666666667, ans=0.125 2024-09-24 04:14:22,352 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-23.pt 2024-09-24 04:15:15,635 INFO [train.py:1198] (0/4) Epoch 24, batch 0, loss[loss=0.1859, ctc_loss=0.1198, cr_loss=0.3302, over 16688.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1198, cr_loss=0.3302, over 16688.00 frames. ], batch size: 37, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:15:15,636 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 04:15:29,194 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3126, 3.4480, 2.7060, 3.3673], device='cuda:0') 2024-09-24 04:15:33,344 INFO [train.py:1230] (0/4) Epoch 24, validation: loss=0.03789, ctc_loss=0.03789, cr_loss=8.011e-15, over 944034.00 frames. 
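A note on how the logged losses fit together. The experiment directory name above ("exp-cr-loss-scale-0.2-...") indicates cr_loss_scale=0.2, and the printed numbers are consistent with the total being ctc_loss + 0.2 * cr_loss: for epoch 23, batch 3800 above, 0.1388 + 0.2 * 0.3563 = 0.2101, matching the logged loss=0.21; and in the epoch 24 validation line, cr_loss=8.011e-15 is numerically zero (consistent with the consistency-regularization term comparing two branches that see identical, un-augmented input at validation), so loss equals ctc_loss=0.03789. A minimal sketch of this combination (the exact expression in train.py is assumed, not quoted):

```python
def total_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
    # Assumed combination: CTC objective plus a scaled consistency-
    # regularization (CR) term; cr_loss_scale = 0.2 is read off the
    # experiment directory name logged at the epoch-23 checkpoint save.
    return ctc_loss + cr_loss_scale * cr_loss

print(total_loss(0.1388, 0.3563))      # 0.21006, ~ logged tot_loss 0.21
print(total_loss(0.03789, 8.011e-15))  # 0.03789, = logged validation loss
```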
2024-09-24 04:15:33,345 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 04:15:36,594 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.212e+02 1.389e+02 1.528e+02 1.642e+02 3.495e+02, threshold=3.056e+02, percent-clipped=0.0 2024-09-24 04:15:40,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=418175.3333333333, ans=0.0 2024-09-24 04:15:57,637 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:16:27,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=418315.3333333333, ans=0.125 2024-09-24 04:16:31,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=418315.3333333333, ans=0.0 2024-09-24 04:16:34,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.97 vs. limit=22.5 2024-09-24 04:16:53,367 INFO [train.py:1198] (0/4) Epoch 24, batch 50, loss[loss=0.2517, ctc_loss=0.1679, cr_loss=0.4192, over 17028.00 frames. ], tot_loss[loss=0.2132, ctc_loss=0.1406, cr_loss=0.3628, over 759623.74 frames. ], batch size: 52, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:17:01,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418408.6666666667, ans=0.1 2024-09-24 04:17:18,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2024-09-24 04:17:32,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=418502.0, ans=0.125 2024-09-24 04:17:44,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-09-24 04:17:56,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=418548.6666666667, ans=0.0 2024-09-24 04:18:15,659 INFO [train.py:1198] (0/4) Epoch 24, batch 100, loss[loss=0.2379, ctc_loss=0.1602, cr_loss=0.3886, over 17002.00 frames. ], tot_loss[loss=0.2107, ctc_loss=0.139, cr_loss=0.3585, over 1344562.66 frames. ], batch size: 56, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:18:18,832 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.264e+02 1.318e+02 1.445e+02 2.140e+02, threshold=2.636e+02, percent-clipped=1.0 2024-09-24 04:18:35,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=418688.6666666667, ans=0.125 2024-09-24 04:18:43,597 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=22.5 2024-09-24 04:18:53,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=418735.3333333333, ans=0.125
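The recurring "Whitening: name=..., metric=M vs. limit=L" lines come from a diagnostic in scaling.py that measures how far a module's output covariance is from isotropic ("white"); the module only penalizes activations whose metric exceeds the configured limit. The sketch below is a from-memory approximation of that metric, the eigenvalue dispersion E[lambda^2] / E[lambda]^2 of the channel covariance (1.0 for perfectly white features, larger for anisotropic ones); treat the exact definition used in scaling.py as an assumption.

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations of one module.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]           # channel covariance C
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()    # E[lambda]   = trace(C) / d
    mean_eig_sq = (cov * cov).sum() / d      # E[lambda^2] = trace(C @ C) / d
    return (mean_eig_sq / mean_eig ** 2).item()

x = torch.randn(8000, 256)    # roughly white activations
print(whitening_metric(x))    # ~1.0, well under a limit like 15.0
x[:, :16] *= 10.0             # inflate a few directions
print(whitening_metric(x))    # >> 1.0, the regime the limit guards against
```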
2024-09-24 04:19:12,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0 2024-09-24 04:19:15,637 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-24 04:19:38,293 INFO [train.py:1198] (0/4) Epoch 24, batch 150, loss[loss=0.1963, ctc_loss=0.1273, cr_loss=0.345, over 17021.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1389, cr_loss=0.3588, over 1795134.30 frames. ], batch size: 44, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:19:54,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-09-24 04:19:56,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=418922.0, ans=0.05 2024-09-24 04:20:13,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=418968.6666666667, ans=0.125 2024-09-24 04:20:17,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-09-24 04:20:25,346 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2024-09-24 04:21:00,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=419062.0, ans=0.125 2024-09-24 04:21:03,916 INFO [train.py:1198] (0/4) Epoch 24, batch 200, loss[loss=0.2181, ctc_loss=0.1466, cr_loss=0.3575, over 17146.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1382, cr_loss=0.3563, over 2144883.32 frames. ], batch size: 45, lr: 5.04e-03, grad_scale: 32.0 2024-09-24 04:21:07,013 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.247e+02 1.339e+02 1.438e+02 1.930e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 04:22:26,695 INFO [train.py:1198] (0/4) Epoch 24, batch 250, loss[loss=0.2026, ctc_loss=0.1326, cr_loss=0.35, over 17100.00 frames. ], tot_loss[loss=0.2086, ctc_loss=0.1376, cr_loss=0.355, over 2419189.58 frames. ], batch size: 49, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:22:36,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=419342.0, ans=0.0 2024-09-24 04:22:41,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419388.6666666667, ans=0.1 2024-09-24 04:22:57,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=419435.3333333333, ans=0.125 2024-09-24 04:23:42,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=419528.6666666667, ans=22.5 2024-09-24 04:23:46,170 INFO [train.py:1198] (0/4) Epoch 24, batch 300, loss[loss=0.1759, ctc_loss=0.1139, cr_loss=0.3098, over 17071.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1362, cr_loss=0.3523, over 2628469.93 frames.
], batch size: 43, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:23:49,254 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.252e+02 1.369e+02 1.490e+02 2.368e+02, threshold=2.737e+02, percent-clipped=0.0 2024-09-24 04:24:02,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=419622.0, ans=0.125 2024-09-24 04:24:13,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=22.5 2024-09-24 04:24:14,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=419622.0, ans=0.0 2024-09-24 04:24:24,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=419668.6666666667, ans=0.07 2024-09-24 04:25:08,436 INFO [train.py:1198] (0/4) Epoch 24, batch 350, loss[loss=0.2143, ctc_loss=0.1417, cr_loss=0.3626, over 16702.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1357, cr_loss=0.351, over 2798524.81 frames. ], batch size: 61, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:25:12,474 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=12.0 2024-09-24 04:25:16,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=419808.6666666667, ans=0.125 2024-09-24 04:25:20,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=419808.6666666667, ans=0.125 2024-09-24 04:25:25,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=419855.3333333333, ans=0.0 2024-09-24 04:25:44,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=419902.0, ans=0.125 2024-09-24 04:26:34,306 INFO [train.py:1198] (0/4) Epoch 24, batch 400, loss[loss=0.2099, ctc_loss=0.1381, cr_loss=0.3589, over 17215.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1356, cr_loss=0.3512, over 2935316.71 frames. ], batch size: 47, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:26:37,599 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.274e+02 1.350e+02 1.482e+02 1.874e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 04:26:53,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.10 vs. limit=15.0 2024-09-24 04:26:55,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=420088.6666666667, ans=0.125 2024-09-24 04:27:06,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=420088.6666666667, ans=0.1
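The WARNING lines from optim.py above summarize the recent distribution of gradient norms. The printed threshold is consistently clipping_scale times the median quartile (e.g. 2.0 * 1.369e+02 = 2.738e+02 against the logged threshold=2.737e+02, up to rounding), and percent-clipped reports how often recent norms exceeded it. A small sketch of that bookkeeping, assuming exactly this relationship (the real clipping logic in optim.py is more involved):

```python
import numpy as np

def clipping_report(grad_norms: np.ndarray, clipping_scale: float = 2.0):
    # Five-number summary of recent grad norms, as in the log lines.
    quartiles = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * quartiles[2]        # scale * median
    percent_clipped = 100.0 * float(np.mean(grad_norms > threshold))
    return quartiles, threshold, percent_clipped

# With the quartiles above, the max (2.368e+02) stays below the
# threshold (2.737e+02), which is why percent-clipped=0.0 there.
```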
2024-09-24 04:27:10,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.00 vs. limit=10.0 2024-09-24 04:27:11,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420135.3333333333, ans=0.1 2024-09-24 04:27:17,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420135.3333333333, ans=0.1 2024-09-24 04:27:20,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=420135.3333333333, ans=0.05 2024-09-24 04:27:38,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=420182.0, ans=0.125 2024-09-24 04:27:49,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=420228.6666666667, ans=0.04949747468305833 2024-09-24 04:27:57,684 INFO [train.py:1198] (0/4) Epoch 24, batch 450, loss[loss=0.2198, ctc_loss=0.1469, cr_loss=0.3646, over 17022.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1372, cr_loss=0.3538, over 3024522.92 frames. ], batch size: 51, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:28:09,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.31 vs. limit=15.0 2024-09-24 04:28:27,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2024-09-24 04:28:32,094 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:29:07,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=420462.0, ans=0.125 2024-09-24 04:29:13,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420462.0, ans=0.1 2024-09-24 04:29:16,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=420508.6666666667, ans=0.125 2024-09-24 04:29:17,789 INFO [train.py:1198] (0/4) Epoch 24, batch 500, loss[loss=0.2862, ctc_loss=0.2064, cr_loss=0.3986, over 11877.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.1371, cr_loss=0.3532, over 3106435.03 frames. ], batch size: 123, lr: 5.03e-03, grad_scale: 32.0 2024-09-24 04:29:21,053 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.236e+02 1.308e+02 1.375e+02 2.594e+02, threshold=2.616e+02, percent-clipped=0.0 2024-09-24 04:29:49,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=420555.3333333333, ans=0.125 2024-09-24 04:29:52,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.45 vs. limit=15.0 2024-09-24 04:30:00,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=420602.0, ans=0.0 2024-09-24 04:30:37,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=420695.3333333333, ans=0.2 2024-09-24 04:30:45,511 INFO [train.py:1198] (0/4) Epoch 24, batch 550, loss[loss=0.1854, ctc_loss=0.1193, cr_loss=0.3302, over 17158.00 frames.
], tot_loss[loss=0.2072, ctc_loss=0.1365, cr_loss=0.3531, over 3163086.12 frames. ], batch size: 41, lr: 5.03e-03, grad_scale: 16.0 2024-09-24 04:31:07,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0 2024-09-24 04:31:10,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2024-09-24 04:31:13,326 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:31:44,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420882.0, ans=0.1 2024-09-24 04:32:08,521 INFO [train.py:1198] (0/4) Epoch 24, batch 600, loss[loss=0.1719, ctc_loss=0.1151, cr_loss=0.2842, over 16305.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.1362, cr_loss=0.3516, over 3209854.99 frames. ], batch size: 36, lr: 5.02e-03, grad_scale: 16.0 2024-09-24 04:32:13,194 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.249e+02 1.314e+02 1.426e+02 3.030e+02, threshold=2.628e+02, percent-clipped=1.0 2024-09-24 04:32:26,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=421022.0, ans=0.0 2024-09-24 04:32:27,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=421022.0, ans=0.0 2024-09-24 04:32:34,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2024-09-24 04:32:42,610 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=22.5 2024-09-24 04:33:01,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=421115.3333333333, ans=0.125 2024-09-24 04:33:06,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=421115.3333333333, ans=0.2 2024-09-24 04:33:20,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=421162.0, ans=0.025 2024-09-24 04:33:25,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.60 vs. limit=5.0 2024-09-24 04:33:28,603 INFO [train.py:1198] (0/4) Epoch 24, batch 650, loss[loss=0.1641, ctc_loss=0.1076, cr_loss=0.2827, over 16981.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1369, cr_loss=0.3529, over 3232528.08 frames. 
], batch size: 42, lr: 5.02e-03, grad_scale: 16.0 2024-09-24 04:33:33,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=421208.6666666667, ans=0.0 2024-09-24 04:33:46,634 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:33:49,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=421255.3333333333, ans=0.0 2024-09-24 04:34:01,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=421302.0, ans=0.1 2024-09-24 04:34:12,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=421302.0, ans=0.2 2024-09-24 04:34:13,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=421302.0, ans=0.125 2024-09-24 04:34:21,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=421348.6666666667, ans=0.125 2024-09-24 04:34:51,512 INFO [train.py:1198] (0/4) Epoch 24, batch 700, loss[loss=0.1993, ctc_loss=0.1343, cr_loss=0.3249, over 17360.00 frames. ], tot_loss[loss=0.2084, ctc_loss=0.1376, cr_loss=0.354, over 3253774.91 frames. ], batch size: 48, lr: 5.02e-03, grad_scale: 16.0 2024-09-24 04:34:55,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=421442.0, ans=0.2 2024-09-24 04:34:56,334 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.315e+02 1.415e+02 1.543e+02 2.275e+02, threshold=2.830e+02, percent-clipped=0.0 2024-09-24 04:34:56,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=421442.0, ans=0.125 2024-09-24 04:35:28,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=421535.3333333333, ans=0.0 2024-09-24 04:36:05,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421628.6666666667, ans=0.1 2024-09-24 04:36:14,460 INFO [train.py:1198] (0/4) Epoch 24, batch 750, loss[loss=0.2288, ctc_loss=0.1548, cr_loss=0.3699, over 17316.00 frames. ], tot_loss[loss=0.2088, ctc_loss=0.1379, cr_loss=0.3545, over 3281057.07 frames. ], batch size: 51, lr: 5.02e-03, grad_scale: 16.0 2024-09-24 04:36:16,326 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:36:16,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=421675.3333333333, ans=0.0 2024-09-24 04:37:12,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2024-09-24 04:37:34,535 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:37:37,164 INFO [train.py:1198] (0/4) Epoch 24, batch 800, loss[loss=0.2265, ctc_loss=0.1479, cr_loss=0.3927, over 17083.00 frames. ], tot_loss[loss=0.2086, ctc_loss=0.1378, cr_loss=0.3537, over 3290701.63 frames. 
], batch size: 49, lr: 5.02e-03, grad_scale: 32.0 2024-09-24 04:37:41,990 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.296e+02 1.355e+02 1.500e+02 2.153e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-24 04:37:51,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=421955.3333333333, ans=0.125 2024-09-24 04:37:52,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2024-09-24 04:37:55,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=421955.3333333333, ans=0.125 2024-09-24 04:38:40,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=22.5 2024-09-24 04:38:52,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422095.3333333333, ans=0.1 2024-09-24 04:38:57,343 INFO [train.py:1198] (0/4) Epoch 24, batch 850, loss[loss=0.1899, ctc_loss=0.1232, cr_loss=0.3335, over 17245.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.1373, cr_loss=0.3522, over 3297361.23 frames. ], batch size: 44, lr: 5.02e-03, grad_scale: 32.0 2024-09-24 04:39:28,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=422188.6666666667, ans=0.0 2024-09-24 04:39:34,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=22.5 2024-09-24 04:39:39,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=422235.3333333333, ans=0.0 2024-09-24 04:39:42,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=422235.3333333333, ans=0.125 2024-09-24 04:39:46,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=422282.0, ans=0.09899494936611666 2024-09-24 04:39:54,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0 2024-09-24 04:39:55,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=422282.0, ans=0.2 2024-09-24 04:39:59,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.99 vs. limit=22.5 2024-09-24 04:40:25,046 INFO [train.py:1198] (0/4) Epoch 24, batch 900, loss[loss=0.2309, ctc_loss=0.1527, cr_loss=0.3908, over 17213.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1375, cr_loss=0.3526, over 3302355.89 frames. ], batch size: 50, lr: 5.02e-03, grad_scale: 32.0 2024-09-24 04:40:29,775 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.271e+02 1.382e+02 1.510e+02 2.333e+02, threshold=2.763e+02, percent-clipped=0.0
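The dense "ScheduledFloat: name=..., batch_count=..., ans=..." lines track hyperparameters (skip rates, dropout probabilities, balancer limits) that are functions of the training batch count rather than constants; late in training, as here around batch_count 422k, most of the rates have annealed to their final values. A sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are hypothetical, not the recipe's:

```python
class ScheduledFloatSketch:
    """A float that depends on the current batch count (piecewise linear)."""

    def __init__(self, *points: tuple):
        self.points = sorted(points)   # e.g. (0.0, 0.5), (16000.0, 0.0)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(422422.0))   # 0.0: long past the last breakpoint
```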
2024-09-24 04:40:33,977 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.53 vs. limit=10.0 2024-09-24 04:40:39,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=422422.0, ans=0.125 2024-09-24 04:40:55,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=422468.6666666667, ans=0.09899494936611666 2024-09-24 04:41:13,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=422515.3333333333, ans=0.125 2024-09-24 04:41:18,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-24 04:41:34,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=422562.0, ans=0.2 2024-09-24 04:41:45,283 INFO [train.py:1198] (0/4) Epoch 24, batch 950, loss[loss=0.2274, ctc_loss=0.1487, cr_loss=0.3934, over 17243.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1374, cr_loss=0.3526, over 3317124.89 frames. ], batch size: 55, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:42:23,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422702.0, ans=0.1 2024-09-24 04:42:33,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=422702.0, ans=0.2 2024-09-24 04:43:00,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=422795.3333333333, ans=0.05 2024-09-24 04:43:08,570 INFO [train.py:1198] (0/4) Epoch 24, batch 1000, loss[loss=0.1885, ctc_loss=0.1227, cr_loss=0.329, over 17277.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.1373, cr_loss=0.3525, over 3324133.50 frames. ], batch size: 42, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:43:11,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=15.0 2024-09-24 04:43:13,212 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.311e+02 1.412e+02 1.541e+02 1.926e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-24 04:43:29,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422888.6666666667, ans=0.125 2024-09-24 04:43:48,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=422935.3333333333, ans=0.0 2024-09-24 04:43:58,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=12.0 2024-09-24 04:44:18,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=423028.6666666667, ans=0.0 2024-09-24 04:44:30,831 INFO [train.py:1198] (0/4) Epoch 24, batch 1050, loss[loss=0.1794, ctc_loss=0.1152, cr_loss=0.321, over 17027.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1369, cr_loss=0.3519, over 3336602.95 frames.
], batch size: 51, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:44:34,319 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:44:45,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=423122.0, ans=0.0 2024-09-24 04:44:56,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=12.0 2024-09-24 04:45:38,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0 2024-09-24 04:45:56,523 INFO [train.py:1198] (0/4) Epoch 24, batch 1100, loss[loss=0.2169, ctc_loss=0.1475, cr_loss=0.3469, over 17012.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1367, cr_loss=0.3512, over 3348815.30 frames. ], batch size: 56, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:45:59,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=423308.6666666667, ans=0.0 2024-09-24 04:46:01,274 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.919e+01 1.239e+02 1.343e+02 1.468e+02 1.769e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-24 04:46:03,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=423308.6666666667, ans=0.2 2024-09-24 04:46:08,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=423308.6666666667, ans=0.125 2024-09-24 04:46:20,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=423355.3333333333, ans=0.0 2024-09-24 04:46:23,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423355.3333333333, ans=0.1 2024-09-24 04:46:25,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2024-09-24 04:46:46,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.21 vs. limit=22.5 2024-09-24 04:46:59,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=423448.6666666667, ans=0.025 2024-09-24 04:47:11,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=423495.3333333333, ans=0.1 2024-09-24 04:47:18,699 INFO [train.py:1198] (0/4) Epoch 24, batch 1150, loss[loss=0.2154, ctc_loss=0.144, cr_loss=0.3571, over 15943.00 frames. ], tot_loss[loss=0.2076, ctc_loss=0.1372, cr_loss=0.3521, over 3351071.71 frames. 
], batch size: 74, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:47:23,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=423542.0, ans=0.2 2024-09-24 04:47:31,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=423542.0, ans=0.125 2024-09-24 04:47:31,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=423542.0, ans=0.125 2024-09-24 04:47:42,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423588.6666666667, ans=0.1 2024-09-24 04:48:06,017 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2024-09-24 04:48:07,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=423682.0, ans=0.0 2024-09-24 04:48:20,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=423682.0, ans=0.125 2024-09-24 04:48:39,273 INFO [train.py:1198] (0/4) Epoch 24, batch 1200, loss[loss=0.2283, ctc_loss=0.1495, cr_loss=0.394, over 17223.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1374, cr_loss=0.3527, over 3353276.10 frames. ], batch size: 55, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:48:44,027 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.248e+02 1.325e+02 1.416e+02 2.562e+02, threshold=2.650e+02, percent-clipped=0.0 2024-09-24 04:48:52,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=423775.3333333333, ans=0.125 2024-09-24 04:48:54,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=22.5 2024-09-24 04:48:57,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=423822.0, ans=0.0 2024-09-24 04:49:12,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=423868.6666666667, ans=0.125 2024-09-24 04:49:20,464 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:49:31,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=423915.3333333333, ans=0.2 2024-09-24 04:49:37,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=423915.3333333333, ans=0.2 2024-09-24 04:49:39,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.11 vs. limit=15.0 2024-09-24 04:49:47,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=423962.0, ans=0.025 2024-09-24 04:50:06,369 INFO [train.py:1198] (0/4) Epoch 24, batch 1250, loss[loss=0.2223, ctc_loss=0.1467, cr_loss=0.3775, over 17352.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1367, cr_loss=0.3512, over 3354919.58 frames. 
], batch size: 48, lr: 5.01e-03, grad_scale: 32.0 2024-09-24 04:50:10,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=22.5 2024-09-24 04:50:26,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=12.0 2024-09-24 04:50:32,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=424055.3333333333, ans=0.125 2024-09-24 04:50:40,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=424102.0, ans=0.125 2024-09-24 04:50:40,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=424102.0, ans=0.125 2024-09-24 04:50:43,521 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 04:50:58,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=424148.6666666667, ans=0.0 2024-09-24 04:51:03,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=424148.6666666667, ans=0.125 2024-09-24 04:51:03,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=424148.6666666667, ans=0.0 2024-09-24 04:51:11,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=424195.3333333333, ans=0.125 2024-09-24 04:51:11,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=22.5 2024-09-24 04:51:26,793 INFO [train.py:1198] (0/4) Epoch 24, batch 1300, loss[loss=0.2122, ctc_loss=0.1403, cr_loss=0.3597, over 17213.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1363, cr_loss=0.3507, over 3355693.59 frames. ], batch size: 47, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:51:31,563 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.033e+02 1.254e+02 1.331e+02 1.449e+02 1.850e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-24 04:51:36,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=424242.0, ans=0.125 2024-09-24 04:51:37,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.73 vs. limit=15.0 2024-09-24 04:52:28,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=424382.0, ans=10.0 2024-09-24 04:52:39,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=424428.6666666667, ans=0.125 2024-09-24 04:52:49,354 INFO [train.py:1198] (0/4) Epoch 24, batch 1350, loss[loss=0.225, ctc_loss=0.1541, cr_loss=0.3544, over 17022.00 frames. ], tot_loss[loss=0.2072, ctc_loss=0.1369, cr_loss=0.3515, over 3361117.16 frames. 
], batch size: 56, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:52:55,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=424475.3333333333, ans=0.125 2024-09-24 04:53:01,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.28 vs. limit=22.5 2024-09-24 04:53:06,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0 2024-09-24 04:54:01,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=424662.0, ans=0.125 2024-09-24 04:54:11,954 INFO [train.py:1198] (0/4) Epoch 24, batch 1400, loss[loss=0.1952, ctc_loss=0.1283, cr_loss=0.3345, over 17359.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1365, cr_loss=0.3511, over 3357295.04 frames. ], batch size: 48, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:54:12,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=424708.6666666667, ans=0.125 2024-09-24 04:54:16,812 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.257e+02 1.358e+02 1.495e+02 1.831e+02, threshold=2.716e+02, percent-clipped=0.0 2024-09-24 04:54:20,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-09-24 04:54:38,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2024-09-24 04:54:39,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=424755.3333333333, ans=0.125 2024-09-24 04:54:46,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=424802.0, ans=0.0 2024-09-24 04:54:55,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=22.5 2024-09-24 04:54:58,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424802.0, ans=0.1 2024-09-24 04:55:08,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=424848.6666666667, ans=0.125 2024-09-24 04:55:21,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=424895.3333333333, ans=0.125 2024-09-24 04:55:37,531 INFO [train.py:1198] (0/4) Epoch 24, batch 1450, loss[loss=0.2156, ctc_loss=0.1427, cr_loss=0.3649, over 17359.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.1362, cr_loss=0.3505, over 3348518.59 frames. ], batch size: 48, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:55:42,779 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
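Each train.py:1198 line reports two things: loss[...] for the current batch and tot_loss[...] pooled over recent batches, both tagged with the number of frames they cover. The tot_loss window here (~3.35M frames, against roughly 17k frames per batch) corresponds to on the order of 200 recent batches, and the printed value behaves like a frame-weighted average. A sketch of that pooling, assuming sum(loss_i * frames_i) / sum(frames_i); the exact windowing and reset logic in train.py is assumed, not quoted:

```python
class FrameWeightedLoss:
    """Pool per-batch losses weighted by how many frames each batch had."""

    def __init__(self) -> None:
        self.weighted_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, frames: float) -> None:
        self.weighted_sum += loss * frames
        self.frames += frames

    @property
    def average(self) -> float:
        return self.weighted_sum / max(self.frames, 1.0)

tracker = FrameWeightedLoss()
tracker.update(0.1952, 17359.0)   # batch 1400 values from the lines above
tracker.update(0.2156, 17359.0)   # batch 1450 values
print(f"{tracker.average:.4f} over {tracker.frames:.2f} frames")
```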
2024-09-24 04:55:49,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0 2024-09-24 04:56:05,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=424988.6666666667, ans=0.0 2024-09-24 04:56:14,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=425035.3333333333, ans=0.125 2024-09-24 04:56:32,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=425082.0, ans=0.2 2024-09-24 04:56:59,699 INFO [train.py:1198] (0/4) Epoch 24, batch 1500, loss[loss=0.1997, ctc_loss=0.1309, cr_loss=0.3437, over 16968.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1366, cr_loss=0.3508, over 3335681.03 frames. ], batch size: 42, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:57:04,518 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.246e+02 1.334e+02 1.449e+02 2.075e+02, threshold=2.667e+02, percent-clipped=0.0 2024-09-24 04:57:04,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425175.3333333333, ans=0.1 2024-09-24 04:57:17,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=425222.0, ans=0.0 2024-09-24 04:57:27,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=425222.0, ans=0.025 2024-09-24 04:57:30,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2024-09-24 04:57:48,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=425315.3333333333, ans=0.125 2024-09-24 04:57:57,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=425315.3333333333, ans=0.0 2024-09-24 04:58:17,192 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=22.5 2024-09-24 04:58:19,809 INFO [train.py:1198] (0/4) Epoch 24, batch 1550, loss[loss=0.221, ctc_loss=0.1432, cr_loss=0.3888, over 17027.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1357, cr_loss=0.3499, over 3342467.06 frames. ], batch size: 51, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:58:39,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=425455.3333333333, ans=0.125 2024-09-24 04:58:44,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=425455.3333333333, ans=0.025 2024-09-24 04:59:08,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=425548.6666666667, ans=0.0 2024-09-24 04:59:16,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=425548.6666666667, ans=0.05 2024-09-24 04:59:28,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425595.3333333333, ans=0.1 2024-09-24 04:59:42,280 INFO [train.py:1198] (0/4) Epoch 24, batch 1600, loss[loss=0.2412, ctc_loss=0.1626, cr_loss=0.3926, over 17207.00 frames.
], tot_loss[loss=0.2065, ctc_loss=0.1363, cr_loss=0.3512, over 3352656.94 frames. ], batch size: 55, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 04:59:47,009 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.230e+02 1.386e+02 1.499e+02 2.034e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-24 04:59:54,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=425642.0, ans=0.1 2024-09-24 05:00:05,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-09-24 05:00:22,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=425735.3333333333, ans=0.125 2024-09-24 05:00:27,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=22.5 2024-09-24 05:00:29,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=425735.3333333333, ans=0.0 2024-09-24 05:00:35,346 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:00:56,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=425828.6666666667, ans=0.125 2024-09-24 05:01:07,191 INFO [train.py:1198] (0/4) Epoch 24, batch 1650, loss[loss=0.2331, ctc_loss=0.1548, cr_loss=0.3912, over 17266.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1358, cr_loss=0.3509, over 3353262.21 frames. ], batch size: 55, lr: 5.00e-03, grad_scale: 32.0 2024-09-24 05:01:42,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=425968.6666666667, ans=0.0 2024-09-24 05:02:10,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=426015.3333333333, ans=0.125 2024-09-24 05:02:18,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=426062.0, ans=0.2 2024-09-24 05:02:29,580 INFO [train.py:1198] (0/4) Epoch 24, batch 1700, loss[loss=0.2273, ctc_loss=0.1525, cr_loss=0.3741, over 16766.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1357, cr_loss=0.3507, over 3361755.03 frames. ], batch size: 61, lr: 4.99e-03, grad_scale: 32.0 2024-09-24 05:02:34,431 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.248e+02 1.319e+02 1.421e+02 3.276e+02, threshold=2.637e+02, percent-clipped=2.0 2024-09-24 05:02:38,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0 2024-09-24 05:02:40,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=426108.6666666667, ans=0.015 2024-09-24 05:03:15,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=426202.0, ans=0.0
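The lr column decays smoothly within an epoch (5.00e-03 easing to 4.99e-03 across the batches above) and takes a visible step at each epoch boundary (5.15e-03 at the end of epoch 23 versus 5.04e-03 at the start of epoch 24, earlier in the log). That shape matches icefall's Eden schedule, which decays in both the batch index and the epoch index; the sketch below reproduces the formula from memory, with illustrative lr_batches/lr_epochs values, so treat the details as assumptions:

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden (from memory): inverse-fourth-root decay in batch and epoch.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# The epoch-boundary step is reproduced by the epoch factor alone:
# ((24**2 + 3.5**2) / (23**2 + 3.5**2)) ** -0.25 = 0.9794, and
# 5.15e-03 * 0.9794 = 5.04e-03, matching the logged step.
print(((24 ** 2 + 3.5 ** 2) / (23 ** 2 + 3.5 ** 2)) ** -0.25)
```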
2024-09-24 05:03:22,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2024-09-24 05:03:22,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.48 vs. limit=15.0 2024-09-24 05:03:31,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=426248.6666666667, ans=0.0 2024-09-24 05:03:33,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=426295.3333333333, ans=0.2 2024-09-24 05:03:41,962 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=15.0 2024-09-24 05:03:50,636 INFO [train.py:1198] (0/4) Epoch 24, batch 1750, loss[loss=0.1919, ctc_loss=0.1243, cr_loss=0.338, over 17067.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.1361, cr_loss=0.3518, over 3372849.97 frames. ], batch size: 39, lr: 4.99e-03, grad_scale: 32.0 2024-09-24 05:03:50,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=426342.0, ans=0.5 2024-09-24 05:03:55,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=426342.0, ans=0.125 2024-09-24 05:04:03,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=426342.0, ans=0.2 2024-09-24 05:04:36,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=426435.3333333333, ans=0.125 2024-09-24 05:04:49,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=426482.0, ans=0.125 2024-09-24 05:04:49,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=426482.0, ans=0.0 2024-09-24 05:05:17,717 INFO [train.py:1198] (0/4) Epoch 24, batch 1800, loss[loss=0.169, ctc_loss=0.108, cr_loss=0.3052, over 16957.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1357, cr_loss=0.3522, over 3377294.04 frames. ], batch size: 42, lr: 4.99e-03, grad_scale: 32.0 2024-09-24 05:05:22,469 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.235e+02 1.322e+02 1.422e+02 1.827e+02, threshold=2.643e+02, percent-clipped=0.0 2024-09-24 05:05:54,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=426668.6666666667, ans=0.025 2024-09-24 05:06:17,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=426715.3333333333, ans=0.0 2024-09-24 05:06:25,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=426762.0, ans=0.2 2024-09-24 05:06:37,697 INFO [train.py:1198] (0/4) Epoch 24, batch 1850, loss[loss=0.2034, ctc_loss=0.1339, cr_loss=0.3472, over 17158.00 frames.
], batch size: 45, lr: 4.99e-03, grad_scale: 32.0 2024-09-24 05:06:41,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=426808.6666666667, ans=0.0 2024-09-24 05:06:44,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=426808.6666666667, ans=0.125 2024-09-24 05:06:47,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=426808.6666666667, ans=0.09899494936611666 2024-09-24 05:06:59,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=426855.3333333333, ans=0.0 2024-09-24 05:07:03,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=22.5 2024-09-24 05:07:04,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=426855.3333333333, ans=0.125 2024-09-24 05:07:15,745 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:07:47,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=426995.3333333333, ans=0.125 2024-09-24 05:08:00,018 INFO [train.py:1198] (0/4) Epoch 24, batch 1900, loss[loss=0.162, ctc_loss=0.1012, cr_loss=0.304, over 16793.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1372, cr_loss=0.3547, over 3382611.17 frames. ], batch size: 37, lr: 4.99e-03, grad_scale: 16.0 2024-09-24 05:08:06,244 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.237e+02 1.306e+02 1.387e+02 1.778e+02, threshold=2.611e+02, percent-clipped=0.0 2024-09-24 05:08:12,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=427042.0, ans=0.125 2024-09-24 05:08:19,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-09-24 05:08:28,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427088.6666666667, ans=0.1 2024-09-24 05:08:39,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0 2024-09-24 05:09:22,692 INFO [train.py:1198] (0/4) Epoch 24, batch 1950, loss[loss=0.2317, ctc_loss=0.1545, cr_loss=0.3862, over 16998.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1379, cr_loss=0.3565, over 3378405.95 frames. 
], batch size: 53, lr: 4.99e-03, grad_scale: 16.0 2024-09-24 05:09:51,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=427322.0, ans=0.125 2024-09-24 05:10:25,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=427415.3333333333, ans=0.125 2024-09-24 05:10:32,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427462.0, ans=0.1 2024-09-24 05:10:47,724 INFO [train.py:1198] (0/4) Epoch 24, batch 2000, loss[loss=0.2079, ctc_loss=0.1379, cr_loss=0.3499, over 17354.00 frames. ], tot_loss[loss=0.209, ctc_loss=0.1378, cr_loss=0.3557, over 3375551.86 frames. ], batch size: 48, lr: 4.99e-03, grad_scale: 16.0 2024-09-24 05:10:51,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427508.6666666667, ans=0.1 2024-09-24 05:10:55,710 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.252e+02 1.338e+02 1.434e+02 1.849e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 05:11:02,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2024-09-24 05:11:55,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=427695.3333333333, ans=0.2 2024-09-24 05:12:09,511 INFO [train.py:1198] (0/4) Epoch 24, batch 2050, loss[loss=0.2087, ctc_loss=0.1411, cr_loss=0.3382, over 17050.00 frames. ], tot_loss[loss=0.2094, ctc_loss=0.1382, cr_loss=0.3564, over 3377412.80 frames. ], batch size: 46, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:12:21,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.80 vs. limit=15.0 2024-09-24 05:12:46,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=427835.3333333333, ans=0.125 2024-09-24 05:12:49,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=427835.3333333333, ans=0.125 2024-09-24 05:12:53,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0 2024-09-24 05:13:00,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=427882.0, ans=0.2 2024-09-24 05:13:21,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=427928.6666666667, ans=0.125 2024-09-24 05:13:29,447 INFO [train.py:1198] (0/4) Epoch 24, batch 2100, loss[loss=0.1728, ctc_loss=0.1102, cr_loss=0.3131, over 17061.00 frames. ], tot_loss[loss=0.2095, ctc_loss=0.1383, cr_loss=0.3559, over 3377249.81 frames. 
], batch size: 39, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:13:37,479 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.278e+02 1.335e+02 1.481e+02 2.167e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 05:13:37,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=427975.3333333333, ans=0.015 2024-09-24 05:13:37,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427975.3333333333, ans=0.1 2024-09-24 05:14:15,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=428068.6666666667, ans=0.125 2024-09-24 05:14:22,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=428115.3333333333, ans=0.0 2024-09-24 05:14:25,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=428115.3333333333, ans=0.125 2024-09-24 05:14:30,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=428115.3333333333, ans=0.125 2024-09-24 05:14:38,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=428162.0, ans=0.0 2024-09-24 05:14:54,829 INFO [train.py:1198] (0/4) Epoch 24, batch 2150, loss[loss=0.1714, ctc_loss=0.1114, cr_loss=0.3001, over 17106.00 frames. ], tot_loss[loss=0.2096, ctc_loss=0.1384, cr_loss=0.3561, over 3375911.38 frames. ], batch size: 40, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:14:59,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=428208.6666666667, ans=0.0 2024-09-24 05:15:02,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=428208.6666666667, ans=0.0 2024-09-24 05:15:02,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=428208.6666666667, ans=0.125 2024-09-24 05:15:09,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.70 vs. limit=15.0 2024-09-24 05:15:15,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2024-09-24 05:15:30,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-24 05:15:45,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=428348.6666666667, ans=0.125 2024-09-24 05:16:15,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2024-09-24 05:16:17,796 INFO [train.py:1198] (0/4) Epoch 24, batch 2200, loss[loss=0.2225, ctc_loss=0.1458, cr_loss=0.3834, over 17230.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.1388, cr_loss=0.3559, over 3361808.81 frames. 
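
The ScheduledFloat entries log a named hyperparameter together with the schedule's value ("ans") at the current batch_count; by ~428k batches most skip-rate schedules have annealed to their final constants, which is why so many entries show ans=0.0, 0.1, or 0.125. A sketch of a piecewise-linear schedule of this kind (illustrative only; the real ScheduledFloat lives in icefall's scaling.py and does more):

class PiecewiseLinearFloat:
    # Sketch of a batch_count-indexed schedule like the logged ScheduledFloat
    # values; the breakpoints below are hypothetical.
    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.5), (4000.0, 0.0)
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return pts[-1][1]

sched = PiecewiseLinearFloat((0.0, 0.5), (4000.0, 0.0))
assert sched(428115.3333333333) == 0.0   # long past the ramp: ans=0.0, as logged
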
], batch size: 47, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:16:25,774 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.245e+02 1.329e+02 1.411e+02 2.102e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-24 05:16:48,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=428535.3333333333, ans=0.2 2024-09-24 05:17:10,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=428582.0, ans=0.125 2024-09-24 05:17:15,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=428582.0, ans=0.0 2024-09-24 05:17:18,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=428582.0, ans=0.0 2024-09-24 05:17:18,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428582.0, ans=0.1 2024-09-24 05:17:40,980 INFO [train.py:1198] (0/4) Epoch 24, batch 2250, loss[loss=0.2195, ctc_loss=0.1457, cr_loss=0.3687, over 17056.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1383, cr_loss=0.3547, over 3354502.10 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:17:41,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=428675.3333333333, ans=0.95 2024-09-24 05:17:56,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=428722.0, ans=0.1 2024-09-24 05:18:00,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428722.0, ans=0.1 2024-09-24 05:18:37,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=428815.3333333333, ans=0.2 2024-09-24 05:18:56,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=428862.0, ans=0.125 2024-09-24 05:19:00,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=22.5 2024-09-24 05:19:01,268 INFO [train.py:1198] (0/4) Epoch 24, batch 2300, loss[loss=0.1885, ctc_loss=0.125, cr_loss=0.3174, over 17063.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1374, cr_loss=0.3541, over 3358135.43 frames. ], batch size: 46, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:19:08,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.33 vs. limit=22.5 2024-09-24 05:19:11,774 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.244e+02 1.345e+02 1.429e+02 2.009e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-24 05:19:37,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=429002.0, ans=0.125 2024-09-24 05:20:14,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=429095.3333333333, ans=0.0 2024-09-24 05:20:28,881 INFO [train.py:1198] (0/4) Epoch 24, batch 2350, loss[loss=0.219, ctc_loss=0.1436, cr_loss=0.3771, over 17057.00 frames. 
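
grad_scale in the batch summaries alternates between 16.0 and 32.0, the signature of dynamic loss scaling under float16 autocast: the scale is halved when infs/NaNs are detected in the gradients and grown again after a run of clean steps. A generic sketch of one such step with torch.cuda.amp (the training-step interface here is assumed, not this recipe's exact loop):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=2000)

def training_step(model, optimizer, batch):
    # One fp16 step with dynamic loss scaling: the scale is halved when
    # infs/NaNs show up in the gradients and doubled again after enough clean
    # steps, producing runs like grad_scale: 16.0 <-> 32.0.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)              # assumed to return a scalar loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)               # skipped if the gradients overflowed
    scaler.update()
    return loss.detach(), scaler.get_scale()
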
], tot_loss[loss=0.208, ctc_loss=0.1372, cr_loss=0.3543, over 3367283.42 frames. ], batch size: 52, lr: 4.98e-03, grad_scale: 16.0 2024-09-24 05:21:19,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=429282.0, ans=0.125 2024-09-24 05:21:20,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=429282.0, ans=0.125 2024-09-24 05:21:33,626 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-92000.pt 2024-09-24 05:21:54,431 INFO [train.py:1198] (0/4) Epoch 24, batch 2400, loss[loss=0.1977, ctc_loss=0.1304, cr_loss=0.3364, over 16940.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1364, cr_loss=0.3522, over 3362099.03 frames. ], batch size: 42, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:22:02,373 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.270e+02 1.408e+02 1.564e+02 2.432e+02, threshold=2.816e+02, percent-clipped=0.0 2024-09-24 05:22:05,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=429375.3333333333, ans=0.09899494936611666 2024-09-24 05:22:23,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=429422.0, ans=0.125 2024-09-24 05:22:39,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=429468.6666666667, ans=0.025 2024-09-24 05:22:41,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=429515.3333333333, ans=0.05 2024-09-24 05:22:49,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-09-24 05:22:54,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=429515.3333333333, ans=0.125 2024-09-24 05:23:06,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=429562.0, ans=0.2 2024-09-24 05:23:14,641 INFO [train.py:1198] (0/4) Epoch 24, batch 2450, loss[loss=0.1772, ctc_loss=0.1128, cr_loss=0.3219, over 17274.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1365, cr_loss=0.3524, over 3363442.65 frames. ], batch size: 42, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:23:56,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=429702.0, ans=0.125 2024-09-24 05:23:58,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=429702.0, ans=0.0 2024-09-24 05:24:14,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=429748.6666666667, ans=0.125 2024-09-24 05:24:37,353 INFO [train.py:1198] (0/4) Epoch 24, batch 2500, loss[loss=0.1828, ctc_loss=0.1188, cr_loss=0.32, over 17068.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1374, cr_loss=0.3536, over 3356809.54 frames. 
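
The checkpoint.py entry above writes a batch-numbered file, checkpoint-92000.pt, alongside the end-of-epoch epoch-N.pt files saved later; 92000 is consistent with a fixed saving interval of 4000 training batches (assumed for this run). A minimal sketch of that policy:

import torch

def maybe_save_batch_checkpoint(model, optimizer, exp_dir, batch_idx_train,
                                save_every_n=4000):
    # save_every_n=4000 is assumed; 92000 is a multiple of it.
    if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
        torch.save(
            {"model": model.state_dict(),
             "optimizer": optimizer.state_dict(),
             "batch_idx_train": batch_idx_train},
            f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
        )
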
], batch size: 46, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:24:48,051 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.250e+02 1.362e+02 1.457e+02 2.366e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-24 05:24:57,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=429888.6666666667, ans=0.125 2024-09-24 05:25:08,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=429888.6666666667, ans=0.125 2024-09-24 05:25:10,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=429888.6666666667, ans=0.125 2024-09-24 05:25:11,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=429888.6666666667, ans=0.0 2024-09-24 05:25:56,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=430028.6666666667, ans=0.125 2024-09-24 05:26:02,840 INFO [train.py:1198] (0/4) Epoch 24, batch 2550, loss[loss=0.1994, ctc_loss=0.1286, cr_loss=0.354, over 17256.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1363, cr_loss=0.3521, over 3362246.51 frames. ], batch size: 44, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:26:09,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=430075.3333333333, ans=0.0 2024-09-24 05:26:44,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=430168.6666666667, ans=0.125 2024-09-24 05:26:57,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=430215.3333333333, ans=0.04949747468305833 2024-09-24 05:27:13,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=430262.0, ans=0.125 2024-09-24 05:27:24,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=430308.6666666667, ans=0.2 2024-09-24 05:27:25,869 INFO [train.py:1198] (0/4) Epoch 24, batch 2600, loss[loss=0.197, ctc_loss=0.1278, cr_loss=0.3459, over 17255.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.136, cr_loss=0.3519, over 3363629.26 frames. ], batch size: 44, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:27:26,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=430308.6666666667, ans=0.2 2024-09-24 05:27:27,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=430308.6666666667, ans=0.125 2024-09-24 05:27:29,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. 
limit=15.0 2024-09-24 05:27:33,811 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.253e+02 1.345e+02 1.495e+02 2.149e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-24 05:27:34,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=430308.6666666667, ans=0.2 2024-09-24 05:28:38,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=430495.3333333333, ans=0.125 2024-09-24 05:28:44,974 INFO [train.py:1198] (0/4) Epoch 24, batch 2650, loss[loss=0.2084, ctc_loss=0.1382, cr_loss=0.351, over 17119.00 frames. ], tot_loss[loss=0.2079, ctc_loss=0.1372, cr_loss=0.3537, over 3339077.84 frames. ], batch size: 43, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:28:50,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=430542.0, ans=0.0 2024-09-24 05:28:50,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=15.0 2024-09-24 05:29:01,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.98 vs. limit=6.0 2024-09-24 05:29:10,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=430588.6666666667, ans=0.0 2024-09-24 05:29:10,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=430588.6666666667, ans=0.125 2024-09-24 05:29:43,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=430682.0, ans=0.2 2024-09-24 05:29:49,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=430682.0, ans=0.125 2024-09-24 05:30:02,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=430728.6666666667, ans=15.0 2024-09-24 05:30:12,485 INFO [train.py:1198] (0/4) Epoch 24, batch 2700, loss[loss=0.2088, ctc_loss=0.1399, cr_loss=0.3442, over 16956.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.137, cr_loss=0.3537, over 3344777.41 frames. 
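
The Whitening entries report, per module, an eigenvalue-spread statistic of the feature covariance ("metric") compared against a per-module limit; values near 1.0 indicate nearly "white" (isotropic) activations, and larger values indicate anisotropy. One plausible form of such a metric, normalized so that perfectly white features give exactly 1.0 (a sketch, not a copy of icefall's scaling.py):

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels). Measures eigenvalue spread of the
    # per-group feature covariance; equals 1.0 for perfectly white features
    # and grows with anisotropy. Illustrative only.
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames
    eigs = torch.linalg.eigvalsh(cov)               # (num_groups, channels/group)
    return (eigs ** 2).mean() / (eigs.mean() ** 2)  # compared against "limit"
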
], batch size: 58, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:30:19,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=430775.3333333333, ans=0.1 2024-09-24 05:30:20,466 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.268e+02 1.356e+02 1.521e+02 3.128e+02, threshold=2.712e+02, percent-clipped=1.0 2024-09-24 05:30:27,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430822.0, ans=0.1 2024-09-24 05:30:46,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=430868.6666666667, ans=0.125 2024-09-24 05:31:09,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=430915.3333333333, ans=0.2 2024-09-24 05:31:31,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=22.5 2024-09-24 05:31:32,249 INFO [train.py:1198] (0/4) Epoch 24, batch 2750, loss[loss=0.1908, ctc_loss=0.1232, cr_loss=0.3382, over 17294.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1361, cr_loss=0.3524, over 3358010.60 frames. ], batch size: 49, lr: 4.97e-03, grad_scale: 32.0 2024-09-24 05:31:58,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5 2024-09-24 05:31:59,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=431055.3333333333, ans=0.125 2024-09-24 05:32:04,286 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:32:17,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431102.0, ans=0.125 2024-09-24 05:32:17,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=431102.0, ans=0.125 2024-09-24 05:32:55,659 INFO [train.py:1198] (0/4) Epoch 24, batch 2800, loss[loss=0.2228, ctc_loss=0.1467, cr_loss=0.3807, over 16987.00 frames. ], tot_loss[loss=0.2074, ctc_loss=0.1367, cr_loss=0.3536, over 3358472.16 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 32.0 2024-09-24 05:32:57,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=22.5 2024-09-24 05:32:59,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=431242.0, ans=0.125 2024-09-24 05:33:00,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=431242.0, ans=0.125 2024-09-24 05:33:03,460 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.249e+02 1.392e+02 1.537e+02 2.011e+02, threshold=2.784e+02, percent-clipped=0.0 2024-09-24 05:33:03,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=431242.0, ans=0.125 2024-09-24 05:33:28,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.80 vs. 
limit=15.0 2024-09-24 05:34:13,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431428.6666666667, ans=0.1 2024-09-24 05:34:18,047 INFO [train.py:1198] (0/4) Epoch 24, batch 2850, loss[loss=0.1958, ctc_loss=0.1304, cr_loss=0.327, over 17070.00 frames. ], tot_loss[loss=0.2074, ctc_loss=0.1367, cr_loss=0.3537, over 3361326.32 frames. ], batch size: 46, lr: 4.96e-03, grad_scale: 32.0 2024-09-24 05:34:31,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=431475.3333333333, ans=0.0 2024-09-24 05:35:02,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=431568.6666666667, ans=0.025 2024-09-24 05:35:02,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0 2024-09-24 05:35:33,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.88 vs. limit=15.0 2024-09-24 05:35:34,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=431662.0, ans=0.1 2024-09-24 05:35:34,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=431662.0, ans=0.125 2024-09-24 05:35:43,354 INFO [train.py:1198] (0/4) Epoch 24, batch 2900, loss[loss=0.17, ctc_loss=0.1092, cr_loss=0.3041, over 17139.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1363, cr_loss=0.353, over 3369090.33 frames. ], batch size: 40, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:35:52,964 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.236e+02 1.333e+02 1.474e+02 3.420e+02, threshold=2.666e+02, percent-clipped=1.0 2024-09-24 05:36:09,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=431755.3333333333, ans=0.0 2024-09-24 05:36:40,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=431848.6666666667, ans=0.04949747468305833 2024-09-24 05:36:44,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=431848.6666666667, ans=0.125 2024-09-24 05:37:06,445 INFO [train.py:1198] (0/4) Epoch 24, batch 2950, loss[loss=0.1637, ctc_loss=0.1069, cr_loss=0.2839, over 16955.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1357, cr_loss=0.3526, over 3374511.93 frames. ], batch size: 42, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:37:31,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-09-24 05:37:54,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=432082.0, ans=0.125 2024-09-24 05:38:01,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.50 vs. 
limit=12.0 2024-09-24 05:38:11,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=432128.6666666667, ans=0.2 2024-09-24 05:38:15,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0 2024-09-24 05:38:26,285 INFO [train.py:1198] (0/4) Epoch 24, batch 3000, loss[loss=0.2246, ctc_loss=0.1497, cr_loss=0.3744, over 17202.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1359, cr_loss=0.3519, over 3369078.33 frames. ], batch size: 55, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:38:26,286 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 05:38:42,491 INFO [train.py:1230] (0/4) Epoch 24, validation: loss=0.03786, ctc_loss=0.03786, cr_loss=8.617e-15, over 944034.00 frames. 2024-09-24 05:38:42,492 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 05:38:51,890 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.040e+02 1.243e+02 1.342e+02 1.455e+02 1.995e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-24 05:39:00,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=432222.0, ans=0.1 2024-09-24 05:39:14,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=432268.6666666667, ans=0.025 2024-09-24 05:39:40,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=432315.3333333333, ans=0.125 2024-09-24 05:39:42,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=432315.3333333333, ans=0.125 2024-09-24 05:39:53,248 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:40:03,773 INFO [train.py:1198] (0/4) Epoch 24, batch 3050, loss[loss=0.2359, ctc_loss=0.1579, cr_loss=0.3898, over 17215.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.136, cr_loss=0.3524, over 3364642.81 frames. ], batch size: 47, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:41:09,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=432595.3333333333, ans=0.0 2024-09-24 05:41:19,912 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.33 vs. limit=10.0 2024-09-24 05:41:26,721 INFO [train.py:1198] (0/4) Epoch 24, batch 3100, loss[loss=0.196, ctc_loss=0.1294, cr_loss=0.3328, over 17335.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1357, cr_loss=0.3509, over 3363217.20 frames. 
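
The validation pass above reports cr_loss=8.617e-15, i.e. numerically zero: the consistency-regularization term compares model outputs across differently augmented copies of each utterance, so with augmentation disabled at validation only the CTC term remains. A sketch of the frames-weighted validation loop these entries imply (the model/batch interface is assumed, not icefall's exact API):

import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, device):
    # Frames-weighted validation loss, as in "over 944034.00 frames".
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    for batch in valid_loader:
        loss, num_frames = model.compute_loss(batch, device=device)  # hypothetical interface
        loss_sum += loss.item() * num_frames
        frame_sum += num_frames
    model.train()
    return loss_sum / frame_sum
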
], batch size: 48, lr: 4.96e-03, grad_scale: 16.0 2024-09-24 05:41:35,818 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.243e+02 1.328e+02 1.464e+02 2.073e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-24 05:41:36,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=432642.0, ans=10.0 2024-09-24 05:41:39,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=432642.0, ans=0.125 2024-09-24 05:41:51,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=432688.6666666667, ans=0.125 2024-09-24 05:42:04,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=432735.3333333333, ans=0.0 2024-09-24 05:42:10,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=432735.3333333333, ans=0.0 2024-09-24 05:42:15,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=432782.0, ans=0.035 2024-09-24 05:42:44,646 INFO [train.py:1198] (0/4) Epoch 24, batch 3150, loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3355, over 17343.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1359, cr_loss=0.3515, over 3357707.74 frames. ], batch size: 48, lr: 4.95e-03, grad_scale: 16.0 2024-09-24 05:42:46,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=432875.3333333333, ans=0.125 2024-09-24 05:43:02,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.87 vs. limit=15.0 2024-09-24 05:43:08,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=432922.0, ans=0.125 2024-09-24 05:43:11,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432922.0, ans=0.1 2024-09-24 05:43:38,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=12.0 2024-09-24 05:44:02,936 INFO [train.py:1198] (0/4) Epoch 24, batch 3200, loss[loss=0.1979, ctc_loss=0.1294, cr_loss=0.3424, over 17323.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1362, cr_loss=0.3518, over 3359178.21 frames. ], batch size: 51, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:44:12,035 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.252e+02 1.364e+02 1.478e+02 2.406e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-24 05:44:15,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=433108.6666666667, ans=0.0 2024-09-24 05:44:18,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=433155.3333333333, ans=0.2 2024-09-24 05:44:28,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.43 vs. 
limit=12.0 2024-09-24 05:44:35,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=433202.0, ans=0.125 2024-09-24 05:44:40,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=433202.0, ans=0.125 2024-09-24 05:44:51,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=433248.6666666667, ans=0.0 2024-09-24 05:45:16,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=433295.3333333333, ans=0.125 2024-09-24 05:45:20,627 INFO [train.py:1198] (0/4) Epoch 24, batch 3250, loss[loss=0.2006, ctc_loss=0.1313, cr_loss=0.3465, over 15944.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.1372, cr_loss=0.3528, over 3347953.32 frames. ], batch size: 74, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:45:38,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433388.6666666667, ans=0.0 2024-09-24 05:45:54,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433435.3333333333, ans=0.1 2024-09-24 05:46:30,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433528.6666666667, ans=0.1 2024-09-24 05:46:40,882 INFO [train.py:1198] (0/4) Epoch 24, batch 3300, loss[loss=0.1894, ctc_loss=0.1231, cr_loss=0.3318, over 17261.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1374, cr_loss=0.3541, over 3351092.18 frames. ], batch size: 42, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:46:50,401 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.267e+02 1.337e+02 1.528e+02 2.027e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-24 05:46:57,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5 2024-09-24 05:46:59,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=433622.0, ans=0.125 2024-09-24 05:47:18,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=12.0 2024-09-24 05:47:26,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433715.3333333333, ans=0.1 2024-09-24 05:47:37,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433715.3333333333, ans=0.0 2024-09-24 05:47:40,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433715.3333333333, ans=0.1 2024-09-24 05:47:40,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.84 vs. 
limit=22.5 2024-09-24 05:47:52,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=433762.0, ans=0.2 2024-09-24 05:47:57,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=433808.6666666667, ans=0.125 2024-09-24 05:47:58,603 INFO [train.py:1198] (0/4) Epoch 24, batch 3350, loss[loss=0.2543, ctc_loss=0.1729, cr_loss=0.4069, over 17292.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1372, cr_loss=0.354, over 3352606.58 frames. ], batch size: 49, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:48:49,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433948.6666666667, ans=0.1 2024-09-24 05:49:16,899 INFO [train.py:1198] (0/4) Epoch 24, batch 3400, loss[loss=0.2476, ctc_loss=0.1695, cr_loss=0.3902, over 11682.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.1381, cr_loss=0.3543, over 3335833.93 frames. ], batch size: 123, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:49:17,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=434042.0, ans=0.0 2024-09-24 05:49:26,488 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.267e+02 1.361e+02 1.503e+02 2.338e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-24 05:49:33,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=12.0 2024-09-24 05:49:34,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=434088.6666666667, ans=0.0 2024-09-24 05:49:58,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=434135.3333333333, ans=0.125 2024-09-24 05:50:03,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=434135.3333333333, ans=0.2 2024-09-24 05:50:09,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2024-09-24 05:50:36,792 INFO [train.py:1198] (0/4) Epoch 24, batch 3450, loss[loss=0.1838, ctc_loss=0.1236, cr_loss=0.3007, over 17057.00 frames. ], tot_loss[loss=0.2099, ctc_loss=0.1389, cr_loss=0.3551, over 3326680.00 frames. ], batch size: 46, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:50:37,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=434275.3333333333, ans=0.1 2024-09-24 05:50:48,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=434275.3333333333, ans=0.125 2024-09-24 05:51:15,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=434368.6666666667, ans=0.125 2024-09-24 05:51:25,853 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.60 vs. 
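
The tot_loss[... over N frames] fields are running, frames-weighted averages rather than single-batch values; the fractional frame counts (e.g. "over 3352606.58 frames" at batch 3350 above) are consistent with exponentially decayed numerator and denominator sums. A sketch with a hypothetical decay constant:

class FrameAveragedLoss:
    # Running, frames-weighted loss; the decay constant below is hypothetical,
    # chosen only to illustrate why the logged frame counts are fractional.
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frame_sum, 1.0)
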
limit=10.0 2024-09-24 05:51:47,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=434462.0, ans=0.125 2024-09-24 05:51:48,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-24 05:51:59,812 INFO [train.py:1198] (0/4) Epoch 24, batch 3500, loss[loss=0.1934, ctc_loss=0.1242, cr_loss=0.3459, over 17149.00 frames. ], tot_loss[loss=0.2096, ctc_loss=0.1385, cr_loss=0.3553, over 3333380.88 frames. ], batch size: 48, lr: 4.95e-03, grad_scale: 32.0 2024-09-24 05:52:04,672 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:52:10,825 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.042e+02 1.254e+02 1.358e+02 1.511e+02 3.142e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-24 05:52:11,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.61 vs. limit=15.0 2024-09-24 05:52:12,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=434508.6666666667, ans=0.125 2024-09-24 05:53:18,585 INFO [train.py:1198] (0/4) Epoch 24, batch 3550, loss[loss=0.2217, ctc_loss=0.1457, cr_loss=0.3799, over 17297.00 frames. ], tot_loss[loss=0.2094, ctc_loss=0.1384, cr_loss=0.3553, over 3339620.95 frames. ], batch size: 51, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 05:53:36,498 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-09-24 05:53:38,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.13 vs. limit=15.0 2024-09-24 05:53:48,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=434835.3333333333, ans=0.2 2024-09-24 05:54:01,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-09-24 05:54:05,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=434882.0, ans=0.025 2024-09-24 05:54:18,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=434882.0, ans=0.05 2024-09-24 05:54:36,759 INFO [train.py:1198] (0/4) Epoch 24, batch 3600, loss[loss=0.1802, ctc_loss=0.1149, cr_loss=0.3267, over 16968.00 frames. ], tot_loss[loss=0.2106, ctc_loss=0.1393, cr_loss=0.3563, over 3332943.95 frames. ], batch size: 42, lr: 4.94e-03, grad_scale: 32.0 2024-09-24 05:54:47,631 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.266e+02 1.361e+02 1.484e+02 1.804e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-24 05:54:49,366 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 05:55:32,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.76 vs. 
limit=15.0 2024-09-24 05:55:38,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=435115.3333333333, ans=0.09899494936611666 2024-09-24 05:55:57,299 INFO [train.py:1198] (0/4) Epoch 24, batch 3650, loss[loss=0.2125, ctc_loss=0.1394, cr_loss=0.3657, over 17308.00 frames. ], tot_loss[loss=0.2092, ctc_loss=0.1382, cr_loss=0.355, over 3342410.94 frames. ], batch size: 46, lr: 4.94e-03, grad_scale: 32.0 2024-09-24 05:55:58,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=22.5 2024-09-24 05:56:07,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=435208.6666666667, ans=0.125 2024-09-24 05:56:32,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=435302.0, ans=0.95 2024-09-24 05:56:52,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2024-09-24 05:57:12,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=435395.3333333333, ans=0.0 2024-09-24 05:57:16,736 INFO [train.py:1198] (0/4) Epoch 24, batch 3700, loss[loss=0.179, ctc_loss=0.1147, cr_loss=0.3215, over 16818.00 frames. ], tot_loss[loss=0.208, ctc_loss=0.1374, cr_loss=0.353, over 3335031.46 frames. ], batch size: 37, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 05:57:29,297 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.270e+02 1.350e+02 1.462e+02 1.892e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-24 05:57:31,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=435488.6666666667, ans=0.125 2024-09-24 05:57:41,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=435488.6666666667, ans=0.125 2024-09-24 05:57:57,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=435535.3333333333, ans=10.0 2024-09-24 05:58:14,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=435582.0, ans=0.125 2024-09-24 05:58:32,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435628.6666666667, ans=0.1 2024-09-24 05:58:35,098 INFO [train.py:1198] (0/4) Epoch 24, batch 3750, loss[loss=0.2265, ctc_loss=0.1513, cr_loss=0.3756, over 15109.00 frames. ], tot_loss[loss=0.2082, ctc_loss=0.1375, cr_loss=0.3531, over 3325429.33 frames. 
], batch size: 89, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 05:58:58,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435722.0, ans=0.1 2024-09-24 05:59:27,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=435815.3333333333, ans=0.125 2024-09-24 05:59:48,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435862.0, ans=0.1 2024-09-24 05:59:54,385 INFO [train.py:1198] (0/4) Epoch 24, batch 3800, loss[loss=0.2202, ctc_loss=0.1476, cr_loss=0.3631, over 15091.00 frames. ], tot_loss[loss=0.2084, ctc_loss=0.1377, cr_loss=0.3531, over 3302201.20 frames. ], batch size: 89, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 06:00:07,110 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.258e+02 1.341e+02 1.479e+02 2.397e+02, threshold=2.682e+02, percent-clipped=0.0 2024-09-24 06:00:13,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=435955.3333333333, ans=0.0 2024-09-24 06:00:30,008 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:00:39,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=436002.0, ans=0.125 2024-09-24 06:00:49,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=436048.6666666667, ans=0.125 2024-09-24 06:00:52,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=436048.6666666667, ans=0.025 2024-09-24 06:01:08,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2024-09-24 06:01:14,194 INFO [train.py:1198] (0/4) Epoch 24, batch 3850, loss[loss=0.1956, ctc_loss=0.1312, cr_loss=0.3217, over 16962.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.1362, cr_loss=0.3504, over 3316257.52 frames. ], batch size: 42, lr: 4.94e-03, grad_scale: 16.0 2024-09-24 06:01:20,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=436142.0, ans=0.0 2024-09-24 06:01:42,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=436188.6666666667, ans=0.125 2024-09-24 06:02:00,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=436282.0, ans=0.125 2024-09-24 06:02:06,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=436282.0, ans=0.125 2024-09-24 06:02:24,014 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-24.pt 2024-09-24 06:03:16,545 INFO [train.py:1198] (0/4) Epoch 25, batch 0, loss[loss=0.2047, ctc_loss=0.1343, cr_loss=0.3522, over 17269.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1343, cr_loss=0.3522, over 17269.00 frames. 
], batch size: 44, lr: 4.83e-03, grad_scale: 32.0 2024-09-24 06:03:16,546 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 06:03:31,929 INFO [train.py:1230] (0/4) Epoch 25, validation: loss=0.03759, ctc_loss=0.03759, cr_loss=8.067e-15, over 944034.00 frames. 2024-09-24 06:03:31,930 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 06:03:51,061 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.314e+02 1.430e+02 1.672e+02 2.033e+02, threshold=2.861e+02, percent-clipped=0.0 2024-09-24 06:04:11,785 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:04:14,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=436450.0, ans=0.125 2024-09-24 06:04:23,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2024-09-24 06:04:24,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=436496.6666666667, ans=0.07 2024-09-24 06:04:29,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=436496.6666666667, ans=0.125 2024-09-24 06:04:39,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.76 vs. limit=15.0 2024-09-24 06:04:48,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.88 vs. limit=6.0 2024-09-24 06:04:54,469 INFO [train.py:1198] (0/4) Epoch 25, batch 50, loss[loss=0.1585, ctc_loss=0.1021, cr_loss=0.2816, over 16692.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1366, cr_loss=0.3531, over 756738.58 frames. ], batch size: 37, lr: 4.83e-03, grad_scale: 32.0 2024-09-24 06:05:11,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2024-09-24 06:05:12,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=436636.6666666667, ans=0.0 2024-09-24 06:05:54,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=436730.0, ans=0.125 2024-09-24 06:06:19,641 INFO [train.py:1198] (0/4) Epoch 25, batch 100, loss[loss=0.2036, ctc_loss=0.1312, cr_loss=0.3624, over 17370.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1369, cr_loss=0.3557, over 1339677.74 frames. 
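
Across these entries the learning rate decays slowly within epoch 24 (4.99e-03 down to 4.94e-03) and then steps to 4.83e-03 as epoch 25 begins. Both effects match an Eden-style schedule with base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 (values assumed from this run's configuration), evaluated at the current batch index and the number of completed epochs:

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden-style decay in both the batch index and the completed-epoch count;
    # hyperparameter values are assumed from this run's configuration.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# eden_lr(0.045, 92000, 23) ~ 4.98e-03, matching the entries around the
# 92000-batch checkpoint in epoch 24, and eden_lr(0.045, 93500, 24) ~ 4.83e-03,
# matching the first entries of epoch 25.
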
], batch size: 48, lr: 4.83e-03, grad_scale: 16.0 2024-09-24 06:06:21,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=436823.3333333333, ans=0.125 2024-09-24 06:06:27,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=436823.3333333333, ans=0.0 2024-09-24 06:06:40,252 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.292e+02 1.373e+02 1.497e+02 2.148e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-24 06:07:20,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=436963.3333333333, ans=0.1 2024-09-24 06:07:21,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=437010.0, ans=10.0 2024-09-24 06:07:42,085 INFO [train.py:1198] (0/4) Epoch 25, batch 150, loss[loss=0.2272, ctc_loss=0.1503, cr_loss=0.3844, over 17229.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1352, cr_loss=0.3529, over 1795721.90 frames. ], batch size: 55, lr: 4.83e-03, grad_scale: 16.0 2024-09-24 06:08:24,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-09-24 06:08:37,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=437196.6666666667, ans=0.125 2024-09-24 06:08:39,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=437196.6666666667, ans=0.125 2024-09-24 06:08:50,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437243.3333333333, ans=0.1 2024-09-24 06:08:50,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=437243.3333333333, ans=0.125 2024-09-24 06:09:04,126 INFO [train.py:1198] (0/4) Epoch 25, batch 200, loss[loss=0.2161, ctc_loss=0.1453, cr_loss=0.3541, over 17293.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1357, cr_loss=0.3536, over 2151921.49 frames. 
], batch size: 49, lr: 4.83e-03, grad_scale: 8.0 2024-09-24 06:09:10,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=437290.0, ans=0.125 2024-09-24 06:09:19,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=437336.6666666667, ans=0.125 2024-09-24 06:09:26,583 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.223e+02 1.322e+02 1.442e+02 1.903e+02, threshold=2.645e+02, percent-clipped=0.0 2024-09-24 06:09:30,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=437336.6666666667, ans=0.0 2024-09-24 06:09:36,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=437383.3333333333, ans=0.0 2024-09-24 06:09:39,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=437383.3333333333, ans=0.125 2024-09-24 06:09:46,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437383.3333333333, ans=0.1 2024-09-24 06:10:01,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=437430.0, ans=0.1 2024-09-24 06:10:06,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=437476.6666666667, ans=0.0 2024-09-24 06:10:24,169 INFO [train.py:1198] (0/4) Epoch 25, batch 250, loss[loss=0.1905, ctc_loss=0.1237, cr_loss=0.3339, over 17301.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.136, cr_loss=0.353, over 2414910.28 frames. ], batch size: 49, lr: 4.83e-03, grad_scale: 8.0 2024-09-24 06:10:33,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=437523.3333333333, ans=0.125 2024-09-24 06:11:05,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0 2024-09-24 06:11:06,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=437616.6666666667, ans=0.05 2024-09-24 06:11:24,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437663.3333333333, ans=0.1 2024-09-24 06:11:26,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2024-09-24 06:11:32,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=437710.0, ans=0.125 2024-09-24 06:11:49,772 INFO [train.py:1198] (0/4) Epoch 25, batch 300, loss[loss=0.2219, ctc_loss=0.1462, cr_loss=0.3786, over 16975.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.136, cr_loss=0.3525, over 2635792.35 frames. ], batch size: 53, lr: 4.83e-03, grad_scale: 8.0 2024-09-24 06:11:55,334 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.64 vs. 
limit=15.0 2024-09-24 06:12:12,131 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.257e+02 1.339e+02 1.438e+02 1.926e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 06:12:25,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=437850.0, ans=0.2 2024-09-24 06:13:08,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2024-09-24 06:13:12,246 INFO [train.py:1198] (0/4) Epoch 25, batch 350, loss[loss=0.1884, ctc_loss=0.1229, cr_loss=0.3274, over 17110.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1365, cr_loss=0.353, over 2785438.19 frames. ], batch size: 40, lr: 4.82e-03, grad_scale: 8.0 2024-09-24 06:13:28,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=438036.6666666667, ans=0.09899494936611666 2024-09-24 06:13:35,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=438036.6666666667, ans=0.0 2024-09-24 06:13:36,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=438036.6666666667, ans=0.125 2024-09-24 06:13:38,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=438036.6666666667, ans=0.125 2024-09-24 06:13:58,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=438083.3333333333, ans=0.025 2024-09-24 06:14:06,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=438130.0, ans=0.125 2024-09-24 06:14:20,792 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:14:35,221 INFO [train.py:1198] (0/4) Epoch 25, batch 400, loss[loss=0.1728, ctc_loss=0.1105, cr_loss=0.3117, over 16285.00 frames. ], tot_loss[loss=0.2077, ctc_loss=0.1369, cr_loss=0.3542, over 2905090.49 frames. ], batch size: 36, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:14:42,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=438223.3333333333, ans=0.125 2024-09-24 06:14:46,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=438223.3333333333, ans=0.09899494936611666 2024-09-24 06:14:57,569 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.241e+02 1.339e+02 1.521e+02 2.224e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 06:14:58,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=438270.0, ans=0.125 2024-09-24 06:15:02,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=438270.0, ans=0.0 2024-09-24 06:15:23,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.67 vs. 
limit=15.0 2024-09-24 06:15:26,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=438363.3333333333, ans=0.125 2024-09-24 06:15:57,605 INFO [train.py:1198] (0/4) Epoch 25, batch 450, loss[loss=0.1663, ctc_loss=0.1056, cr_loss=0.3032, over 17021.00 frames. ], tot_loss[loss=0.2078, ctc_loss=0.1369, cr_loss=0.3545, over 3007244.61 frames. ], batch size: 39, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:16:16,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=438503.3333333333, ans=0.0 2024-09-24 06:16:53,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=438596.6666666667, ans=0.125 2024-09-24 06:16:58,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=438596.6666666667, ans=0.125 2024-09-24 06:17:11,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438643.3333333333, ans=0.1 2024-09-24 06:17:20,717 INFO [train.py:1198] (0/4) Epoch 25, batch 500, loss[loss=0.198, ctc_loss=0.1281, cr_loss=0.3491, over 17072.00 frames. ], tot_loss[loss=0.2079, ctc_loss=0.137, cr_loss=0.3544, over 3075884.82 frames. ], batch size: 43, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:17:27,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=438690.0, ans=0.0 2024-09-24 06:17:46,421 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.265e+02 1.364e+02 1.484e+02 2.816e+02, threshold=2.728e+02, percent-clipped=1.0 2024-09-24 06:18:04,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=438783.3333333333, ans=0.125 2024-09-24 06:18:17,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=438830.0, ans=0.125 2024-09-24 06:18:41,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=438876.6666666667, ans=0.125 2024-09-24 06:18:44,382 INFO [train.py:1198] (0/4) Epoch 25, batch 550, loss[loss=0.1787, ctc_loss=0.1151, cr_loss=0.3181, over 16702.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1362, cr_loss=0.3532, over 3146849.46 frames. ], batch size: 37, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:19:06,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=438970.0, ans=0.125 2024-09-24 06:19:11,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=438970.0, ans=0.125 2024-09-24 06:19:14,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438970.0, ans=0.1 2024-09-24 06:19:14,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=438970.0, ans=0.125 2024-09-24 06:19:25,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.50 vs. 
limit=22.5 2024-09-24 06:19:44,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=439063.3333333333, ans=0.125 2024-09-24 06:19:48,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=439063.3333333333, ans=0.2 2024-09-24 06:19:52,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=439110.0, ans=0.0 2024-09-24 06:20:06,959 INFO [train.py:1198] (0/4) Epoch 25, batch 600, loss[loss=0.1867, ctc_loss=0.1214, cr_loss=0.3267, over 17021.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1361, cr_loss=0.3533, over 3199120.28 frames. ], batch size: 44, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:20:13,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.26 vs. limit=22.5 2024-09-24 06:20:29,582 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.242e+02 1.338e+02 1.453e+02 1.774e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 06:20:33,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=439203.3333333333, ans=0.125 2024-09-24 06:20:42,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=439250.0, ans=0.07 2024-09-24 06:21:32,785 INFO [train.py:1198] (0/4) Epoch 25, batch 650, loss[loss=0.181, ctc_loss=0.1181, cr_loss=0.3149, over 17182.00 frames. ], tot_loss[loss=0.2063, ctc_loss=0.1358, cr_loss=0.3525, over 3229164.97 frames. ], batch size: 41, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:21:34,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=439390.0, ans=0.125 2024-09-24 06:21:59,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=439436.6666666667, ans=0.0 2024-09-24 06:22:26,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=439530.0, ans=0.025 2024-09-24 06:22:52,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=439576.6666666667, ans=0.125 2024-09-24 06:22:52,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=439576.6666666667, ans=0.2 2024-09-24 06:22:54,951 INFO [train.py:1198] (0/4) Epoch 25, batch 700, loss[loss=0.2265, ctc_loss=0.1515, cr_loss=0.3752, over 17157.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1352, cr_loss=0.3514, over 3260218.02 frames. 
], batch size: 45, lr: 4.82e-03, grad_scale: 16.0 2024-09-24 06:23:03,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=439623.3333333333, ans=0.125 2024-09-24 06:23:17,412 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.239e+02 1.345e+02 1.493e+02 2.005e+02, threshold=2.689e+02, percent-clipped=0.0 2024-09-24 06:23:50,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=439763.3333333333, ans=0.035 2024-09-24 06:23:50,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439763.3333333333, ans=0.1 2024-09-24 06:23:57,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=439763.3333333333, ans=0.1 2024-09-24 06:24:01,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=439810.0, ans=0.0 2024-09-24 06:24:17,400 INFO [train.py:1198] (0/4) Epoch 25, batch 750, loss[loss=0.2268, ctc_loss=0.1483, cr_loss=0.3927, over 16054.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1354, cr_loss=0.3522, over 3285627.98 frames. ], batch size: 74, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:24:17,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=439856.6666666667, ans=0.2 2024-09-24 06:24:19,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2024-09-24 06:24:27,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=439856.6666666667, ans=0.125 2024-09-24 06:24:42,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-09-24 06:24:59,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=439950.0, ans=0.125 2024-09-24 06:25:21,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=440043.3333333333, ans=0.125 2024-09-24 06:25:28,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440043.3333333333, ans=0.1 2024-09-24 06:25:37,729 INFO [train.py:1198] (0/4) Epoch 25, batch 800, loss[loss=0.2125, ctc_loss=0.141, cr_loss=0.3574, over 16754.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.135, cr_loss=0.3509, over 3306220.03 frames. 
], batch size: 61, lr: 4.81e-03, grad_scale: 32.0 2024-09-24 06:25:49,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=440090.0, ans=0.125 2024-09-24 06:25:50,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=440090.0, ans=0.0 2024-09-24 06:26:02,618 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.255e+02 1.320e+02 1.414e+02 2.395e+02, threshold=2.641e+02, percent-clipped=0.0 2024-09-24 06:26:18,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440183.3333333333, ans=0.1 2024-09-24 06:26:36,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=440230.0, ans=0.125 2024-09-24 06:26:53,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=440276.6666666667, ans=0.2 2024-09-24 06:27:03,009 INFO [train.py:1198] (0/4) Epoch 25, batch 850, loss[loss=0.2107, ctc_loss=0.1374, cr_loss=0.3668, over 17300.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.135, cr_loss=0.3508, over 3314200.57 frames. ], batch size: 49, lr: 4.81e-03, grad_scale: 32.0 2024-09-24 06:27:09,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=440323.3333333333, ans=0.025 2024-09-24 06:27:13,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.96 vs. limit=10.0 2024-09-24 06:27:17,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=440370.0, ans=0.125 2024-09-24 06:27:49,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=440416.6666666667, ans=0.125 2024-09-24 06:28:08,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=440510.0, ans=0.125 2024-09-24 06:28:16,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=440510.0, ans=0.2 2024-09-24 06:28:26,005 INFO [train.py:1198] (0/4) Epoch 25, batch 900, loss[loss=0.2141, ctc_loss=0.1419, cr_loss=0.3608, over 17232.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1347, cr_loss=0.3501, over 3318726.69 frames. ], batch size: 50, lr: 4.81e-03, grad_scale: 32.0 2024-09-24 06:28:36,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.05 vs. 
limit=22.5 2024-09-24 06:28:50,995 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.308e+02 1.404e+02 1.529e+02 2.023e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-24 06:28:54,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=440603.3333333333, ans=0.0 2024-09-24 06:28:57,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=440603.3333333333, ans=0.1 2024-09-24 06:29:19,831 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=440696.6666666667, ans=0.125 2024-09-24 06:29:47,842 INFO [train.py:1198] (0/4) Epoch 25, batch 950, loss[loss=0.2352, ctc_loss=0.1558, cr_loss=0.3971, over 17219.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1347, cr_loss=0.3504, over 3327275.35 frames. ], batch size: 55, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:30:31,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440883.3333333333, ans=0.1 2024-09-24 06:30:34,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=440930.0, ans=0.0 2024-09-24 06:30:45,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=440930.0, ans=0.025 2024-09-24 06:31:04,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-09-24 06:31:12,908 INFO [train.py:1198] (0/4) Epoch 25, batch 1000, loss[loss=0.2097, ctc_loss=0.1386, cr_loss=0.3557, over 17072.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1359, cr_loss=0.3533, over 3343397.48 frames. ], batch size: 56, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:31:27,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=441070.0, ans=0.025 2024-09-24 06:31:29,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=441070.0, ans=0.125 2024-09-24 06:31:32,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=441070.0, ans=0.0 2024-09-24 06:31:32,724 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2024-09-24 06:31:36,534 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.287e+02 1.404e+02 1.496e+02 1.832e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-24 06:31:51,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=441116.6666666667, ans=0.0 2024-09-24 06:32:05,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=441163.3333333333, ans=0.125 2024-09-24 06:32:06,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2024-09-24 06:32:07,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=441163.3333333333, ans=0.0 2024-09-24 06:32:20,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=441210.0, ans=0.0 2024-09-24 06:32:32,884 INFO [train.py:1198] (0/4) Epoch 25, batch 1050, loss[loss=0.2209, ctc_loss=0.1463, cr_loss=0.3729, over 16778.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1352, cr_loss=0.3528, over 3352479.47 frames. ], batch size: 61, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:32:36,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=22.5 2024-09-24 06:32:42,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=441256.6666666667, ans=0.125 2024-09-24 06:32:47,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=441256.6666666667, ans=0.125 2024-09-24 06:33:10,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.50 vs. limit=22.5 2024-09-24 06:33:27,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=441396.6666666667, ans=0.2 2024-09-24 06:33:34,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=12.0 2024-09-24 06:33:36,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=441396.6666666667, ans=0.125 2024-09-24 06:33:53,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=441443.3333333333, ans=12.0 2024-09-24 06:33:58,446 INFO [train.py:1198] (0/4) Epoch 25, batch 1100, loss[loss=0.2107, ctc_loss=0.1399, cr_loss=0.3541, over 17003.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1346, cr_loss=0.3515, over 3356259.65 frames. ], batch size: 44, lr: 4.81e-03, grad_scale: 16.0 2024-09-24 06:34:22,528 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.251e+02 1.334e+02 1.435e+02 1.725e+02, threshold=2.668e+02, percent-clipped=0.0 2024-09-24 06:34:38,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=441583.3333333333, ans=0.125 2024-09-24 06:34:50,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. limit=6.0 2024-09-24 06:34:58,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=12.0 2024-09-24 06:35:18,624 INFO [train.py:1198] (0/4) Epoch 25, batch 1150, loss[loss=0.2009, ctc_loss=0.1297, cr_loss=0.3557, over 17088.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.135, cr_loss=0.3518, over 3355108.01 frames. 
], batch size: 40, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:35:23,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=441723.3333333333, ans=0.125 2024-09-24 06:35:39,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=441770.0, ans=0.0 2024-09-24 06:36:02,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=441816.6666666667, ans=0.125 2024-09-24 06:36:11,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=441863.3333333333, ans=0.125 2024-09-24 06:36:29,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2024-09-24 06:36:32,055 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:36:32,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441910.0, ans=0.1 2024-09-24 06:36:38,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.67 vs. limit=6.0 2024-09-24 06:36:40,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-24 06:36:42,700 INFO [train.py:1198] (0/4) Epoch 25, batch 1200, loss[loss=0.2331, ctc_loss=0.1595, cr_loss=0.3677, over 16416.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1362, cr_loss=0.3536, over 3350948.25 frames. ], batch size: 66, lr: 4.80e-03, grad_scale: 32.0 2024-09-24 06:37:06,723 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.253e+02 1.340e+02 1.429e+02 1.909e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 06:37:16,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=442050.0, ans=0.125 2024-09-24 06:37:38,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=442096.6666666667, ans=0.07 2024-09-24 06:38:05,402 INFO [train.py:1198] (0/4) Epoch 25, batch 1250, loss[loss=0.1795, ctc_loss=0.1159, cr_loss=0.3178, over 16701.00 frames. ], tot_loss[loss=0.2072, ctc_loss=0.1364, cr_loss=0.3539, over 3353173.39 frames. ], batch size: 37, lr: 4.80e-03, grad_scale: 32.0 2024-09-24 06:38:44,748 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. 
limit=15.0 2024-09-24 06:38:57,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=442330.0, ans=0.0 2024-09-24 06:39:06,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=442330.0, ans=0.1 2024-09-24 06:39:21,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=442376.6666666667, ans=0.5 2024-09-24 06:39:27,925 INFO [train.py:1198] (0/4) Epoch 25, batch 1300, loss[loss=0.2389, ctc_loss=0.1576, cr_loss=0.4065, over 16978.00 frames. ], tot_loss[loss=0.2076, ctc_loss=0.1366, cr_loss=0.3548, over 3354946.34 frames. ], batch size: 56, lr: 4.80e-03, grad_scale: 32.0 2024-09-24 06:39:31,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=442423.3333333333, ans=0.0 2024-09-24 06:39:34,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=442423.3333333333, ans=0.025 2024-09-24 06:39:52,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=442470.0, ans=0.125 2024-09-24 06:39:53,308 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.277e+02 1.396e+02 1.533e+02 2.196e+02, threshold=2.791e+02, percent-clipped=0.0 2024-09-24 06:39:53,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=442470.0, ans=0.125 2024-09-24 06:40:30,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=442610.0, ans=0.1 2024-09-24 06:40:47,419 INFO [train.py:1198] (0/4) Epoch 25, batch 1350, loss[loss=0.2359, ctc_loss=0.1557, cr_loss=0.401, over 17035.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1362, cr_loss=0.3537, over 3351360.95 frames. ], batch size: 52, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:41:15,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=12.0 2024-09-24 06:41:18,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=442703.3333333333, ans=0.125 2024-09-24 06:41:51,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=442796.6666666667, ans=0.1 2024-09-24 06:41:52,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=442796.6666666667, ans=0.125 2024-09-24 06:41:52,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=442796.6666666667, ans=0.125 2024-09-24 06:41:53,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=442796.6666666667, ans=0.125 2024-09-24 06:42:12,269 INFO [train.py:1198] (0/4) Epoch 25, batch 1400, loss[loss=0.223, ctc_loss=0.1508, cr_loss=0.361, over 16736.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1352, cr_loss=0.3515, over 3354614.15 frames. 
], batch size: 61, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:42:30,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-09-24 06:42:38,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=442936.6666666667, ans=10.0 2024-09-24 06:42:38,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=442936.6666666667, ans=0.125 2024-09-24 06:42:38,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=442936.6666666667, ans=0.0 2024-09-24 06:42:40,062 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.265e+02 1.378e+02 1.497e+02 2.360e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 06:42:40,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442936.6666666667, ans=0.1 2024-09-24 06:42:43,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=442936.6666666667, ans=0.125 2024-09-24 06:42:43,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=442936.6666666667, ans=0.125 2024-09-24 06:42:46,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=442983.3333333333, ans=0.125 2024-09-24 06:42:59,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=442983.3333333333, ans=0.125 2024-09-24 06:43:12,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=12.0 2024-09-24 06:43:20,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=443076.6666666667, ans=0.125 2024-09-24 06:43:22,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-09-24 06:43:36,566 INFO [train.py:1198] (0/4) Epoch 25, batch 1450, loss[loss=0.223, ctc_loss=0.146, cr_loss=0.3846, over 17306.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1353, cr_loss=0.3512, over 3345928.97 frames. ], batch size: 51, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:43:43,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=443123.3333333333, ans=0.0 2024-09-24 06:43:54,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=443170.0, ans=0.125 2024-09-24 06:44:32,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=443263.3333333333, ans=0.2 2024-09-24 06:44:37,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.74 vs. 
limit=15.0 2024-09-24 06:44:43,356 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:44:55,971 INFO [train.py:1198] (0/4) Epoch 25, batch 1500, loss[loss=0.2214, ctc_loss=0.1474, cr_loss=0.3702, over 15921.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1354, cr_loss=0.3515, over 3338934.94 frames. ], batch size: 74, lr: 4.80e-03, grad_scale: 16.0 2024-09-24 06:45:12,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.32 vs. limit=6.0 2024-09-24 06:45:17,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=443403.3333333333, ans=0.0 2024-09-24 06:45:21,559 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.238e+02 1.333e+02 1.437e+02 1.693e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-24 06:45:47,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=443496.6666666667, ans=0.0 2024-09-24 06:46:02,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2024-09-24 06:46:08,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443543.3333333333, ans=0.1 2024-09-24 06:46:21,156 INFO [train.py:1198] (0/4) Epoch 25, batch 1550, loss[loss=0.2, ctc_loss=0.1297, cr_loss=0.3512, over 17031.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1365, cr_loss=0.3532, over 3333207.66 frames. ], batch size: 51, lr: 4.79e-03, grad_scale: 16.0 2024-09-24 06:46:34,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443590.0, ans=0.1 2024-09-24 06:46:44,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=443636.6666666667, ans=0.025 2024-09-24 06:46:55,775 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2024-09-24 06:47:03,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=443683.3333333333, ans=0.125 2024-09-24 06:47:10,161 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2024-09-24 06:47:19,305 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:47:37,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443776.6666666667, ans=0.125 2024-09-24 06:47:43,800 INFO [train.py:1198] (0/4) Epoch 25, batch 1600, loss[loss=0.2343, ctc_loss=0.1557, cr_loss=0.393, over 17233.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1362, cr_loss=0.353, over 3335608.96 frames. 
], batch size: 55, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:47:56,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=443823.3333333333, ans=0.125 2024-09-24 06:48:00,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=443870.0, ans=0.125 2024-09-24 06:48:08,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=443870.0, ans=0.125 2024-09-24 06:48:09,684 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.061e+02 1.242e+02 1.313e+02 1.472e+02 2.185e+02, threshold=2.626e+02, percent-clipped=0.0 2024-09-24 06:48:21,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-24 06:49:03,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=444010.0, ans=0.2 2024-09-24 06:49:06,446 INFO [train.py:1198] (0/4) Epoch 25, batch 1650, loss[loss=0.2212, ctc_loss=0.1449, cr_loss=0.3812, over 17290.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1362, cr_loss=0.3524, over 3336158.95 frames. ], batch size: 49, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:49:24,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=444103.3333333333, ans=0.0 2024-09-24 06:49:57,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=444196.6666666667, ans=0.09899494936611666 2024-09-24 06:50:18,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=444243.3333333333, ans=0.025 2024-09-24 06:50:26,226 INFO [train.py:1198] (0/4) Epoch 25, batch 1700, loss[loss=0.2154, ctc_loss=0.1425, cr_loss=0.3648, over 17023.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1359, cr_loss=0.3519, over 3345451.43 frames. ], batch size: 44, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:50:31,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=444290.0, ans=0.035 2024-09-24 06:50:36,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=12.0 2024-09-24 06:50:48,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=444336.6666666667, ans=0.2 2024-09-24 06:50:54,229 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.301e+02 1.388e+02 1.513e+02 1.890e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-24 06:51:20,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=444430.0, ans=0.125 2024-09-24 06:51:24,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=444430.0, ans=0.125 2024-09-24 06:51:44,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=444476.6666666667, ans=0.07 2024-09-24 06:51:51,034 INFO [train.py:1198] (0/4) Epoch 25, batch 1750, loss[loss=0.168, ctc_loss=0.1075, cr_loss=0.3022, over 17085.00 frames. 
], tot_loss[loss=0.2054, ctc_loss=0.1352, cr_loss=0.3509, over 3350527.02 frames. ], batch size: 40, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:52:10,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=444570.0, ans=0.0 2024-09-24 06:52:19,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2024-09-24 06:52:19,850 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 06:52:39,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=444663.3333333333, ans=0.125 2024-09-24 06:52:41,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=444663.3333333333, ans=0.025 2024-09-24 06:52:46,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=444663.3333333333, ans=0.125 2024-09-24 06:52:49,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2024-09-24 06:52:55,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=444710.0, ans=0.125 2024-09-24 06:53:12,786 INFO [train.py:1198] (0/4) Epoch 25, batch 1800, loss[loss=0.2043, ctc_loss=0.1293, cr_loss=0.375, over 17017.00 frames. ], tot_loss[loss=0.2056, ctc_loss=0.1354, cr_loss=0.3511, over 3347393.90 frames. ], batch size: 39, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:53:16,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=22.5 2024-09-24 06:53:40,996 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.233e+02 1.324e+02 1.428e+02 1.805e+02, threshold=2.648e+02, percent-clipped=0.0 2024-09-24 06:53:46,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=444850.0, ans=0.125 2024-09-24 06:53:47,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=444850.0, ans=0.125 2024-09-24 06:54:05,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=444896.6666666667, ans=0.5 2024-09-24 06:54:16,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=444896.6666666667, ans=0.125 2024-09-24 06:54:27,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=444943.3333333333, ans=0.0 2024-09-24 06:54:32,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=444943.3333333333, ans=0.125 2024-09-24 06:54:35,386 INFO [train.py:1198] (0/4) Epoch 25, batch 1850, loss[loss=0.1685, ctc_loss=0.1079, cr_loss=0.3029, over 16268.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1353, cr_loss=0.3507, over 3343108.36 frames. 
], batch size: 36, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:54:39,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=444990.0, ans=0.0 2024-09-24 06:54:50,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2024-09-24 06:55:55,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=445176.6666666667, ans=0.125 2024-09-24 06:56:01,116 INFO [train.py:1198] (0/4) Epoch 25, batch 1900, loss[loss=0.1864, ctc_loss=0.1187, cr_loss=0.3386, over 17095.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1354, cr_loss=0.3507, over 3350982.89 frames. ], batch size: 43, lr: 4.79e-03, grad_scale: 32.0 2024-09-24 06:56:26,823 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.232e+02 1.312e+02 1.425e+02 2.956e+02, threshold=2.624e+02, percent-clipped=1.0 2024-09-24 06:56:44,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=445316.6666666667, ans=0.125 2024-09-24 06:56:47,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=445363.3333333333, ans=0.2 2024-09-24 06:56:55,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=445363.3333333333, ans=0.125 2024-09-24 06:57:13,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=445410.0, ans=0.0 2024-09-24 06:57:20,855 INFO [train.py:1198] (0/4) Epoch 25, batch 1950, loss[loss=0.2421, ctc_loss=0.1611, cr_loss=0.4053, over 17233.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.136, cr_loss=0.352, over 3345639.02 frames. ], batch size: 50, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 06:57:22,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=445456.6666666667, ans=0.0 2024-09-24 06:57:54,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=445550.0, ans=0.09899494936611666 2024-09-24 06:57:54,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=445550.0, ans=0.2 2024-09-24 06:58:24,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=445596.6666666667, ans=0.125 2024-09-24 06:58:28,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=445643.3333333333, ans=0.125 2024-09-24 06:58:36,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=445643.3333333333, ans=0.125 2024-09-24 06:58:46,270 INFO [train.py:1198] (0/4) Epoch 25, batch 2000, loss[loss=0.2321, ctc_loss=0.155, cr_loss=0.3857, over 16894.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.1361, cr_loss=0.352, over 3339070.15 frames. ], batch size: 58, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 06:59:10,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.34 vs. 
limit=15.0 2024-09-24 06:59:11,910 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.272e+02 1.363e+02 1.482e+02 1.870e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-24 06:59:18,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2024-09-24 06:59:28,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=445783.3333333333, ans=0.0 2024-09-24 06:59:53,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=445876.6666666667, ans=0.125 2024-09-24 06:59:55,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=445876.6666666667, ans=0.2 2024-09-24 07:00:06,180 INFO [train.py:1198] (0/4) Epoch 25, batch 2050, loss[loss=0.2243, ctc_loss=0.1504, cr_loss=0.3695, over 17021.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1358, cr_loss=0.352, over 3343144.73 frames. ], batch size: 56, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 07:00:08,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445923.3333333333, ans=0.1 2024-09-24 07:00:25,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=445970.0, ans=0.125 2024-09-24 07:00:27,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=445970.0, ans=0.025 2024-09-24 07:00:33,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=445970.0, ans=0.0 2024-09-24 07:00:37,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=446016.6666666667, ans=0.0 2024-09-24 07:00:41,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=446016.6666666667, ans=0.0 2024-09-24 07:01:05,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=446063.3333333333, ans=0.125 2024-09-24 07:01:22,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=446110.0, ans=0.1 2024-09-24 07:01:31,630 INFO [train.py:1198] (0/4) Epoch 25, batch 2100, loss[loss=0.2264, ctc_loss=0.156, cr_loss=0.3522, over 11374.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1363, cr_loss=0.3525, over 3326112.10 frames. ], batch size: 123, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 07:01:31,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=446156.6666666667, ans=0.0 2024-09-24 07:01:48,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5 2024-09-24 07:01:56,976 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.289e+02 1.372e+02 1.523e+02 2.426e+02, threshold=2.744e+02, percent-clipped=0.0 2024-09-24 07:02:03,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. 
limit=6.0 2024-09-24 07:02:08,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=446250.0, ans=0.0 2024-09-24 07:02:22,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=446296.6666666667, ans=0.0 2024-09-24 07:02:32,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=446296.6666666667, ans=0.125 2024-09-24 07:02:54,491 INFO [train.py:1198] (0/4) Epoch 25, batch 2150, loss[loss=0.2186, ctc_loss=0.1456, cr_loss=0.3648, over 17103.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1362, cr_loss=0.3525, over 3341909.49 frames. ], batch size: 49, lr: 4.78e-03, grad_scale: 32.0 2024-09-24 07:02:56,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=446390.0, ans=0.125 2024-09-24 07:03:05,903 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 07:03:52,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=446530.0, ans=0.0 2024-09-24 07:04:00,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=446576.6666666667, ans=0.025 2024-09-24 07:04:00,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=12.0 2024-09-24 07:04:08,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=446576.6666666667, ans=0.0 2024-09-24 07:04:14,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=446576.6666666667, ans=0.5 2024-09-24 07:04:17,543 INFO [train.py:1198] (0/4) Epoch 25, batch 2200, loss[loss=0.1857, ctc_loss=0.1207, cr_loss=0.3248, over 16964.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1366, cr_loss=0.3537, over 3353642.21 frames. ], batch size: 42, lr: 4.78e-03, grad_scale: 16.0 2024-09-24 07:04:27,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=446623.3333333333, ans=0.2 2024-09-24 07:04:30,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=446623.3333333333, ans=0.0 2024-09-24 07:04:44,642 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.273e+02 1.353e+02 1.559e+02 2.204e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 07:05:09,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=446763.3333333333, ans=0.0 2024-09-24 07:05:10,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=446763.3333333333, ans=0.125 2024-09-24 07:05:14,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=446763.3333333333, ans=0.2 2024-09-24 07:05:17,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.52 vs. 
limit=12.0 2024-09-24 07:05:35,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=446810.0, ans=0.125 2024-09-24 07:05:37,840 INFO [train.py:1198] (0/4) Epoch 25, batch 2250, loss[loss=0.1646, ctc_loss=0.1057, cr_loss=0.2944, over 16223.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1365, cr_loss=0.354, over 3357948.22 frames. ], batch size: 36, lr: 4.78e-03, grad_scale: 16.0 2024-09-24 07:05:53,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=446856.6666666667, ans=0.2 2024-09-24 07:06:50,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=447043.3333333333, ans=0.125 2024-09-24 07:06:57,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=447043.3333333333, ans=0.0 2024-09-24 07:07:03,316 INFO [train.py:1198] (0/4) Epoch 25, batch 2300, loss[loss=0.1999, ctc_loss=0.1324, cr_loss=0.3378, over 17220.00 frames. ], tot_loss[loss=0.2076, ctc_loss=0.1368, cr_loss=0.3542, over 3365445.17 frames. ], batch size: 50, lr: 4.78e-03, grad_scale: 16.0 2024-09-24 07:07:21,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2024-09-24 07:07:30,512 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.028e+02 1.259e+02 1.322e+02 1.446e+02 1.963e+02, threshold=2.643e+02, percent-clipped=0.0 2024-09-24 07:07:41,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=447183.3333333333, ans=0.0 2024-09-24 07:08:04,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-24 07:08:06,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447230.0, ans=0.1 2024-09-24 07:08:15,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=447276.6666666667, ans=0.0 2024-09-24 07:08:23,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447276.6666666667, ans=0.1 2024-09-24 07:08:28,420 INFO [train.py:1198] (0/4) Epoch 25, batch 2350, loss[loss=0.2391, ctc_loss=0.1632, cr_loss=0.3796, over 17022.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1363, cr_loss=0.3532, over 3375138.17 frames. ], batch size: 53, lr: 4.77e-03, grad_scale: 16.0 2024-09-24 07:08:41,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2024-09-24 07:08:44,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=447370.0, ans=0.125 2024-09-24 07:08:47,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=447370.0, ans=0.125 2024-09-24 07:09:19,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.34 vs. 
limit=12.0 2024-09-24 07:09:22,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=447463.3333333333, ans=0.125 2024-09-24 07:09:47,647 INFO [train.py:1198] (0/4) Epoch 25, batch 2400, loss[loss=0.1681, ctc_loss=0.1064, cr_loss=0.3086, over 17014.00 frames. ], tot_loss[loss=0.2075, ctc_loss=0.1366, cr_loss=0.3541, over 3364847.23 frames. ], batch size: 39, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:09:59,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=447556.6666666667, ans=0.125 2024-09-24 07:10:13,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=447603.3333333333, ans=0.0 2024-09-24 07:10:14,558 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.299e+02 1.395e+02 1.509e+02 2.801e+02, threshold=2.791e+02, percent-clipped=1.0 2024-09-24 07:10:21,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=447650.0, ans=0.0 2024-09-24 07:10:46,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=447696.6666666667, ans=0.125 2024-09-24 07:11:08,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=447743.3333333333, ans=0.09899494936611666 2024-09-24 07:11:11,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=447790.0, ans=0.0 2024-09-24 07:11:12,577 INFO [train.py:1198] (0/4) Epoch 25, batch 2450, loss[loss=0.2337, ctc_loss=0.1562, cr_loss=0.3878, over 17088.00 frames. ], tot_loss[loss=0.2085, ctc_loss=0.1376, cr_loss=0.3544, over 3343102.20 frames. ], batch size: 49, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:11:33,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=447836.6666666667, ans=0.0 2024-09-24 07:12:04,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=447930.0, ans=0.125 2024-09-24 07:12:09,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=447930.0, ans=0.125 2024-09-24 07:12:15,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=447976.6666666667, ans=0.1 2024-09-24 07:12:25,828 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-96000.pt 2024-09-24 07:12:31,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=447976.6666666667, ans=0.0 2024-09-24 07:12:37,502 INFO [train.py:1198] (0/4) Epoch 25, batch 2500, loss[loss=0.2062, ctc_loss=0.1351, cr_loss=0.3555, over 17304.00 frames. ], tot_loss[loss=0.2088, ctc_loss=0.1378, cr_loss=0.3551, over 3343216.70 frames. ], batch size: 46, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:12:46,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.00 vs. 
limit=22.5 2024-09-24 07:13:03,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=448070.0, ans=0.125 2024-09-24 07:13:04,743 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.243e+02 1.319e+02 1.410e+02 2.410e+02, threshold=2.638e+02, percent-clipped=0.0 2024-09-24 07:13:04,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=448070.0, ans=0.0 2024-09-24 07:13:29,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=448163.3333333333, ans=0.0 2024-09-24 07:13:51,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=448210.0, ans=12.0 2024-09-24 07:13:57,202 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=22.5 2024-09-24 07:13:58,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0 2024-09-24 07:13:59,333 INFO [train.py:1198] (0/4) Epoch 25, batch 2550, loss[loss=0.1925, ctc_loss=0.1243, cr_loss=0.3409, over 17095.00 frames. ], tot_loss[loss=0.2076, ctc_loss=0.1368, cr_loss=0.354, over 3349462.64 frames. ], batch size: 49, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:14:16,420 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.39 vs. limit=15.0 2024-09-24 07:14:18,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=448303.3333333333, ans=0.125 2024-09-24 07:14:39,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=448350.0, ans=0.125 2024-09-24 07:14:52,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-09-24 07:15:19,295 INFO [train.py:1198] (0/4) Epoch 25, batch 2600, loss[loss=0.223, ctc_loss=0.1468, cr_loss=0.3812, over 17001.00 frames. ], tot_loss[loss=0.2076, ctc_loss=0.1367, cr_loss=0.3545, over 3357429.97 frames. ], batch size: 51, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:15:43,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=448536.6666666667, ans=0.2 2024-09-24 07:15:48,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=448536.6666666667, ans=0.0 2024-09-24 07:15:51,575 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.258e+02 1.378e+02 1.492e+02 2.205e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-24 07:16:44,544 INFO [train.py:1198] (0/4) Epoch 25, batch 2650, loss[loss=0.2329, ctc_loss=0.1566, cr_loss=0.3816, over 16808.00 frames. ], tot_loss[loss=0.2081, ctc_loss=0.1372, cr_loss=0.3548, over 3352376.94 frames. 
], batch size: 61, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:17:30,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=448816.6666666667, ans=0.5 2024-09-24 07:17:46,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=12.0 2024-09-24 07:17:50,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-09-24 07:17:53,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=448910.0, ans=0.0 2024-09-24 07:18:01,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=448910.0, ans=0.125 2024-09-24 07:18:10,106 INFO [train.py:1198] (0/4) Epoch 25, batch 2700, loss[loss=0.2084, ctc_loss=0.1384, cr_loss=0.3499, over 16761.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.1377, cr_loss=0.3559, over 3351994.43 frames. ], batch size: 61, lr: 4.77e-03, grad_scale: 32.0 2024-09-24 07:18:23,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=448956.6666666667, ans=0.125 2024-09-24 07:18:37,168 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.035e+02 1.249e+02 1.338e+02 1.409e+02 1.725e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 07:18:40,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=449050.0, ans=0.1 2024-09-24 07:18:44,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=449050.0, ans=0.125 2024-09-24 07:18:49,500 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=22.5 2024-09-24 07:18:53,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=449050.0, ans=0.0 2024-09-24 07:19:14,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=449143.3333333333, ans=0.125 2024-09-24 07:19:29,778 INFO [train.py:1198] (0/4) Epoch 25, batch 2750, loss[loss=0.2415, ctc_loss=0.1689, cr_loss=0.363, over 11767.00 frames. ], tot_loss[loss=0.2088, ctc_loss=0.1376, cr_loss=0.3558, over 3349707.83 frames. ], batch size: 124, lr: 4.76e-03, grad_scale: 16.0 2024-09-24 07:19:34,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=449190.0, ans=0.125 2024-09-24 07:19:47,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=449236.6666666667, ans=0.2 2024-09-24 07:19:58,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=449236.6666666667, ans=0.025 2024-09-24 07:20:55,050 INFO [train.py:1198] (0/4) Epoch 25, batch 2800, loss[loss=0.2304, ctc_loss=0.1545, cr_loss=0.3796, over 15988.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.1377, cr_loss=0.3557, over 3340326.36 frames. 
], batch size: 74, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:21:03,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=449423.3333333333, ans=0.0 2024-09-24 07:21:06,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=449423.3333333333, ans=0.2 2024-09-24 07:21:23,921 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.252e+02 1.365e+02 1.477e+02 3.816e+02, threshold=2.730e+02, percent-clipped=1.0 2024-09-24 07:21:48,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=449563.3333333333, ans=0.125 2024-09-24 07:21:54,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=449563.3333333333, ans=0.125 2024-09-24 07:21:58,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=12.0 2024-09-24 07:22:07,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449610.0, ans=0.1 2024-09-24 07:22:17,828 INFO [train.py:1198] (0/4) Epoch 25, batch 2850, loss[loss=0.1943, ctc_loss=0.1292, cr_loss=0.3257, over 17244.00 frames. ], tot_loss[loss=0.2073, ctc_loss=0.1366, cr_loss=0.3536, over 3344127.82 frames. ], batch size: 44, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:22:18,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=449656.6666666667, ans=0.2 2024-09-24 07:22:18,752 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2024-09-24 07:23:23,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=449843.3333333333, ans=0.025 2024-09-24 07:23:28,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=449843.3333333333, ans=0.0 2024-09-24 07:23:37,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=449843.3333333333, ans=0.025 2024-09-24 07:23:37,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=449843.3333333333, ans=0.2 2024-09-24 07:23:40,781 INFO [train.py:1198] (0/4) Epoch 25, batch 2900, loss[loss=0.1906, ctc_loss=0.1235, cr_loss=0.3359, over 17156.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1355, cr_loss=0.3513, over 3344394.64 frames. 
], batch size: 45, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:24:00,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=449936.6666666667, ans=0.025 2024-09-24 07:24:09,619 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.270e+02 1.360e+02 1.492e+02 4.410e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-24 07:24:32,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=450030.0, ans=0.0 2024-09-24 07:24:37,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.87 vs. limit=12.0 2024-09-24 07:24:53,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=450076.6666666667, ans=0.125 2024-09-24 07:24:58,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=22.5 2024-09-24 07:24:58,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-09-24 07:25:00,615 INFO [train.py:1198] (0/4) Epoch 25, batch 2950, loss[loss=0.2563, ctc_loss=0.1711, cr_loss=0.426, over 17037.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1353, cr_loss=0.3511, over 3355914.92 frames. ], batch size: 52, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:25:02,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=450123.3333333333, ans=0.125 2024-09-24 07:25:21,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=450170.0, ans=0.09899494936611666 2024-09-24 07:25:45,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2024-09-24 07:25:52,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=450263.3333333333, ans=0.015 2024-09-24 07:25:55,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=450263.3333333333, ans=0.0 2024-09-24 07:26:01,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=450263.3333333333, ans=0.0 2024-09-24 07:26:03,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=450263.3333333333, ans=0.125 2024-09-24 07:26:16,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=450310.0, ans=0.0 2024-09-24 07:26:22,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=450310.0, ans=0.125 2024-09-24 07:26:25,453 INFO [train.py:1198] (0/4) Epoch 25, batch 3000, loss[loss=0.2094, ctc_loss=0.1366, cr_loss=0.3643, over 17293.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1361, cr_loss=0.3526, over 3351158.76 frames. 
], batch size: 49, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:26:25,454 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 07:26:41,272 INFO [train.py:1230] (0/4) Epoch 25, validation: loss=0.03749, ctc_loss=0.03749, cr_loss=8.201e-15, over 944034.00 frames. 2024-09-24 07:26:41,273 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 07:26:44,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=450356.6666666667, ans=0.0 2024-09-24 07:26:46,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=450356.6666666667, ans=0.0 2024-09-24 07:27:09,644 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.216e+02 1.332e+02 1.454e+02 2.331e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-24 07:27:39,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=450496.6666666667, ans=0.125 2024-09-24 07:28:01,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=450590.0, ans=0.125 2024-09-24 07:28:02,384 INFO [train.py:1198] (0/4) Epoch 25, batch 3050, loss[loss=0.2113, ctc_loss=0.1361, cr_loss=0.3762, over 17108.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1364, cr_loss=0.3536, over 3341370.61 frames. ], batch size: 49, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:28:08,936 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 07:28:36,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-24 07:28:48,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2024-09-24 07:29:02,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=450730.0, ans=0.2 2024-09-24 07:29:15,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=450776.6666666667, ans=0.125 2024-09-24 07:29:21,016 INFO [train.py:1198] (0/4) Epoch 25, batch 3100, loss[loss=0.2365, ctc_loss=0.1541, cr_loss=0.4119, over 17030.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1358, cr_loss=0.3522, over 3343838.78 frames. ], batch size: 53, lr: 4.76e-03, grad_scale: 32.0 2024-09-24 07:29:30,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=450823.3333333333, ans=0.2 2024-09-24 07:29:51,454 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.263e+02 1.349e+02 1.464e+02 5.566e+02, threshold=2.698e+02, percent-clipped=1.0 2024-09-24 07:30:12,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450963.3333333333, ans=0.1 2024-09-24 07:30:18,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=450963.3333333333, ans=0.0 2024-09-24 07:30:41,215 INFO [train.py:1198] (0/4) Epoch 25, batch 3150, loss[loss=0.1991, ctc_loss=0.128, cr_loss=0.3554, over 17183.00 frames. 
], tot_loss[loss=0.206, ctc_loss=0.1356, cr_loss=0.3522, over 3351513.94 frames. ], batch size: 41, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:31:11,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=451150.0, ans=0.2 2024-09-24 07:31:31,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=451196.6666666667, ans=0.125 2024-09-24 07:31:39,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=451196.6666666667, ans=0.125 2024-09-24 07:31:40,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=451196.6666666667, ans=0.125 2024-09-24 07:31:58,815 INFO [train.py:1198] (0/4) Epoch 25, batch 3200, loss[loss=0.1957, ctc_loss=0.1288, cr_loss=0.3349, over 17022.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1357, cr_loss=0.3517, over 3347447.77 frames. ], batch size: 51, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:32:09,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=15.0 2024-09-24 07:32:15,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=22.5 2024-09-24 07:32:26,943 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.256e+02 1.371e+02 1.503e+02 1.985e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-24 07:32:27,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=451336.6666666667, ans=0.025 2024-09-24 07:33:16,640 INFO [train.py:1198] (0/4) Epoch 25, batch 3250, loss[loss=0.2199, ctc_loss=0.1465, cr_loss=0.367, over 17060.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1361, cr_loss=0.3527, over 3344402.13 frames. ], batch size: 46, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:33:37,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=451570.0, ans=0.0 2024-09-24 07:33:55,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=451616.6666666667, ans=0.0 2024-09-24 07:34:08,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2024-09-24 07:34:10,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5 2024-09-24 07:34:31,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=451710.0, ans=0.025 2024-09-24 07:34:34,604 INFO [train.py:1198] (0/4) Epoch 25, batch 3300, loss[loss=0.2517, ctc_loss=0.1715, cr_loss=0.4009, over 11941.00 frames. ], tot_loss[loss=0.2071, ctc_loss=0.1365, cr_loss=0.3528, over 3337875.97 frames. 
], batch size: 123, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:34:47,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451756.6666666667, ans=0.1 2024-09-24 07:35:04,443 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.271e+02 1.364e+02 1.475e+02 2.205e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-24 07:35:33,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=451896.6666666667, ans=0.125 2024-09-24 07:35:42,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=451943.3333333333, ans=0.125 2024-09-24 07:35:47,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2024-09-24 07:35:56,275 INFO [train.py:1198] (0/4) Epoch 25, batch 3350, loss[loss=0.1831, ctc_loss=0.1174, cr_loss=0.3284, over 17260.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.136, cr_loss=0.3523, over 3350606.02 frames. ], batch size: 44, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:35:58,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2024-09-24 07:36:01,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=451990.0, ans=0.125 2024-09-24 07:36:50,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=452130.0, ans=0.125 2024-09-24 07:36:51,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=452130.0, ans=0.0 2024-09-24 07:36:56,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=452130.0, ans=0.125 2024-09-24 07:37:02,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=452176.6666666667, ans=0.0 2024-09-24 07:37:15,194 INFO [train.py:1198] (0/4) Epoch 25, batch 3400, loss[loss=0.2421, ctc_loss=0.1627, cr_loss=0.397, over 17222.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.136, cr_loss=0.3525, over 3353833.52 frames. ], batch size: 55, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:37:15,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=452223.3333333333, ans=0.125 2024-09-24 07:37:26,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=452223.3333333333, ans=0.2 2024-09-24 07:37:43,411 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.052e+02 1.278e+02 1.359e+02 1.525e+02 3.338e+02, threshold=2.719e+02, percent-clipped=1.0 2024-09-24 07:38:04,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.07 vs. 
limit=15.0 2024-09-24 07:38:07,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=452363.3333333333, ans=0.0 2024-09-24 07:38:30,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=452410.0, ans=0.0 2024-09-24 07:38:35,006 INFO [train.py:1198] (0/4) Epoch 25, batch 3450, loss[loss=0.1914, ctc_loss=0.1248, cr_loss=0.3329, over 16660.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1356, cr_loss=0.3522, over 3347719.18 frames. ], batch size: 37, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:38:41,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=452456.6666666667, ans=0.0 2024-09-24 07:38:44,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=452456.6666666667, ans=0.125 2024-09-24 07:39:09,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=452550.0, ans=0.125 2024-09-24 07:39:46,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=452643.3333333333, ans=0.0 2024-09-24 07:39:55,475 INFO [train.py:1198] (0/4) Epoch 25, batch 3500, loss[loss=0.1843, ctc_loss=0.1202, cr_loss=0.3206, over 17281.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1347, cr_loss=0.3504, over 3347202.45 frames. ], batch size: 42, lr: 4.75e-03, grad_scale: 32.0 2024-09-24 07:39:57,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2024-09-24 07:40:06,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=452690.0, ans=0.0 2024-09-24 07:40:17,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=452736.6666666667, ans=10.0 2024-09-24 07:40:23,618 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.253e+02 1.349e+02 1.455e+02 2.168e+02, threshold=2.697e+02, percent-clipped=1.0 2024-09-24 07:40:57,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2024-09-24 07:41:13,559 INFO [train.py:1198] (0/4) Epoch 25, batch 3550, loss[loss=0.1918, ctc_loss=0.1261, cr_loss=0.3285, over 17159.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1356, cr_loss=0.3516, over 3348503.98 frames. ], batch size: 48, lr: 4.74e-03, grad_scale: 16.0 2024-09-24 07:41:26,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=452923.3333333333, ans=0.125 2024-09-24 07:41:37,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=452970.0, ans=0.125 2024-09-24 07:41:58,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453063.3333333333, ans=0.1 2024-09-24 07:42:00,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. 
limit=12.0 2024-09-24 07:42:25,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=453110.0, ans=0.2 2024-09-24 07:42:31,018 INFO [train.py:1198] (0/4) Epoch 25, batch 3600, loss[loss=0.1705, ctc_loss=0.1081, cr_loss=0.3118, over 16921.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1346, cr_loss=0.3507, over 3351288.42 frames. ], batch size: 42, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:42:39,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=453156.6666666667, ans=0.125 2024-09-24 07:42:45,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=453203.3333333333, ans=0.125 2024-09-24 07:42:59,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453203.3333333333, ans=0.1 2024-09-24 07:43:00,636 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.246e+02 1.327e+02 1.467e+02 2.954e+02, threshold=2.655e+02, percent-clipped=1.0 2024-09-24 07:43:32,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-09-24 07:43:48,679 INFO [train.py:1198] (0/4) Epoch 25, batch 3650, loss[loss=0.2179, ctc_loss=0.143, cr_loss=0.3745, over 17222.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1354, cr_loss=0.3525, over 3358590.36 frames. ], batch size: 50, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:44:10,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=453436.6666666667, ans=0.0 2024-09-24 07:44:23,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=453483.3333333333, ans=0.125 2024-09-24 07:44:29,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453483.3333333333, ans=0.1 2024-09-24 07:44:30,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2024-09-24 07:44:44,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=453530.0, ans=0.2 2024-09-24 07:44:57,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=453576.6666666667, ans=0.0 2024-09-24 07:45:01,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=453576.6666666667, ans=0.5 2024-09-24 07:45:09,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=453576.6666666667, ans=0.05 2024-09-24 07:45:11,867 INFO [train.py:1198] (0/4) Epoch 25, batch 3700, loss[loss=0.2381, ctc_loss=0.1667, cr_loss=0.3574, over 11950.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1354, cr_loss=0.3525, over 3354421.45 frames. 
], batch size: 123, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:45:19,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=453623.3333333333, ans=0.1 2024-09-24 07:45:41,668 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.237e+02 1.315e+02 1.374e+02 1.764e+02, threshold=2.629e+02, percent-clipped=0.0 2024-09-24 07:45:59,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=453763.3333333333, ans=0.125 2024-09-24 07:46:08,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453763.3333333333, ans=0.1 2024-09-24 07:46:11,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=453763.3333333333, ans=0.125 2024-09-24 07:46:19,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=453810.0, ans=0.025 2024-09-24 07:46:30,216 INFO [train.py:1198] (0/4) Epoch 25, batch 3750, loss[loss=0.2501, ctc_loss=0.166, cr_loss=0.4206, over 15124.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1362, cr_loss=0.3533, over 3345070.58 frames. ], batch size: 89, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:46:30,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=453856.6666666667, ans=0.2 2024-09-24 07:46:38,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=453856.6666666667, ans=0.2 2024-09-24 07:46:44,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=453903.3333333333, ans=0.125 2024-09-24 07:46:51,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2024-09-24 07:46:54,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2024-09-24 07:46:58,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=453903.3333333333, ans=0.0 2024-09-24 07:46:59,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2024-09-24 07:47:49,006 INFO [train.py:1198] (0/4) Epoch 25, batch 3800, loss[loss=0.2135, ctc_loss=0.1402, cr_loss=0.3662, over 17003.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1362, cr_loss=0.352, over 3315265.82 frames. 
], batch size: 56, lr: 4.74e-03, grad_scale: 32.0 2024-09-24 07:48:14,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=454136.6666666667, ans=0.125 2024-09-24 07:48:18,770 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.251e+02 1.351e+02 1.487e+02 2.041e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-24 07:48:23,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=454183.3333333333, ans=0.125 2024-09-24 07:48:23,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=454183.3333333333, ans=0.125 2024-09-24 07:48:41,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=454230.0, ans=0.125 2024-09-24 07:48:44,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=454230.0, ans=0.1 2024-09-24 07:48:57,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=454276.6666666667, ans=0.035 2024-09-24 07:49:07,750 INFO [train.py:1198] (0/4) Epoch 25, batch 3850, loss[loss=0.2488, ctc_loss=0.1746, cr_loss=0.371, over 11993.00 frames. ], tot_loss[loss=0.209, ctc_loss=0.1381, cr_loss=0.3545, over 3264352.80 frames. ], batch size: 123, lr: 4.74e-03, grad_scale: 16.0 2024-09-24 07:49:15,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454323.3333333333, ans=0.1 2024-09-24 07:49:44,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=454416.6666666667, ans=0.125 2024-09-24 07:50:17,705 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-25.pt 2024-09-24 07:51:09,081 INFO [train.py:1198] (0/4) Epoch 26, batch 0, loss[loss=0.2047, ctc_loss=0.1364, cr_loss=0.3413, over 17212.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1364, cr_loss=0.3413, over 17212.00 frames. ], batch size: 50, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:51:09,082 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 07:51:25,123 INFO [train.py:1230] (0/4) Epoch 26, validation: loss=0.03743, ctc_loss=0.03743, cr_loss=8.662e-15, over 944034.00 frames. 
2024-09-24 07:51:25,123 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 07:51:29,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=454538.0, ans=15.0 2024-09-24 07:51:45,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=454584.6666666667, ans=0.125 2024-09-24 07:52:06,030 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.313e+02 1.484e+02 1.654e+02 2.315e+02, threshold=2.969e+02, percent-clipped=0.0 2024-09-24 07:52:37,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=454724.6666666667, ans=0.2 2024-09-24 07:52:47,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=454724.6666666667, ans=0.035 2024-09-24 07:52:50,422 INFO [train.py:1198] (0/4) Epoch 26, batch 50, loss[loss=0.2096, ctc_loss=0.1407, cr_loss=0.3444, over 16979.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1331, cr_loss=0.3468, over 750443.33 frames. ], batch size: 53, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:52:52,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=454771.3333333333, ans=0.0 2024-09-24 07:52:54,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2024-09-24 07:53:38,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=454911.3333333333, ans=0.0 2024-09-24 07:53:41,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=454911.3333333333, ans=0.0 2024-09-24 07:54:06,865 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 07:54:06,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=454958.0, ans=0.125 2024-09-24 07:54:11,278 INFO [train.py:1198] (0/4) Epoch 26, batch 100, loss[loss=0.1845, ctc_loss=0.1196, cr_loss=0.3247, over 17277.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1331, cr_loss=0.3484, over 1325212.69 frames. ], batch size: 42, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:54:14,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=455004.6666666667, ans=0.125 2024-09-24 07:54:29,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.37 vs. 
limit=15.0 2024-09-24 07:54:30,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=455051.3333333333, ans=0.125 2024-09-24 07:54:51,770 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.228e+02 1.285e+02 1.398e+02 1.660e+02, threshold=2.570e+02, percent-clipped=0.0 2024-09-24 07:54:59,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=455144.6666666667, ans=0.125 2024-09-24 07:55:27,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=455191.3333333333, ans=0.2 2024-09-24 07:55:31,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=455238.0, ans=0.125 2024-09-24 07:55:33,074 INFO [train.py:1198] (0/4) Epoch 26, batch 150, loss[loss=0.2316, ctc_loss=0.1546, cr_loss=0.3853, over 16552.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.1334, cr_loss=0.3488, over 1772056.31 frames. ], batch size: 66, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:55:36,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=455238.0, ans=0.125 2024-09-24 07:55:46,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=455238.0, ans=0.0 2024-09-24 07:55:49,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=455284.6666666667, ans=0.125 2024-09-24 07:56:44,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=455424.6666666667, ans=0.125 2024-09-24 07:56:55,946 INFO [train.py:1198] (0/4) Epoch 26, batch 200, loss[loss=0.1966, ctc_loss=0.1292, cr_loss=0.337, over 15983.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1337, cr_loss=0.3498, over 2115580.48 frames. ], batch size: 74, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:57:04,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=455471.3333333333, ans=0.0 2024-09-24 07:57:10,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=455518.0, ans=0.125 2024-09-24 07:57:37,052 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.254e+02 1.327e+02 1.395e+02 2.472e+02, threshold=2.655e+02, percent-clipped=0.0 2024-09-24 07:57:41,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.72 vs. 
limit=22.5 2024-09-24 07:57:48,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=455611.3333333333, ans=0.0 2024-09-24 07:57:50,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=455611.3333333333, ans=0.125 2024-09-24 07:58:15,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=455658.0, ans=0.125 2024-09-24 07:58:17,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=455658.0, ans=0.125 2024-09-24 07:58:21,704 INFO [train.py:1198] (0/4) Epoch 26, batch 250, loss[loss=0.2071, ctc_loss=0.1372, cr_loss=0.3496, over 17316.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1351, cr_loss=0.3532, over 2398133.77 frames. ], batch size: 51, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:59:11,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=455844.6666666667, ans=0.0 2024-09-24 07:59:14,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=455844.6666666667, ans=0.125 2024-09-24 07:59:25,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455844.6666666667, ans=0.125 2024-09-24 07:59:33,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=455891.3333333333, ans=0.125 2024-09-24 07:59:43,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455938.0, ans=0.1 2024-09-24 07:59:44,775 INFO [train.py:1198] (0/4) Epoch 26, batch 300, loss[loss=0.1938, ctc_loss=0.1232, cr_loss=0.3527, over 17193.00 frames. ], tot_loss[loss=0.2053, ctc_loss=0.1348, cr_loss=0.3527, over 2613267.75 frames. ], batch size: 41, lr: 4.64e-03, grad_scale: 32.0 2024-09-24 07:59:56,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2024-09-24 07:59:57,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=455938.0, ans=0.05 2024-09-24 08:00:01,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=455984.6666666667, ans=0.0 2024-09-24 08:00:22,960 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.258e+02 1.353e+02 1.431e+02 1.919e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 08:00:33,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456078.0, ans=0.1 2024-09-24 08:00:37,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=456078.0, ans=0.125 2024-09-24 08:01:04,577 INFO [train.py:1198] (0/4) Epoch 26, batch 350, loss[loss=0.2057, ctc_loss=0.1357, cr_loss=0.3502, over 17016.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.1357, cr_loss=0.3538, over 2779831.02 frames. 
], batch size: 51, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:02:30,050 INFO [train.py:1198] (0/4) Epoch 26, batch 400, loss[loss=0.1979, ctc_loss=0.1303, cr_loss=0.3379, over 17142.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1362, cr_loss=0.3539, over 2904221.71 frames. ], batch size: 48, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:02:58,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=456451.3333333333, ans=0.125 2024-09-24 08:03:09,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=456498.0, ans=0.125 2024-09-24 08:03:12,625 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.258e+02 1.321e+02 1.410e+02 2.001e+02, threshold=2.643e+02, percent-clipped=0.0 2024-09-24 08:03:24,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=15.0 2024-09-24 08:03:52,934 INFO [train.py:1198] (0/4) Epoch 26, batch 450, loss[loss=0.2135, ctc_loss=0.1449, cr_loss=0.3429, over 17128.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1352, cr_loss=0.3525, over 3005619.05 frames. ], batch size: 48, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:03:56,543 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2024-09-24 08:05:15,525 INFO [train.py:1198] (0/4) Epoch 26, batch 500, loss[loss=0.1947, ctc_loss=0.1272, cr_loss=0.3372, over 17310.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1351, cr_loss=0.3521, over 3070065.20 frames. ], batch size: 46, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:05:55,349 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.259e+02 1.357e+02 1.518e+02 2.705e+02, threshold=2.714e+02, percent-clipped=1.0 2024-09-24 08:06:20,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=457058.0, ans=0.125 2024-09-24 08:06:25,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=457058.0, ans=0.035 2024-09-24 08:06:26,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-09-24 08:06:37,535 INFO [train.py:1198] (0/4) Epoch 26, batch 550, loss[loss=0.2408, ctc_loss=0.1641, cr_loss=0.3838, over 16497.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1354, cr_loss=0.3524, over 3138937.63 frames. ], batch size: 66, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:06:37,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=457104.6666666667, ans=0.2 2024-09-24 08:06:51,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=22.5 2024-09-24 08:06:52,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=457151.3333333333, ans=0.0 2024-09-24 08:07:16,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.72 vs. 
limit=15.0 2024-09-24 08:07:21,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=457198.0, ans=0.025 2024-09-24 08:07:36,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=457244.6666666667, ans=0.2 2024-09-24 08:07:54,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=457291.3333333333, ans=0.0 2024-09-24 08:08:00,422 INFO [train.py:1198] (0/4) Epoch 26, batch 600, loss[loss=0.2134, ctc_loss=0.1401, cr_loss=0.3668, over 17004.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1357, cr_loss=0.3523, over 3173028.74 frames. ], batch size: 53, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:08:27,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=457384.6666666667, ans=0.2 2024-09-24 08:08:41,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=457431.3333333333, ans=0.125 2024-09-24 08:08:42,969 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.231e+02 1.314e+02 1.456e+02 1.940e+02, threshold=2.629e+02, percent-clipped=0.0 2024-09-24 08:08:54,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=457478.0, ans=0.125 2024-09-24 08:08:54,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=457478.0, ans=0.2 2024-09-24 08:09:14,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=457524.6666666667, ans=0.02 2024-09-24 08:09:26,036 INFO [train.py:1198] (0/4) Epoch 26, batch 650, loss[loss=0.1928, ctc_loss=0.1247, cr_loss=0.3406, over 17288.00 frames. ], tot_loss[loss=0.206, ctc_loss=0.1356, cr_loss=0.3521, over 3209652.74 frames. ], batch size: 46, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:09:26,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=457571.3333333333, ans=0.035 2024-09-24 08:09:29,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=457571.3333333333, ans=0.0 2024-09-24 08:09:45,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=457618.0, ans=0.0 2024-09-24 08:09:56,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.91 vs. limit=22.5 2024-09-24 08:10:06,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=457664.6666666667, ans=0.0 2024-09-24 08:10:09,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=457664.6666666667, ans=0.0 2024-09-24 08:10:13,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=457711.3333333333, ans=0.125 2024-09-24 08:10:20,176 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2024-09-24 08:10:46,656 INFO [train.py:1198] (0/4) Epoch 26, batch 700, loss[loss=0.1766, ctc_loss=0.1107, cr_loss=0.3293, over 16963.00 frames. 
], tot_loss[loss=0.2054, ctc_loss=0.135, cr_loss=0.3522, over 3246396.37 frames. ], batch size: 42, lr: 4.63e-03, grad_scale: 32.0 2024-09-24 08:11:12,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=457851.3333333333, ans=0.0 2024-09-24 08:11:16,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2024-09-24 08:11:28,400 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.269e+02 1.380e+02 1.511e+02 2.346e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-24 08:11:39,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=457944.6666666667, ans=0.0 2024-09-24 08:11:50,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2024-09-24 08:11:55,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=457991.3333333333, ans=0.125 2024-09-24 08:12:11,813 INFO [train.py:1198] (0/4) Epoch 26, batch 750, loss[loss=0.227, ctc_loss=0.1517, cr_loss=0.3764, over 14872.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1357, cr_loss=0.3539, over 3270722.12 frames. ], batch size: 89, lr: 4.63e-03, grad_scale: 16.0 2024-09-24 08:12:15,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5 2024-09-24 08:12:25,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=22.5 2024-09-24 08:12:31,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=458084.6666666667, ans=0.0 2024-09-24 08:12:49,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=458131.3333333333, ans=0.125 2024-09-24 08:13:06,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=458178.0, ans=0.04949747468305833 2024-09-24 08:13:34,684 INFO [train.py:1198] (0/4) Epoch 26, batch 800, loss[loss=0.1951, ctc_loss=0.1266, cr_loss=0.3428, over 17277.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.1358, cr_loss=0.3547, over 3299899.69 frames. ], batch size: 46, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:14:04,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.47 vs. 
limit=22.5 2024-09-24 08:14:18,901 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.021e+02 1.286e+02 1.386e+02 1.460e+02 2.197e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-24 08:14:19,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=458364.6666666667, ans=0.0 2024-09-24 08:14:27,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=458411.3333333333, ans=0.125 2024-09-24 08:14:32,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=458411.3333333333, ans=0.125 2024-09-24 08:14:33,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=458411.3333333333, ans=0.125 2024-09-24 08:14:55,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=458504.6666666667, ans=0.125 2024-09-24 08:14:57,080 INFO [train.py:1198] (0/4) Epoch 26, batch 850, loss[loss=0.2137, ctc_loss=0.1435, cr_loss=0.3512, over 14999.00 frames. ], tot_loss[loss=0.2056, ctc_loss=0.135, cr_loss=0.353, over 3303221.04 frames. ], batch size: 89, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:15:02,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=458504.6666666667, ans=0.125 2024-09-24 08:15:18,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=458551.3333333333, ans=0.0 2024-09-24 08:15:19,152 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2024-09-24 08:15:42,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=458598.0, ans=0.0 2024-09-24 08:15:44,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-09-24 08:15:47,403 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:16:17,229 INFO [train.py:1198] (0/4) Epoch 26, batch 900, loss[loss=0.2201, ctc_loss=0.1431, cr_loss=0.385, over 17195.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1355, cr_loss=0.3535, over 3310610.13 frames. 
], batch size: 55, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:16:43,976 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:16:45,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=458784.6666666667, ans=0.125 2024-09-24 08:17:01,225 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.265e+02 1.347e+02 1.465e+02 1.812e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-24 08:17:10,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=458878.0, ans=0.015 2024-09-24 08:17:19,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=458878.0, ans=0.125 2024-09-24 08:17:26,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=458924.6666666667, ans=0.0 2024-09-24 08:17:26,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=458924.6666666667, ans=0.125 2024-09-24 08:17:32,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=458924.6666666667, ans=0.125 2024-09-24 08:17:33,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=458924.6666666667, ans=0.125 2024-09-24 08:17:40,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=458971.3333333333, ans=0.2 2024-09-24 08:17:41,525 INFO [train.py:1198] (0/4) Epoch 26, batch 950, loss[loss=0.2279, ctc_loss=0.1509, cr_loss=0.385, over 15172.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.1358, cr_loss=0.3536, over 3314358.56 frames. ], batch size: 89, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:18:31,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459111.3333333333, ans=0.1 2024-09-24 08:18:37,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=459111.3333333333, ans=0.1 2024-09-24 08:18:47,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=459158.0, ans=0.0 2024-09-24 08:18:48,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=459158.0, ans=0.125 2024-09-24 08:19:01,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=459158.0, ans=10.0 2024-09-24 08:19:04,839 INFO [train.py:1198] (0/4) Epoch 26, batch 1000, loss[loss=0.22, ctc_loss=0.142, cr_loss=0.3895, over 17303.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1357, cr_loss=0.3526, over 3314966.13 frames. 
], batch size: 51, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:19:31,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459251.3333333333, ans=0.125 2024-09-24 08:19:38,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=459298.0, ans=0.0 2024-09-24 08:19:42,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=459298.0, ans=0.2 2024-09-24 08:19:48,776 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.281e+02 1.338e+02 1.458e+02 2.157e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 08:20:01,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=459344.6666666667, ans=0.125 2024-09-24 08:20:24,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=459391.3333333333, ans=0.04949747468305833 2024-09-24 08:20:27,261 INFO [train.py:1198] (0/4) Epoch 26, batch 1050, loss[loss=0.1749, ctc_loss=0.1117, cr_loss=0.3161, over 16957.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1346, cr_loss=0.3506, over 3329784.23 frames. ], batch size: 42, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:20:50,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=459484.6666666667, ans=0.125 2024-09-24 08:21:00,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=459531.3333333333, ans=0.0 2024-09-24 08:21:07,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=459531.3333333333, ans=0.2 2024-09-24 08:21:47,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=459624.6666666667, ans=0.025 2024-09-24 08:21:50,062 INFO [train.py:1198] (0/4) Epoch 26, batch 1100, loss[loss=0.2292, ctc_loss=0.1564, cr_loss=0.3639, over 17220.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1348, cr_loss=0.3515, over 3341658.04 frames. ], batch size: 50, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:22:07,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2024-09-24 08:22:34,488 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.255e+02 1.355e+02 1.464e+02 2.621e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-24 08:22:44,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=459811.3333333333, ans=0.125 2024-09-24 08:22:54,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=12.0 2024-09-24 08:23:15,200 INFO [train.py:1198] (0/4) Epoch 26, batch 1150, loss[loss=0.2061, ctc_loss=0.1365, cr_loss=0.3479, over 17259.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.135, cr_loss=0.3523, over 3334404.22 frames. ], batch size: 44, lr: 4.62e-03, grad_scale: 32.0 2024-09-24 08:23:29,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.32 vs. 
limit=22.5 2024-09-24 08:23:50,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=459998.0, ans=0.125 2024-09-24 08:24:24,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2024-09-24 08:24:29,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=460091.3333333333, ans=0.1 2024-09-24 08:24:37,841 INFO [train.py:1198] (0/4) Epoch 26, batch 1200, loss[loss=0.2091, ctc_loss=0.1368, cr_loss=0.3614, over 17091.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1351, cr_loss=0.3524, over 3338195.53 frames. ], batch size: 49, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:25:19,715 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.237e+02 1.345e+02 1.469e+02 1.862e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-24 08:25:58,190 INFO [train.py:1198] (0/4) Epoch 26, batch 1250, loss[loss=0.2126, ctc_loss=0.1423, cr_loss=0.3512, over 17313.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1346, cr_loss=0.3518, over 3346430.54 frames. ], batch size: 51, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:26:08,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=460371.3333333333, ans=0.125 2024-09-24 08:26:49,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=460511.3333333333, ans=0.0 2024-09-24 08:27:20,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=460558.0, ans=0.025 2024-09-24 08:27:23,278 INFO [train.py:1198] (0/4) Epoch 26, batch 1300, loss[loss=0.253, ctc_loss=0.1738, cr_loss=0.396, over 15128.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.135, cr_loss=0.3523, over 3339711.04 frames. ], batch size: 89, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:27:33,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=22.5 2024-09-24 08:27:46,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.14 vs. limit=10.0 2024-09-24 08:28:07,144 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.260e+02 1.346e+02 1.486e+02 2.192e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-24 08:28:36,362 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:28:41,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=460791.3333333333, ans=0.2 2024-09-24 08:28:45,546 INFO [train.py:1198] (0/4) Epoch 26, batch 1350, loss[loss=0.193, ctc_loss=0.1256, cr_loss=0.3371, over 17301.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1351, cr_loss=0.353, over 3355544.52 frames. 
], batch size: 46, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:29:10,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=460884.6666666667, ans=0.125 2024-09-24 08:29:28,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=460931.3333333333, ans=0.125 2024-09-24 08:29:34,154 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2024-09-24 08:30:08,671 INFO [train.py:1198] (0/4) Epoch 26, batch 1400, loss[loss=0.2001, ctc_loss=0.1322, cr_loss=0.3396, over 17052.00 frames. ], tot_loss[loss=0.2056, ctc_loss=0.1352, cr_loss=0.3521, over 3348039.59 frames. ], batch size: 46, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:30:17,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=461071.3333333333, ans=0.125 2024-09-24 08:30:18,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=461071.3333333333, ans=0.125 2024-09-24 08:30:50,707 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.251e+02 1.327e+02 1.445e+02 2.357e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-24 08:30:54,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=461164.6666666667, ans=0.025 2024-09-24 08:31:18,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=461258.0, ans=0.1 2024-09-24 08:31:31,447 INFO [train.py:1198] (0/4) Epoch 26, batch 1450, loss[loss=0.206, ctc_loss=0.1353, cr_loss=0.3536, over 16799.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1361, cr_loss=0.3542, over 3350465.28 frames. ], batch size: 61, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:31:47,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=461351.3333333333, ans=0.025 2024-09-24 08:32:05,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=461398.0, ans=0.025 2024-09-24 08:32:07,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=461398.0, ans=0.125 2024-09-24 08:32:25,472 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0 2024-09-24 08:32:33,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=461444.6666666667, ans=0.0 2024-09-24 08:32:33,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.52 vs. 
limit=15.0 2024-09-24 08:32:42,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=461491.3333333333, ans=0.125 2024-09-24 08:32:45,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=461491.3333333333, ans=0.125 2024-09-24 08:32:50,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=461491.3333333333, ans=0.125 2024-09-24 08:32:55,982 INFO [train.py:1198] (0/4) Epoch 26, batch 1500, loss[loss=0.2044, ctc_loss=0.1331, cr_loss=0.3564, over 17297.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1357, cr_loss=0.3534, over 3345298.76 frames. ], batch size: 51, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:33:07,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=461538.0, ans=0.125 2024-09-24 08:33:12,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=461584.6666666667, ans=0.125 2024-09-24 08:33:14,701 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-09-24 08:33:25,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=461584.6666666667, ans=0.0 2024-09-24 08:33:31,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=461631.3333333333, ans=0.125 2024-09-24 08:33:37,703 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.270e+02 1.352e+02 1.469e+02 1.916e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-24 08:33:41,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2024-09-24 08:34:03,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=461724.6666666667, ans=0.09899494936611666 2024-09-24 08:34:08,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=461724.6666666667, ans=0.125 2024-09-24 08:34:19,044 INFO [train.py:1198] (0/4) Epoch 26, batch 1550, loss[loss=0.2007, ctc_loss=0.13, cr_loss=0.3539, over 17237.00 frames. ], tot_loss[loss=0.2062, ctc_loss=0.1356, cr_loss=0.3527, over 3354361.53 frames. 
], batch size: 47, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:34:28,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=461771.3333333333, ans=0.125 2024-09-24 08:34:46,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=461818.0, ans=0.125 2024-09-24 08:35:21,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=461958.0, ans=0.125 2024-09-24 08:35:34,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=461958.0, ans=0.125 2024-09-24 08:35:36,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=461958.0, ans=0.125 2024-09-24 08:35:38,846 INFO [train.py:1198] (0/4) Epoch 26, batch 1600, loss[loss=0.2199, ctc_loss=0.1423, cr_loss=0.3879, over 17227.00 frames. ], tot_loss[loss=0.2067, ctc_loss=0.136, cr_loss=0.3536, over 3352067.63 frames. ], batch size: 50, lr: 4.61e-03, grad_scale: 32.0 2024-09-24 08:36:22,545 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.247e+02 1.306e+02 1.406e+02 2.052e+02, threshold=2.612e+02, percent-clipped=0.0 2024-09-24 08:36:46,985 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:36:54,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=462191.3333333333, ans=0.125 2024-09-24 08:36:54,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=22.5 2024-09-24 08:37:03,805 INFO [train.py:1198] (0/4) Epoch 26, batch 1650, loss[loss=0.1985, ctc_loss=0.129, cr_loss=0.3472, over 17291.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1361, cr_loss=0.3535, over 3349627.34 frames. ], batch size: 49, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:37:19,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462284.6666666667, ans=0.125 2024-09-24 08:37:24,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=462284.6666666667, ans=0.125 2024-09-24 08:37:37,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=462331.3333333333, ans=0.0 2024-09-24 08:38:10,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=462424.6666666667, ans=0.2 2024-09-24 08:38:26,355 INFO [train.py:1198] (0/4) Epoch 26, batch 1700, loss[loss=0.1673, ctc_loss=0.1092, cr_loss=0.2909, over 16954.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.1361, cr_loss=0.3541, over 3363919.10 frames. 
], batch size: 42, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:38:31,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=462471.3333333333, ans=0.1 2024-09-24 08:38:37,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=462471.3333333333, ans=0.2 2024-09-24 08:39:03,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462564.6666666667, ans=0.125 2024-09-24 08:39:10,604 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.222e+02 1.318e+02 1.444e+02 2.333e+02, threshold=2.637e+02, percent-clipped=0.0 2024-09-24 08:39:31,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=462658.0, ans=0.025 2024-09-24 08:39:41,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=462658.0, ans=0.0 2024-09-24 08:39:41,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=462658.0, ans=0.125 2024-09-24 08:39:48,989 INFO [train.py:1198] (0/4) Epoch 26, batch 1750, loss[loss=0.204, ctc_loss=0.1347, cr_loss=0.3462, over 17068.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.1352, cr_loss=0.3527, over 3372824.20 frames. ], batch size: 46, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:40:22,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=462798.0, ans=0.125 2024-09-24 08:41:07,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=462938.0, ans=0.04949747468305833 2024-09-24 08:41:08,744 INFO [train.py:1198] (0/4) Epoch 26, batch 1800, loss[loss=0.2198, ctc_loss=0.1444, cr_loss=0.377, over 16915.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1349, cr_loss=0.3517, over 3368218.12 frames. ], batch size: 58, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:41:34,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=22.5 2024-09-24 08:41:46,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=463031.3333333333, ans=0.125 2024-09-24 08:41:55,219 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.293e+02 1.406e+02 1.567e+02 1.918e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-24 08:41:57,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=463031.3333333333, ans=0.0 2024-09-24 08:41:58,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463031.3333333333, ans=0.1 2024-09-24 08:42:08,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=463078.0, ans=0.025 2024-09-24 08:42:22,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463124.6666666667, ans=0.1 2024-09-24 08:42:33,472 INFO [train.py:1198] (0/4) Epoch 26, batch 1850, loss[loss=0.1974, ctc_loss=0.1279, cr_loss=0.3474, over 17168.00 frames. 
], tot_loss[loss=0.2052, ctc_loss=0.1349, cr_loss=0.3518, over 3367619.40 frames. ], batch size: 45, lr: 4.60e-03, grad_scale: 16.0 2024-09-24 08:43:16,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=463264.6666666667, ans=0.0 2024-09-24 08:43:35,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463311.3333333333, ans=0.1 2024-09-24 08:43:35,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2024-09-24 08:43:41,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=463358.0, ans=0.125 2024-09-24 08:43:43,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463358.0, ans=0.1 2024-09-24 08:43:43,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=463358.0, ans=0.2 2024-09-24 08:43:49,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=463358.0, ans=0.2 2024-09-24 08:43:51,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=463358.0, ans=0.0 2024-09-24 08:43:52,690 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:43:55,586 INFO [train.py:1198] (0/4) Epoch 26, batch 1900, loss[loss=0.1754, ctc_loss=0.1132, cr_loss=0.3109, over 16958.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1344, cr_loss=0.3514, over 3367394.67 frames. ], batch size: 42, lr: 4.60e-03, grad_scale: 16.0 2024-09-24 08:44:04,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=463404.6666666667, ans=0.125 2024-09-24 08:44:38,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463498.0, ans=0.1 2024-09-24 08:44:41,027 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.057e+02 1.265e+02 1.341e+02 1.479e+02 3.030e+02, threshold=2.683e+02, percent-clipped=1.0 2024-09-24 08:44:41,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=463498.0, ans=0.0 2024-09-24 08:44:49,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=463544.6666666667, ans=0.025 2024-09-24 08:44:52,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=463544.6666666667, ans=0.2 2024-09-24 08:45:11,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=463591.3333333333, ans=0.125 2024-09-24 08:45:17,988 INFO [train.py:1198] (0/4) Epoch 26, batch 1950, loss[loss=0.2281, ctc_loss=0.1507, cr_loss=0.3867, over 17002.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.1342, cr_loss=0.3511, over 3369898.54 frames. 
], batch size: 53, lr: 4.60e-03, grad_scale: 16.0 2024-09-24 08:45:23,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=463638.0, ans=0.2 2024-09-24 08:46:32,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=463824.6666666667, ans=0.0 2024-09-24 08:46:32,949 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:46:40,693 INFO [train.py:1198] (0/4) Epoch 26, batch 2000, loss[loss=0.1942, ctc_loss=0.1259, cr_loss=0.3415, over 17035.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.1342, cr_loss=0.3509, over 3364566.19 frames. ], batch size: 44, lr: 4.60e-03, grad_scale: 32.0 2024-09-24 08:46:43,590 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=8.0 2024-09-24 08:46:56,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.17 vs. limit=15.0 2024-09-24 08:47:11,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2024-09-24 08:47:16,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2024-09-24 08:47:21,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2024-09-24 08:47:26,729 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.293e+02 1.366e+02 1.489e+02 2.036e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 08:47:55,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=464058.0, ans=0.125 2024-09-24 08:48:01,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=464058.0, ans=0.125 2024-09-24 08:48:06,084 INFO [train.py:1198] (0/4) Epoch 26, batch 2050, loss[loss=0.1913, ctc_loss=0.1223, cr_loss=0.3451, over 17159.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1349, cr_loss=0.3516, over 3357844.90 frames. 
], batch size: 45, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:48:23,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=464151.3333333333, ans=0.125 2024-09-24 08:48:41,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=464198.0, ans=0.0 2024-09-24 08:48:41,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=464198.0, ans=0.125 2024-09-24 08:48:54,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=464244.6666666667, ans=0.125 2024-09-24 08:49:13,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=464291.3333333333, ans=0.125 2024-09-24 08:49:21,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=464291.3333333333, ans=0.0 2024-09-24 08:49:28,856 INFO [train.py:1198] (0/4) Epoch 26, batch 2100, loss[loss=0.2317, ctc_loss=0.1565, cr_loss=0.3762, over 17213.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1355, cr_loss=0.3521, over 3351477.20 frames. ], batch size: 50, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:49:35,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=464338.0, ans=0.0 2024-09-24 08:49:42,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=464338.0, ans=0.125 2024-09-24 08:50:09,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464431.3333333333, ans=0.1 2024-09-24 08:50:12,345 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.252e+02 1.369e+02 1.506e+02 2.347e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-24 08:50:18,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.03 vs. limit=12.0 2024-09-24 08:50:34,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=464524.6666666667, ans=0.125 2024-09-24 08:50:48,961 INFO [train.py:1198] (0/4) Epoch 26, batch 2150, loss[loss=0.2181, ctc_loss=0.1447, cr_loss=0.367, over 16751.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1346, cr_loss=0.3508, over 3349917.51 frames. ], batch size: 61, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:50:59,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.92 vs. limit=12.0 2024-09-24 08:51:05,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.25 vs. 
limit=15.0 2024-09-24 08:51:32,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=464664.6666666667, ans=0.125 2024-09-24 08:51:37,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464711.3333333333, ans=0.1 2024-09-24 08:52:04,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=464758.0, ans=15.0 2024-09-24 08:52:10,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=464758.0, ans=0.2 2024-09-24 08:52:13,762 INFO [train.py:1198] (0/4) Epoch 26, batch 2200, loss[loss=0.2012, ctc_loss=0.1285, cr_loss=0.3633, over 17012.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1345, cr_loss=0.3505, over 3350617.37 frames. ], batch size: 51, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:52:22,147 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 08:52:27,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2024-09-24 08:52:33,768 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=12.0 2024-09-24 08:52:37,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=464851.3333333333, ans=0.0 2024-09-24 08:52:56,972 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.271e+02 1.361e+02 1.483e+02 2.576e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-24 08:53:14,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=464944.6666666667, ans=0.0 2024-09-24 08:53:19,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=464991.3333333333, ans=0.0 2024-09-24 08:53:36,464 INFO [train.py:1198] (0/4) Epoch 26, batch 2250, loss[loss=0.2214, ctc_loss=0.1464, cr_loss=0.3751, over 16436.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.1344, cr_loss=0.3499, over 3357611.95 frames. ], batch size: 66, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:53:39,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465038.0, ans=0.1 2024-09-24 08:53:51,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=465084.6666666667, ans=0.025 2024-09-24 08:53:52,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=465084.6666666667, ans=0.95 2024-09-24 08:54:24,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. limit=10.0 2024-09-24 08:54:32,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=465178.0, ans=0.125 2024-09-24 08:54:48,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.19 vs. 
limit=22.5 2024-09-24 08:54:53,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=465224.6666666667, ans=0.125 2024-09-24 08:54:59,002 INFO [train.py:1198] (0/4) Epoch 26, batch 2300, loss[loss=0.2004, ctc_loss=0.1307, cr_loss=0.3482, over 17040.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1343, cr_loss=0.3508, over 3357712.09 frames. ], batch size: 39, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:54:59,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=465271.3333333333, ans=0.0 2024-09-24 08:55:17,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=465318.0, ans=0.0 2024-09-24 08:55:17,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=465318.0, ans=0.125 2024-09-24 08:55:42,651 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.269e+02 1.366e+02 1.477e+02 2.036e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 08:56:04,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=22.5 2024-09-24 08:56:11,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=465458.0, ans=0.125 2024-09-24 08:56:22,146 INFO [train.py:1198] (0/4) Epoch 26, batch 2350, loss[loss=0.2208, ctc_loss=0.1467, cr_loss=0.3705, over 17358.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1349, cr_loss=0.3519, over 3358581.25 frames. ], batch size: 48, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:56:30,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=465504.6666666667, ans=0.2 2024-09-24 08:56:35,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=465504.6666666667, ans=0.125 2024-09-24 08:56:48,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=465551.3333333333, ans=0.0 2024-09-24 08:57:00,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=465598.0, ans=0.05 2024-09-24 08:57:24,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=465644.6666666667, ans=0.2 2024-09-24 08:57:35,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=465691.3333333333, ans=0.015 2024-09-24 08:57:45,359 INFO [train.py:1198] (0/4) Epoch 26, batch 2400, loss[loss=0.1914, ctc_loss=0.1226, cr_loss=0.3442, over 17190.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1346, cr_loss=0.3517, over 3358180.29 frames. ], batch size: 41, lr: 4.59e-03, grad_scale: 32.0 2024-09-24 08:57:53,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.00 vs. limit=12.0 2024-09-24 08:58:09,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. 
limit=15.0 2024-09-24 08:58:32,543 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.217e+02 1.274e+02 1.391e+02 1.998e+02, threshold=2.548e+02, percent-clipped=0.0 2024-09-24 08:58:32,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=465831.3333333333, ans=0.0 2024-09-24 08:58:39,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-09-24 08:58:51,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=465924.6666666667, ans=0.1 2024-09-24 08:59:04,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-09-24 08:59:10,330 INFO [train.py:1198] (0/4) Epoch 26, batch 2450, loss[loss=0.2617, ctc_loss=0.1735, cr_loss=0.4412, over 16565.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1347, cr_loss=0.3513, over 3360121.81 frames. ], batch size: 66, lr: 4.59e-03, grad_scale: 16.0 2024-09-24 08:59:14,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2024-09-24 08:59:15,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=465971.3333333333, ans=0.0 2024-09-24 08:59:33,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=466018.0, ans=0.125 2024-09-24 09:00:09,677 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 09:00:29,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=466204.6666666667, ans=0.5 2024-09-24 09:00:30,248 INFO [train.py:1198] (0/4) Epoch 26, batch 2500, loss[loss=0.2235, ctc_loss=0.1472, cr_loss=0.3813, over 16505.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1339, cr_loss=0.3493, over 3365193.37 frames. ], batch size: 66, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:00:36,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466204.6666666667, ans=0.1 2024-09-24 09:01:05,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=466298.0, ans=0.0 2024-09-24 09:01:17,882 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.260e+02 1.339e+02 1.444e+02 2.110e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 09:01:36,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466391.3333333333, ans=0.1 2024-09-24 09:01:37,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466391.3333333333, ans=0.1 2024-09-24 09:01:51,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=466391.3333333333, ans=0.025 2024-09-24 09:01:56,198 INFO [train.py:1198] (0/4) Epoch 26, batch 2550, loss[loss=0.184, ctc_loss=0.1184, cr_loss=0.3278, over 17108.00 frames. 
], tot_loss[loss=0.2031, ctc_loss=0.1334, cr_loss=0.3488, over 3365225.47 frames. ], batch size: 43, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:02:14,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2024-09-24 09:03:15,522 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-100000.pt 2024-09-24 09:03:20,624 INFO [train.py:1198] (0/4) Epoch 26, batch 2600, loss[loss=0.2326, ctc_loss=0.1647, cr_loss=0.3394, over 11801.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1337, cr_loss=0.3486, over 3351303.96 frames. ], batch size: 124, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:03:24,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.94 vs. limit=6.0 2024-09-24 09:03:51,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466718.0, ans=0.1 2024-09-24 09:03:57,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0 2024-09-24 09:04:07,746 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.265e+02 1.355e+02 1.499e+02 2.594e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 09:04:19,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=466811.3333333333, ans=0.125 2024-09-24 09:04:27,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=466858.0, ans=0.125 2024-09-24 09:04:42,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2024-09-24 09:04:42,888 INFO [train.py:1198] (0/4) Epoch 26, batch 2650, loss[loss=0.2336, ctc_loss=0.1542, cr_loss=0.3972, over 16763.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1348, cr_loss=0.3508, over 3345923.39 frames. ], batch size: 61, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:04:45,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=22.5 2024-09-24 09:04:52,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=466904.6666666667, ans=0.0 2024-09-24 09:05:21,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=12.0 2024-09-24 09:05:32,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=467044.6666666667, ans=0.0 2024-09-24 09:05:36,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=467044.6666666667, ans=0.125 2024-09-24 09:06:04,511 INFO [train.py:1198] (0/4) Epoch 26, batch 2700, loss[loss=0.2028, ctc_loss=0.1298, cr_loss=0.3653, over 17288.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1349, cr_loss=0.3516, over 3350811.15 frames. 
], batch size: 46, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:06:12,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=467138.0, ans=0.125 2024-09-24 09:06:22,456 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 09:06:43,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=467231.3333333333, ans=15.0 2024-09-24 09:06:49,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=467231.3333333333, ans=0.125 2024-09-24 09:06:51,908 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.262e+02 1.341e+02 1.444e+02 3.624e+02, threshold=2.682e+02, percent-clipped=1.0 2024-09-24 09:07:08,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=467278.0, ans=0.125 2024-09-24 09:07:18,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467324.6666666667, ans=0.1 2024-09-24 09:07:26,517 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2024-09-24 09:07:27,272 INFO [train.py:1198] (0/4) Epoch 26, batch 2750, loss[loss=0.1985, ctc_loss=0.1305, cr_loss=0.3397, over 17056.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1349, cr_loss=0.3509, over 3352790.56 frames. ], batch size: 46, lr: 4.58e-03, grad_scale: 16.0 2024-09-24 09:07:40,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=467371.3333333333, ans=0.125 2024-09-24 09:07:49,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=467418.0, ans=10.0 2024-09-24 09:07:52,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=467418.0, ans=0.0 2024-09-24 09:07:57,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=467418.0, ans=0.125 2024-09-24 09:08:10,672 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2024-09-24 09:08:13,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=467464.6666666667, ans=0.125 2024-09-24 09:08:23,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=12.0 2024-09-24 09:08:36,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2024-09-24 09:08:52,543 INFO [train.py:1198] (0/4) Epoch 26, batch 2800, loss[loss=0.1714, ctc_loss=0.1066, cr_loss=0.3238, over 16307.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1352, cr_loss=0.3518, over 3353508.61 frames. 
], batch size: 36, lr: 4.58e-03, grad_scale: 32.0 2024-09-24 09:08:52,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=467604.6666666667, ans=0.125 2024-09-24 09:08:54,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=467604.6666666667, ans=0.035 2024-09-24 09:09:04,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=467604.6666666667, ans=0.125 2024-09-24 09:09:14,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2024-09-24 09:09:17,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=467651.3333333333, ans=0.0 2024-09-24 09:09:37,849 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.261e+02 1.357e+02 1.478e+02 1.924e+02, threshold=2.714e+02, percent-clipped=0.0 2024-09-24 09:09:55,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=467791.3333333333, ans=0.04949747468305833 2024-09-24 09:10:01,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-24 09:10:01,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=467791.3333333333, ans=0.025 2024-09-24 09:10:05,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=467791.3333333333, ans=0.0 2024-09-24 09:10:12,881 INFO [train.py:1198] (0/4) Epoch 26, batch 2850, loss[loss=0.1767, ctc_loss=0.1124, cr_loss=0.3218, over 17179.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1348, cr_loss=0.3511, over 3352438.86 frames. ], batch size: 41, lr: 4.58e-03, grad_scale: 32.0 2024-09-24 09:10:54,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467931.3333333333, ans=0.1 2024-09-24 09:11:11,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=467978.0, ans=0.0 2024-09-24 09:11:16,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=467978.0, ans=0.09899494936611666 2024-09-24 09:11:19,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=468024.6666666667, ans=0.2 2024-09-24 09:11:21,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=468024.6666666667, ans=0.125 2024-09-24 09:11:35,020 INFO [train.py:1198] (0/4) Epoch 26, batch 2900, loss[loss=0.223, ctc_loss=0.1509, cr_loss=0.3603, over 15823.00 frames. ], tot_loss[loss=0.2043, ctc_loss=0.1343, cr_loss=0.3498, over 3350858.70 frames. 
], batch size: 74, lr: 4.58e-03, grad_scale: 32.0 2024-09-24 09:11:44,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=468071.3333333333, ans=0.125 2024-09-24 09:11:44,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=468071.3333333333, ans=0.2 2024-09-24 09:11:45,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=468071.3333333333, ans=0.0 2024-09-24 09:12:22,358 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.321e+02 1.382e+02 1.512e+02 2.485e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-24 09:12:30,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=468211.3333333333, ans=0.0 2024-09-24 09:12:43,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468258.0, ans=0.1 2024-09-24 09:12:43,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=468258.0, ans=0.1 2024-09-24 09:13:00,443 INFO [train.py:1198] (0/4) Epoch 26, batch 2950, loss[loss=0.2172, ctc_loss=0.142, cr_loss=0.3759, over 16998.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1339, cr_loss=0.3498, over 3349393.41 frames. ], batch size: 56, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:13:00,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=468304.6666666667, ans=0.125 2024-09-24 09:13:05,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=468304.6666666667, ans=0.0 2024-09-24 09:13:07,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=468304.6666666667, ans=0.025 2024-09-24 09:13:08,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=468304.6666666667, ans=0.125 2024-09-24 09:13:34,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=468398.0, ans=0.125 2024-09-24 09:14:05,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=468491.3333333333, ans=0.0 2024-09-24 09:14:23,228 INFO [train.py:1198] (0/4) Epoch 26, batch 3000, loss[loss=0.176, ctc_loss=0.1127, cr_loss=0.3165, over 17078.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1328, cr_loss=0.3484, over 3363719.87 frames. ], batch size: 43, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:14:23,229 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 09:14:38,570 INFO [train.py:1230] (0/4) Epoch 26, validation: loss=0.03742, ctc_loss=0.03742, cr_loss=8.706e-15, over 944034.00 frames. 
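
The three loss fields in the train/validation entries above are tied together by a fixed linear combination: within rounding, loss = ctc_loss + 0.2 * cr_loss, where 0.2 matches the cr-loss scale encoded in the experiment directory name (exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4). The validation entry just logged has cr_loss = 8.706e-15, i.e. numerically zero, so the validation loss collapses to the pure CTC loss; presumably the consistency-regularization term compares two augmented views of each utterance and is inactive without that pairing. A minimal check of the identity against values copied from this log (an illustrative script, not part of icefall):

    # Verify loss = ctc_loss + CR_SCALE * cr_loss for entries in this log.
    # CR_SCALE = 0.2 is read off the experiment directory name
    # (exp-cr-loss-scale-0.2-...); the tuples below are copied from the log.
    CR_SCALE = 0.2

    logged = [
        # (loss, ctc_loss, cr_loss)
        (0.2209, 0.1434, 0.3875),        # epoch 26, batch 3400
        (0.2373, 0.1577, 0.3981),        # epoch 26, batch 3750
        (0.03742, 0.03742, 8.706e-15),   # epoch 26 validation: cr_loss ~ 0
    ]

    for loss, ctc, cr in logged:
        recombined = ctc + CR_SCALE * cr
        assert abs(recombined - loss) < 5e-4, (loss, recombined)
        print(f"loss={loss:.5f}  ctc+0.2*cr={recombined:.5f}")
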
2024-09-24 09:14:38,570 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 09:15:02,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=468584.6666666667, ans=0.125 2024-09-24 09:15:03,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=468584.6666666667, ans=0.125 2024-09-24 09:15:11,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=468631.3333333333, ans=0.125 2024-09-24 09:15:18,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=468631.3333333333, ans=0.125 2024-09-24 09:15:21,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=468631.3333333333, ans=0.125 2024-09-24 09:15:22,386 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.270e+02 1.354e+02 1.456e+02 4.080e+02, threshold=2.708e+02, percent-clipped=1.0 2024-09-24 09:15:25,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=468678.0, ans=0.125 2024-09-24 09:15:38,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=468678.0, ans=0.0 2024-09-24 09:15:38,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468678.0, ans=0.1 2024-09-24 09:15:56,790 INFO [train.py:1198] (0/4) Epoch 26, batch 3050, loss[loss=0.1852, ctc_loss=0.1169, cr_loss=0.3415, over 16999.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1339, cr_loss=0.3497, over 3350730.66 frames. ], batch size: 44, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:16:31,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=468864.6666666667, ans=0.025 2024-09-24 09:16:33,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468864.6666666667, ans=0.1 2024-09-24 09:16:47,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=468911.3333333333, ans=0.125 2024-09-24 09:16:58,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-09-24 09:17:04,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=468958.0, ans=0.125 2024-09-24 09:17:14,928 INFO [train.py:1198] (0/4) Epoch 26, batch 3100, loss[loss=0.2028, ctc_loss=0.132, cr_loss=0.3538, over 17028.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1333, cr_loss=0.3486, over 3358739.83 frames. ], batch size: 44, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:17:30,078 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.18 vs. 
limit=15.0 2024-09-24 09:17:47,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=469098.0, ans=10.0 2024-09-24 09:18:01,341 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.241e+02 1.330e+02 1.444e+02 2.208e+02, threshold=2.660e+02, percent-clipped=0.0 2024-09-24 09:18:11,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469144.6666666667, ans=0.1 2024-09-24 09:18:35,775 INFO [train.py:1198] (0/4) Epoch 26, batch 3150, loss[loss=0.2195, ctc_loss=0.1467, cr_loss=0.3639, over 17029.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1332, cr_loss=0.3482, over 3366481.68 frames. ], batch size: 51, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:18:45,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=469238.0, ans=0.09899494936611666 2024-09-24 09:18:50,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2024-09-24 09:19:56,682 INFO [train.py:1198] (0/4) Epoch 26, batch 3200, loss[loss=0.22, ctc_loss=0.1444, cr_loss=0.3779, over 16994.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1341, cr_loss=0.3499, over 3354875.29 frames. ], batch size: 53, lr: 4.57e-03, grad_scale: 32.0 2024-09-24 09:20:10,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.71 vs. limit=10.0 2024-09-24 09:20:20,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=469518.0, ans=0.04949747468305833 2024-09-24 09:20:25,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-09-24 09:20:35,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=469564.6666666667, ans=0.0 2024-09-24 09:20:44,066 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.250e+02 1.395e+02 1.507e+02 2.152e+02, threshold=2.790e+02, percent-clipped=0.0 2024-09-24 09:20:49,204 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 09:20:55,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=469611.3333333333, ans=0.2 2024-09-24 09:21:00,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=469658.0, ans=0.0 2024-09-24 09:21:15,230 INFO [train.py:1198] (0/4) Epoch 26, batch 3250, loss[loss=0.2012, ctc_loss=0.1311, cr_loss=0.3509, over 17010.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1352, cr_loss=0.3518, over 3363899.17 frames. ], batch size: 53, lr: 4.57e-03, grad_scale: 16.0 2024-09-24 09:21:38,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=469751.3333333333, ans=0.125 2024-09-24 09:22:04,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.94 vs. 
limit=15.0 2024-09-24 09:22:35,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=22.5 2024-09-24 09:22:35,469 INFO [train.py:1198] (0/4) Epoch 26, batch 3300, loss[loss=0.2252, ctc_loss=0.1456, cr_loss=0.3984, over 16861.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1351, cr_loss=0.3514, over 3358517.88 frames. ], batch size: 58, lr: 4.57e-03, grad_scale: 16.0 2024-09-24 09:23:05,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=470031.3333333333, ans=0.125 2024-09-24 09:23:24,578 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.271e+02 1.364e+02 1.523e+02 2.164e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-24 09:23:32,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=470078.0, ans=0.0 2024-09-24 09:23:51,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=470124.6666666667, ans=0.0 2024-09-24 09:23:55,639 INFO [train.py:1198] (0/4) Epoch 26, batch 3350, loss[loss=0.2357, ctc_loss=0.1568, cr_loss=0.3945, over 16992.00 frames. ], tot_loss[loss=0.2066, ctc_loss=0.136, cr_loss=0.3529, over 3349858.28 frames. ], batch size: 53, lr: 4.57e-03, grad_scale: 16.0 2024-09-24 09:24:05,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=470171.3333333333, ans=0.2 2024-09-24 09:24:13,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=470218.0, ans=0.125 2024-09-24 09:24:24,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=470218.0, ans=0.02 2024-09-24 09:24:24,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=470218.0, ans=0.2 2024-09-24 09:24:51,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0 2024-09-24 09:24:55,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470311.3333333333, ans=0.1 2024-09-24 09:25:08,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2024-09-24 09:25:09,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=470358.0, ans=0.125 2024-09-24 09:25:14,415 INFO [train.py:1198] (0/4) Epoch 26, batch 3400, loss[loss=0.2209, ctc_loss=0.1434, cr_loss=0.3875, over 17216.00 frames. ], tot_loss[loss=0.2074, ctc_loss=0.1365, cr_loss=0.3545, over 3362146.96 frames. 
], batch size: 55, lr: 4.56e-03, grad_scale: 16.0 2024-09-24 09:25:22,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=470404.6666666667, ans=0.2 2024-09-24 09:26:01,626 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.254e+02 1.315e+02 1.422e+02 2.277e+02, threshold=2.630e+02, percent-clipped=0.0 2024-09-24 09:26:02,608 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-09-24 09:26:32,738 INFO [train.py:1198] (0/4) Epoch 26, batch 3450, loss[loss=0.2207, ctc_loss=0.1477, cr_loss=0.3651, over 17367.00 frames. ], tot_loss[loss=0.207, ctc_loss=0.1363, cr_loss=0.3535, over 3354701.98 frames. ], batch size: 48, lr: 4.56e-03, grad_scale: 16.0 2024-09-24 09:27:03,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=470731.3333333333, ans=0.125 2024-09-24 09:27:04,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=470731.3333333333, ans=0.2 2024-09-24 09:27:31,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470778.0, ans=0.1 2024-09-24 09:27:53,198 INFO [train.py:1198] (0/4) Epoch 26, batch 3500, loss[loss=0.2309, ctc_loss=0.1511, cr_loss=0.3994, over 16587.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1353, cr_loss=0.3518, over 3353568.76 frames. ], batch size: 66, lr: 4.56e-03, grad_scale: 16.0 2024-09-24 09:27:53,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=470871.3333333333, ans=0.07 2024-09-24 09:28:04,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=470871.3333333333, ans=0.125 2024-09-24 09:28:20,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.81 vs. limit=10.0 2024-09-24 09:28:29,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=15.0 2024-09-24 09:28:39,878 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.277e+02 1.357e+02 1.514e+02 2.797e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-24 09:28:52,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=471011.3333333333, ans=0.125 2024-09-24 09:28:59,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2024-09-24 09:29:09,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=471104.6666666667, ans=0.025 2024-09-24 09:29:11,281 INFO [train.py:1198] (0/4) Epoch 26, batch 3550, loss[loss=0.2188, ctc_loss=0.1391, cr_loss=0.3982, over 17308.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1352, cr_loss=0.3516, over 3332302.42 frames. 
2024-09-24 09:29:57,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=471198.0, ans=0.125
2024-09-24 09:30:17,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=471291.3333333333, ans=0.1
2024-09-24 09:30:32,070 INFO [train.py:1198] (0/4) Epoch 26, batch 3600, loss[loss=0.1742, ctc_loss=0.1103, cr_loss=0.3195, over 17193.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1348, cr_loss=0.3508, over 3339394.11 frames. ], batch size: 41, lr: 4.56e-03, grad_scale: 32.0
2024-09-24 09:30:42,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=22.5
2024-09-24 09:30:44,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471338.0, ans=0.1
2024-09-24 09:31:09,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=471431.3333333333, ans=0.0
2024-09-24 09:31:20,017 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.297e+02 1.443e+02 1.634e+02 2.383e+02, threshold=2.886e+02, percent-clipped=0.0
2024-09-24 09:31:48,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471571.3333333333, ans=0.1
2024-09-24 09:31:48,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=471571.3333333333, ans=0.2
2024-09-24 09:31:49,547 INFO [train.py:1198] (0/4) Epoch 26, batch 3650, loss[loss=0.2, ctc_loss=0.1293, cr_loss=0.3535, over 17057.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.1344, cr_loss=0.35, over 3344515.25 frames. ], batch size: 46, lr: 4.56e-03, grad_scale: 16.0
2024-09-24 09:31:59,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=471571.3333333333, ans=0.125
2024-09-24 09:32:01,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=471571.3333333333, ans=0.2
2024-09-24 09:32:35,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=471664.6666666667, ans=0.025
2024-09-24 09:32:41,091 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0
2024-09-24 09:32:57,519 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0
2024-09-24 09:33:08,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=471758.0, ans=0.0
2024-09-24 09:33:10,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0
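The scaling.py:214 entries track ScheduledFloat hyper-parameters (dropout rates, skip rates, balancer probabilities and limits) whose current value ans depends on batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) keyframes; the class below is a stand-in with invented keyframes, not the scaling.py implementation:

import bisect

class PiecewiseLinearSchedule:
    """A float that interpolates linearly between (batch_count, value)
    keyframes and stays constant beyond the first and last keyframe."""
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

attention_skip_rate = PiecewiseLinearSchedule((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(attention_skip_rate(471431.3333333333))  # far past the last keyframe: 0.0, cf. ans=0.0 above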
2024-09-24 09:33:11,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=471804.6666666667, ans=0.125
2024-09-24 09:33:12,494 INFO [train.py:1198] (0/4) Epoch 26, batch 3700, loss[loss=0.1628, ctc_loss=0.1024, cr_loss=0.302, over 16696.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1347, cr_loss=0.3505, over 3339260.55 frames. ], batch size: 37, lr: 4.56e-03, grad_scale: 16.0
2024-09-24 09:33:19,048 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 09:33:23,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0
2024-09-24 09:33:53,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=471898.0, ans=0.125
2024-09-24 09:33:59,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=471944.6666666667, ans=0.025
2024-09-24 09:34:01,194 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.253e+02 1.341e+02 1.434e+02 1.966e+02, threshold=2.682e+02, percent-clipped=0.0
2024-09-24 09:34:30,807 INFO [train.py:1198] (0/4) Epoch 26, batch 3750, loss[loss=0.2373, ctc_loss=0.1577, cr_loss=0.3981, over 16583.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1346, cr_loss=0.3508, over 3341296.35 frames. ], batch size: 66, lr: 4.56e-03, grad_scale: 16.0
2024-09-24 09:35:01,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=472131.3333333333, ans=0.0
2024-09-24 09:35:32,729 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0
2024-09-24 09:35:47,120 INFO [train.py:1198] (0/4) Epoch 26, batch 3800, loss[loss=0.1817, ctc_loss=0.1162, cr_loss=0.3271, over 17027.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1349, cr_loss=0.3512, over 3340730.62 frames. ], batch size: 44, lr: 4.56e-03, grad_scale: 16.0
2024-09-24 09:35:48,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0
2024-09-24 09:35:53,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=472271.3333333333, ans=0.0
2024-09-24 09:36:25,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=472364.6666666667, ans=0.0
2024-09-24 09:36:34,124 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.296e+02 1.371e+02 1.525e+02 1.900e+02, threshold=2.741e+02, percent-clipped=0.0
2024-09-24 09:36:59,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=472458.0, ans=0.125
2024-09-24 09:37:03,760 INFO [train.py:1198] (0/4) Epoch 26, batch 3850, loss[loss=0.2012, ctc_loss=0.1296, cr_loss=0.358, over 17018.00 frames. ], tot_loss[loss=0.2089, ctc_loss=0.1378, cr_loss=0.3554, over 3309038.36 frames. ], batch size: 44, lr: 4.55e-03, grad_scale: 16.0
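The Whitening lines from scaling.py:1024 compare a whiteness metric of a module's activations against a scheduled limit; the entries logged are the ones where the metric approaches or exceeds that limit. One plausible reading, stated here as an assumption rather than the scaling.py definition: the metric is 1.0 when the feature covariance is a multiple of the identity and grows as the covariance departs from white. A self-contained sketch of such a measure:

import torch

def whitening_metric(feats: torch.Tensor) -> float:
    # feats: (num_frames, num_channels). Returns 1.0 for perfectly white
    # features (covariance proportional to the identity), larger otherwise.
    x = feats - feats.mean(dim=0, keepdim=True)
    c = (x.t() @ x) / x.shape[0]            # channel covariance
    d = c.shape[0]
    return (d * (c * c).sum() / c.diag().sum() ** 2).item()

torch.manual_seed(0)
white = torch.randn(4096, 192)
print(whitening_metric(white))                                   # close to 1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 192)))   # well above 1.0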
2024-09-24 09:37:08,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=472504.6666666667, ans=0.0
2024-09-24 09:37:16,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=472504.6666666667, ans=0.0
2024-09-24 09:37:17,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0
2024-09-24 09:37:28,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=472551.3333333333, ans=0.125
2024-09-24 09:37:30,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=472551.3333333333, ans=0.125
2024-09-24 09:37:48,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=472644.6666666667, ans=0.07
2024-09-24 09:37:52,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=472644.6666666667, ans=0.125
2024-09-24 09:38:04,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=472691.3333333333, ans=0.025
2024-09-24 09:38:13,198 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-26.pt
2024-09-24 09:39:03,327 INFO [train.py:1198] (0/4) Epoch 27, batch 0, loss[loss=0.1968, ctc_loss=0.1271, cr_loss=0.3484, over 17078.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1271, cr_loss=0.3484, over 17078.00 frames. ], batch size: 43, lr: 4.47e-03, grad_scale: 32.0
2024-09-24 09:39:03,328 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-24 09:39:21,545 INFO [train.py:1230] (0/4) Epoch 27, validation: loss=0.03741, ctc_loss=0.03741, cr_loss=8.388e-15, over 944034.00 frames.
2024-09-24 09:39:21,545 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-24 09:40:00,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=472812.6666666667, ans=0.125
2024-09-24 09:40:01,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472812.6666666667, ans=0.1
2024-09-24 09:40:05,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=472812.6666666667, ans=0.125
2024-09-24 09:40:13,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=472859.3333333333, ans=0.07
2024-09-24 09:40:21,653 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.362e+02 1.530e+02 1.663e+02 2.257e+02, threshold=3.060e+02, percent-clipped=0.0
2024-09-24 09:40:36,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0
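Two things happen at the epoch boundary above: checkpoint.py writes epoch-26.pt into the experiment directory, and a validation pass is run, whose cr_loss of 8.388e-15 is numerically zero, which is what one would expect if the consistency-regularization term compares augmented views that coincide when augmentation is off for validation (an inference from the numbers, not from the code). A minimal sketch of the per-epoch save, with the filename pattern taken from the log and the payload keys assumed:

from pathlib import Path
import torch

def save_epoch_checkpoint(model, optimizer, epoch: int, exp_dir: str) -> None:
    out = Path(exp_dir)
    out.mkdir(parents=True, exist_ok=True)
    # "epoch-N.pt" naming as in the log; the dict keys are assumptions.
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "epoch": epoch},
        out / f"epoch-{epoch}.pt",
    )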
2024-09-24 09:40:41,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=472906.0, ans=0.125
2024-09-24 09:40:45,634 INFO [train.py:1198] (0/4) Epoch 27, batch 50, loss[loss=0.1961, ctc_loss=0.1295, cr_loss=0.3332, over 16758.00 frames. ], tot_loss[loss=0.2065, ctc_loss=0.1357, cr_loss=0.3539, over 747341.98 frames. ], batch size: 61, lr: 4.47e-03, grad_scale: 32.0
2024-09-24 09:40:48,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0
2024-09-24 09:41:44,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=473092.6666666667, ans=0.1
2024-09-24 09:41:51,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0
2024-09-24 09:41:54,698 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0
2024-09-24 09:41:57,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=473139.3333333333, ans=0.0
2024-09-24 09:42:00,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473139.3333333333, ans=0.1
2024-09-24 09:42:01,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=473139.3333333333, ans=0.0
2024-09-24 09:42:04,850 INFO [train.py:1198] (0/4) Epoch 27, batch 100, loss[loss=0.2139, ctc_loss=0.1414, cr_loss=0.3621, over 17132.00 frames. ], tot_loss[loss=0.2068, ctc_loss=0.1356, cr_loss=0.3557, over 1324244.58 frames. ], batch size: 48, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:42:28,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=473232.6666666667, ans=0.125
2024-09-24 09:42:57,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473326.0, ans=0.125
2024-09-24 09:43:03,942 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.215e+02 1.307e+02 1.417e+02 1.891e+02, threshold=2.615e+02, percent-clipped=0.0
2024-09-24 09:43:21,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=473372.6666666667, ans=0.125
2024-09-24 09:43:22,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0
2024-09-24 09:43:28,092 INFO [train.py:1198] (0/4) Epoch 27, batch 150, loss[loss=0.2029, ctc_loss=0.1321, cr_loss=0.3537, over 17137.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.134, cr_loss=0.3536, over 1783281.30 frames. ], batch size: 45, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:43:28,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473419.3333333333, ans=0.1
2024-09-24 09:44:01,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.91 vs. limit=10.0
2024-09-24 09:44:22,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=473559.3333333333, ans=0.125
2024-09-24 09:44:46,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=473606.0, ans=0.1
2024-09-24 09:44:53,618 INFO [train.py:1198] (0/4) Epoch 27, batch 200, loss[loss=0.2182, ctc_loss=0.1479, cr_loss=0.3515, over 17346.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1346, cr_loss=0.3528, over 2127577.25 frames. ], batch size: 48, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:44:57,441 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0
2024-09-24 09:45:01,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=473652.6666666667, ans=0.125
2024-09-24 09:45:22,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=473699.3333333333, ans=0.0
2024-09-24 09:45:48,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473792.6666666667, ans=0.1
2024-09-24 09:45:52,247 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.251e+02 1.322e+02 1.422e+02 2.046e+02, threshold=2.645e+02, percent-clipped=0.0
2024-09-24 09:46:00,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=473839.3333333333, ans=0.125
2024-09-24 09:46:02,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=473839.3333333333, ans=0.0
2024-09-24 09:46:03,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473839.3333333333, ans=0.1
2024-09-24 09:46:16,199 INFO [train.py:1198] (0/4) Epoch 27, batch 250, loss[loss=0.1896, ctc_loss=0.1265, cr_loss=0.3158, over 17022.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1347, cr_loss=0.3523, over 2402450.59 frames. ], batch size: 56, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:46:48,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=473979.3333333333, ans=0.125
2024-09-24 09:46:53,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=473979.3333333333, ans=0.125
2024-09-24 09:47:20,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=474072.6666666667, ans=0.125
2024-09-24 09:47:21,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=474072.6666666667, ans=0.1
2024-09-24 09:47:38,637 INFO [train.py:1198] (0/4) Epoch 27, batch 300, loss[loss=0.2242, ctc_loss=0.1507, cr_loss=0.3677, over 17011.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1348, cr_loss=0.3528, over 2616320.71 frames. ], batch size: 56, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:47:40,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474119.3333333333, ans=0.1
2024-09-24 09:47:47,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=474119.3333333333, ans=0.2
2024-09-24 09:47:53,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=474166.0, ans=0.0
2024-09-24 09:48:02,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=474166.0, ans=0.025
2024-09-24 09:48:26,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=474259.3333333333, ans=0.125
2024-09-24 09:48:34,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=474259.3333333333, ans=0.1
2024-09-24 09:48:35,476 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.322e+02 1.414e+02 1.594e+02 2.687e+02, threshold=2.828e+02, percent-clipped=1.0
2024-09-24 09:48:59,275 INFO [train.py:1198] (0/4) Epoch 27, batch 350, loss[loss=0.2244, ctc_loss=0.1506, cr_loss=0.369, over 17206.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1338, cr_loss=0.3509, over 2787617.59 frames. ], batch size: 55, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:49:07,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=474352.6666666667, ans=0.125
2024-09-24 09:49:08,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=12.0
2024-09-24 09:49:49,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=474446.0, ans=0.125
2024-09-24 09:49:51,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=22.5
2024-09-24 09:49:55,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=474492.6666666667, ans=0.04949747468305833
2024-09-24 09:50:03,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=474492.6666666667, ans=0.07
2024-09-24 09:50:27,285 INFO [train.py:1198] (0/4) Epoch 27, batch 400, loss[loss=0.1982, ctc_loss=0.1281, cr_loss=0.3505, over 17149.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1336, cr_loss=0.3507, over 2911228.93 frames. ], batch size: 45, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:50:37,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=474586.0, ans=0.125
2024-09-24 09:50:50,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0
2024-09-24 09:51:23,175 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.248e+02 1.347e+02 1.469e+02 2.188e+02, threshold=2.694e+02, percent-clipped=0.0
2024-09-24 09:51:23,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=474726.0, ans=0.0
2024-09-24 09:51:23,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=474726.0, ans=0.125
2024-09-24 09:51:23,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=474726.0, ans=0.025
2024-09-24 09:51:28,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=474726.0, ans=0.05
2024-09-24 09:51:31,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=474772.6666666667, ans=0.125
2024-09-24 09:51:34,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=474772.6666666667, ans=0.125
2024-09-24 09:51:47,498 INFO [train.py:1198] (0/4) Epoch 27, batch 450, loss[loss=0.2088, ctc_loss=0.1369, cr_loss=0.3596, over 17301.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1336, cr_loss=0.3516, over 3014110.06 frames. ], batch size: 51, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:51:47,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=474819.3333333333, ans=0.125
2024-09-24 09:52:13,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=474866.0, ans=0.125
2024-09-24 09:52:17,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=474912.6666666667, ans=0.0
2024-09-24 09:52:27,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=474912.6666666667, ans=0.0
2024-09-24 09:52:52,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=475006.0, ans=0.04949747468305833
2024-09-24 09:53:09,797 INFO [train.py:1198] (0/4) Epoch 27, batch 500, loss[loss=0.23, ctc_loss=0.1541, cr_loss=0.3797, over 15971.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1342, cr_loss=0.3521, over 3084597.70 frames. ], batch size: 74, lr: 4.46e-03, grad_scale: 32.0
2024-09-24 09:53:39,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=475099.3333333333, ans=0.125
2024-09-24 09:54:06,519 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.284e+02 1.368e+02 1.499e+02 2.424e+02, threshold=2.736e+02, percent-clipped=0.0
2024-09-24 09:54:14,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=475239.3333333333, ans=0.025
2024-09-24 09:54:33,117 INFO [train.py:1198] (0/4) Epoch 27, batch 550, loss[loss=0.221, ctc_loss=0.1413, cr_loss=0.3982, over 17353.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1345, cr_loss=0.3523, over 3146644.14 frames. ], batch size: 48, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 09:54:42,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=475286.0, ans=0.0
2024-09-24 09:54:56,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=475332.6666666667, ans=0.125
2024-09-24 09:55:57,941 INFO [train.py:1198] (0/4) Epoch 27, batch 600, loss[loss=0.2102, ctc_loss=0.1381, cr_loss=0.3609, over 16997.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1344, cr_loss=0.3527, over 3198909.78 frames. ], batch size: 56, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 09:55:58,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0
2024-09-24 09:56:00,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0
2024-09-24 09:56:20,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=475566.0, ans=0.125
2024-09-24 09:56:21,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=475566.0, ans=0.0
2024-09-24 09:56:23,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=475566.0, ans=0.125
2024-09-24 09:56:25,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=475566.0, ans=0.2
2024-09-24 09:56:53,670 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.260e+02 1.326e+02 1.393e+02 1.864e+02, threshold=2.652e+02, percent-clipped=0.0
2024-09-24 09:57:13,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=475706.0, ans=0.05
2024-09-24 09:57:17,886 INFO [train.py:1198] (0/4) Epoch 27, batch 650, loss[loss=0.1594, ctc_loss=0.09951, cr_loss=0.2997, over 16765.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1341, cr_loss=0.3517, over 3238353.52 frames. ], batch size: 37, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 09:57:56,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=475846.0, ans=0.025
2024-09-24 09:58:04,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475846.0, ans=0.1
2024-09-24 09:58:17,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=475892.6666666667, ans=0.04949747468305833
2024-09-24 09:58:20,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=475892.6666666667, ans=0.125
2024-09-24 09:58:35,375 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 09:58:39,737 INFO [train.py:1198] (0/4) Epoch 27, batch 700, loss[loss=0.2204, ctc_loss=0.1436, cr_loss=0.3837, over 16946.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1342, cr_loss=0.3521, over 3272284.61 frames. ], batch size: 58, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 09:58:44,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=12.0
2024-09-24 09:58:45,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475986.0, ans=0.1
2024-09-24 09:58:59,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476032.6666666667, ans=0.1
2024-09-24 09:59:07,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476032.6666666667, ans=0.1
2024-09-24 09:59:24,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=476079.3333333333, ans=0.125
2024-09-24 09:59:24,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=476079.3333333333, ans=0.125
2024-09-24 09:59:30,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=476126.0, ans=0.0
2024-09-24 09:59:40,876 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.256e+02 1.371e+02 1.501e+02 2.687e+02, threshold=2.742e+02, percent-clipped=1.0
2024-09-24 10:00:04,600 INFO [train.py:1198] (0/4) Epoch 27, batch 750, loss[loss=0.1967, ctc_loss=0.1291, cr_loss=0.338, over 17296.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1348, cr_loss=0.3529, over 3283769.05 frames. ], batch size: 49, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 10:00:14,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2024-09-24 10:00:15,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0
2024-09-24 10:00:52,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=15.0
2024-09-24 10:00:53,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0
2024-09-24 10:00:57,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0
2024-09-24 10:01:05,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=476359.3333333333, ans=0.025
2024-09-24 10:01:27,181 INFO [train.py:1198] (0/4) Epoch 27, batch 800, loss[loss=0.2016, ctc_loss=0.1316, cr_loss=0.3504, over 17096.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1344, cr_loss=0.3514, over 3290571.12 frames. ], batch size: 43, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 10:01:29,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=476452.6666666667, ans=0.05
2024-09-24 10:01:42,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0
2024-09-24 10:02:09,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=476546.0, ans=0.0
2024-09-24 10:02:23,978 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.235e+02 1.338e+02 1.405e+02 1.662e+02, threshold=2.676e+02, percent-clipped=0.0
2024-09-24 10:02:32,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476639.3333333333, ans=0.1
2024-09-24 10:02:42,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.85 vs. limit=10.0
2024-09-24 10:02:50,860 INFO [train.py:1198] (0/4) Epoch 27, batch 850, loss[loss=0.1697, ctc_loss=0.1113, cr_loss=0.2918, over 16953.00 frames. ], tot_loss[loss=0.2043, ctc_loss=0.134, cr_loss=0.3511, over 3313627.44 frames. ], batch size: 42, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 10:04:10,719 INFO [train.py:1198] (0/4) Epoch 27, batch 900, loss[loss=0.2343, ctc_loss=0.1574, cr_loss=0.3843, over 17290.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1342, cr_loss=0.3516, over 3324073.64 frames. ], batch size: 49, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 10:04:20,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=476919.3333333333, ans=0.125
2024-09-24 10:04:43,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476966.0, ans=0.1
2024-09-24 10:04:45,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=476966.0, ans=0.0
2024-09-24 10:04:46,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=477012.6666666667, ans=0.2
2024-09-24 10:05:13,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=477059.3333333333, ans=0.125
2024-09-24 10:05:14,404 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.238e+02 1.345e+02 1.503e+02 3.181e+02, threshold=2.691e+02, percent-clipped=1.0
2024-09-24 10:05:38,936 INFO [train.py:1198] (0/4) Epoch 27, batch 950, loss[loss=0.2005, ctc_loss=0.1286, cr_loss=0.3595, over 17180.00 frames. ], tot_loss[loss=0.2043, ctc_loss=0.1341, cr_loss=0.3511, over 3334821.11 frames. ], batch size: 45, lr: 4.45e-03, grad_scale: 32.0
2024-09-24 10:06:29,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.60 vs. limit=15.0
2024-09-24 10:06:52,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=477339.3333333333, ans=0.0
2024-09-24 10:06:58,798 INFO [train.py:1198] (0/4) Epoch 27, batch 1000, loss[loss=0.2142, ctc_loss=0.1408, cr_loss=0.3667, over 16712.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1333, cr_loss=0.3497, over 3344537.34 frames. ], batch size: 61, lr: 4.44e-03, grad_scale: 8.0
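Between batches 950 and 1000 the logged grad_scale drops from 32.0 to 8.0, and by batch 1200 below it has recovered to 16.0. That is the usual dynamic loss-scaling pattern in mixed-precision training: the scale is cut when a step produces inf/nan gradients and grows back after a run of clean steps. A sketch with PyTorch's stock GradScaler (train.py may manage the scale differently; the commented step is the generic recipe):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)
# Typical training step:
#   scaler.scale(loss).backward()
#   scaler.unscale_(optimizer)        # so clipping sees true gradient norms
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
#   scaler.step(optimizer)            # skipped internally on inf/nan gradients
#   scaler.update()                   # halves the scale on overflow, else grows it
print(scaler.get_scale())             # 32.0 on a CUDA machine (disabled without CUDA)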
2024-09-24 10:07:02,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=477386.0, ans=0.2
2024-09-24 10:07:42,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=477479.3333333333, ans=0.1
2024-09-24 10:08:00,887 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.228e+02 1.308e+02 1.402e+02 1.814e+02, threshold=2.617e+02, percent-clipped=0.0
2024-09-24 10:08:06,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=477572.6666666667, ans=0.0
2024-09-24 10:08:17,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=477572.6666666667, ans=0.125
2024-09-24 10:08:19,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477572.6666666667, ans=0.125
2024-09-24 10:08:22,252 INFO [train.py:1198] (0/4) Epoch 27, batch 1050, loss[loss=0.2104, ctc_loss=0.1365, cr_loss=0.3693, over 17013.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1343, cr_loss=0.352, over 3341336.70 frames. ], batch size: 51, lr: 4.44e-03, grad_scale: 8.0
2024-09-24 10:08:38,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=477666.0, ans=0.0
2024-09-24 10:08:40,475 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.77 vs. limit=22.5
2024-09-24 10:08:46,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=477666.0, ans=0.0
2024-09-24 10:08:52,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=477712.6666666667, ans=0.125
2024-09-24 10:09:07,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=477712.6666666667, ans=0.0
2024-09-24 10:09:26,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=477759.3333333333, ans=0.125
2024-09-24 10:09:39,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=477806.0, ans=0.02
2024-09-24 10:09:47,486 INFO [train.py:1198] (0/4) Epoch 27, batch 1100, loss[loss=0.2028, ctc_loss=0.1336, cr_loss=0.3463, over 17004.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1345, cr_loss=0.3522, over 3342543.15 frames. ], batch size: 44, lr: 4.44e-03, grad_scale: 8.0
2024-09-24 10:10:16,292 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 10:10:21,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477946.0, ans=0.0
2024-09-24 10:10:26,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=8.0
2024-09-24 10:10:27,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=477946.0, ans=0.2
2024-09-24 10:10:39,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0
2024-09-24 10:10:49,361 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.279e+02 1.377e+02 1.520e+02 1.966e+02, threshold=2.755e+02, percent-clipped=0.0
2024-09-24 10:10:54,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=478039.3333333333, ans=0.0
2024-09-24 10:11:04,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=478039.3333333333, ans=0.125
2024-09-24 10:11:10,488 INFO [train.py:1198] (0/4) Epoch 27, batch 1150, loss[loss=0.2072, ctc_loss=0.1363, cr_loss=0.3543, over 17018.00 frames. ], tot_loss[loss=0.2046, ctc_loss=0.1343, cr_loss=0.3513, over 3334117.73 frames. ], batch size: 53, lr: 4.44e-03, grad_scale: 8.0
2024-09-24 10:11:20,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=478086.0, ans=12.0
2024-09-24 10:11:57,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=478226.0, ans=0.125
2024-09-24 10:12:18,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=22.5
2024-09-24 10:12:28,423 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.60 vs. limit=12.0
2024-09-24 10:12:33,337 INFO [train.py:1198] (0/4) Epoch 27, batch 1200, loss[loss=0.2348, ctc_loss=0.1599, cr_loss=0.3746, over 12205.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1339, cr_loss=0.3508, over 3340731.80 frames. ], batch size: 125, lr: 4.44e-03, grad_scale: 16.0
2024-09-24 10:12:36,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=478319.3333333333, ans=0.125
2024-09-24 10:12:46,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478319.3333333333, ans=0.0
2024-09-24 10:12:49,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=478366.0, ans=0.05
2024-09-24 10:13:08,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=478412.6666666667, ans=0.02
2024-09-24 10:13:32,437 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.237e+02 1.313e+02 1.395e+02 2.791e+02, threshold=2.627e+02, percent-clipped=1.0
2024-09-24 10:13:53,065 INFO [train.py:1198] (0/4) Epoch 27, batch 1250, loss[loss=0.2364, ctc_loss=0.1559, cr_loss=0.4025, over 17025.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1338, cr_loss=0.3499, over 3345972.49 frames. ], batch size: 53, lr: 4.44e-03, grad_scale: 16.0
2024-09-24 10:14:48,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=478692.6666666667, ans=0.2
2024-09-24 10:15:07,882 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-24 10:15:21,233 INFO [train.py:1198] (0/4) Epoch 27, batch 1300, loss[loss=0.2121, ctc_loss=0.1406, cr_loss=0.3576, over 17023.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1344, cr_loss=0.3516, over 3350684.89 frames. ], batch size: 52, lr: 4.44e-03, grad_scale: 16.0
2024-09-24 10:15:39,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=478832.6666666667, ans=0.125
2024-09-24 10:15:52,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=478879.3333333333, ans=0.125
2024-09-24 10:16:11,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=478926.0, ans=0.125
2024-09-24 10:16:17,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=478926.0, ans=0.1
2024-09-24 10:16:19,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=478926.0, ans=0.1
2024-09-24 10:16:22,437 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.264e+02 1.365e+02 1.488e+02 1.950e+02, threshold=2.729e+02, percent-clipped=0.0
2024-09-24 10:16:29,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=478972.6666666667, ans=0.05
2024-09-24 10:16:40,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=479019.3333333333, ans=0.125
2024-09-24 10:16:41,763 INFO [train.py:1198] (0/4) Epoch 27, batch 1350, loss[loss=0.2149, ctc_loss=0.1405, cr_loss=0.372, over 16593.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1338, cr_loss=0.3511, over 3357415.10 frames. ], batch size: 66, lr: 4.44e-03, grad_scale: 8.0
2024-09-24 10:16:42,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=479019.3333333333, ans=0.125
2024-09-24 10:17:09,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0
2024-09-24 10:17:31,950 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-24 10:17:45,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479159.3333333333, ans=0.1
2024-09-24 10:17:56,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=479206.0, ans=0.125
2024-09-24 10:18:04,479 INFO [train.py:1198] (0/4) Epoch 27, batch 1400, loss[loss=0.195, ctc_loss=0.1273, cr_loss=0.3385, over 17070.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.1331, cr_loss=0.3497, over 3359516.87 frames. ], batch size: 46, lr: 4.44e-03, grad_scale: 8.0
2024-09-24 10:18:15,007 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0
2024-09-24 10:19:02,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=479392.6666666667, ans=0.125
2024-09-24 10:19:08,018 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.257e+02 1.359e+02 1.482e+02 2.377e+02, threshold=2.717e+02, percent-clipped=0.0
2024-09-24 10:19:11,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=479439.3333333333, ans=0.125
2024-09-24 10:19:17,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=479439.3333333333, ans=0.125
2024-09-24 10:19:27,141 INFO [train.py:1198] (0/4) Epoch 27, batch 1450, loss[loss=0.2105, ctc_loss=0.138, cr_loss=0.3625, over 17145.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1343, cr_loss=0.351, over 3342025.62 frames. ], batch size: 48, lr: 4.43e-03, grad_scale: 8.0
2024-09-24 10:19:36,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=479486.0, ans=0.125
2024-09-24 10:19:36,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479486.0, ans=0.1
2024-09-24 10:19:39,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=479486.0, ans=0.125
2024-09-24 10:19:41,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=479486.0, ans=0.0
2024-09-24 10:19:56,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.89 vs. limit=15.0
2024-09-24 10:20:05,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=479579.3333333333, ans=0.0
2024-09-24 10:20:41,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=479672.6666666667, ans=0.2
2024-09-24 10:20:51,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=479719.3333333333, ans=0.025
2024-09-24 10:20:52,662 INFO [train.py:1198] (0/4) Epoch 27, batch 1500, loss[loss=0.2079, ctc_loss=0.1336, cr_loss=0.3717, over 17072.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.134, cr_loss=0.3508, over 3345018.51 frames. ], batch size: 46, lr: 4.43e-03, grad_scale: 8.0
2024-09-24 10:20:53,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0
2024-09-24 10:21:09,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=479766.0, ans=0.07
2024-09-24 10:21:30,527 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 10:21:34,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0
2024-09-24 10:21:36,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=12.0
2024-09-24 10:21:38,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=479812.6666666667, ans=0.0
2024-09-24 10:21:46,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=479859.3333333333, ans=0.025
2024-09-24 10:21:54,051 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.289e+02 1.371e+02 1.496e+02 2.046e+02, threshold=2.742e+02, percent-clipped=0.0
2024-09-24 10:21:59,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479906.0, ans=0.1
2024-09-24 10:22:13,096 INFO [train.py:1198] (0/4) Epoch 27, batch 1550, loss[loss=0.1879, ctc_loss=0.1172, cr_loss=0.3538, over 17035.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1327, cr_loss=0.3485, over 3349504.15 frames. ], batch size: 39, lr: 4.43e-03, grad_scale: 8.0
2024-09-24 10:22:24,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=479952.6666666667, ans=0.0
2024-09-24 10:22:35,679 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5
2024-09-24 10:22:47,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=480046.0, ans=0.125
2024-09-24 10:22:57,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=480046.0, ans=0.035
2024-09-24 10:23:07,172 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 10:23:36,312 INFO [train.py:1198] (0/4) Epoch 27, batch 1600, loss[loss=0.2334, ctc_loss=0.1529, cr_loss=0.4025, over 16911.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1336, cr_loss=0.35, over 3342548.58 frames. ], batch size: 58, lr: 4.43e-03, grad_scale: 16.0
2024-09-24 10:23:44,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=480186.0, ans=0.0
2024-09-24 10:24:02,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=480232.6666666667, ans=0.035
2024-09-24 10:24:19,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0
2024-09-24 10:24:20,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=480279.3333333333, ans=0.07
2024-09-24 10:24:24,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0
2024-09-24 10:24:27,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=22.5
2024-09-24 10:24:35,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=480326.0, ans=10.0
2024-09-24 10:24:41,898 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.022e+02 1.244e+02 1.330e+02 1.456e+02 2.026e+02, threshold=2.660e+02, percent-clipped=0.0
2024-09-24 10:24:47,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=480372.6666666667, ans=0.04949747468305833
2024-09-24 10:24:54,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=480372.6666666667, ans=0.95
2024-09-24 10:25:03,406 INFO [train.py:1198] (0/4) Epoch 27, batch 1650, loss[loss=0.216, ctc_loss=0.1434, cr_loss=0.3629, over 17314.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.134, cr_loss=0.3503, over 3350582.49 frames. ], batch size: 49, lr: 4.43e-03, grad_scale: 16.0
2024-09-24 10:25:16,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=480419.3333333333, ans=0.125
2024-09-24 10:26:06,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=480606.0, ans=0.125
2024-09-24 10:26:17,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=480606.0, ans=0.1
2024-09-24 10:26:23,698 INFO [train.py:1198] (0/4) Epoch 27, batch 1700, loss[loss=0.2341, ctc_loss=0.1535, cr_loss=0.4031, over 17006.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1333, cr_loss=0.3496, over 3354372.20 frames. ], batch size: 56, lr: 4.43e-03, grad_scale: 16.0
2024-09-24 10:26:35,633 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=12.0
2024-09-24 10:27:07,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=480746.0, ans=0.125
2024-09-24 10:27:08,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=480746.0, ans=0.125
2024-09-24 10:27:16,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=480792.6666666667, ans=0.0
2024-09-24 10:27:26,753 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.231e+02 1.318e+02 1.430e+02 1.905e+02, threshold=2.636e+02, percent-clipped=0.0
2024-09-24 10:27:36,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=480839.3333333333, ans=0.125
2024-09-24 10:27:45,876 INFO [train.py:1198] (0/4) Epoch 27, batch 1750, loss[loss=0.2142, ctc_loss=0.1425, cr_loss=0.3583, over 17222.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1335, cr_loss=0.3497, over 3358222.78 frames. ], batch size: 47, lr: 4.43e-03, grad_scale: 16.0
2024-09-24 10:27:55,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=480886.0, ans=0.0
2024-09-24 10:27:57,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=480886.0, ans=0.1
2024-09-24 10:27:58,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=480886.0, ans=0.125
2024-09-24 10:28:14,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=480932.6666666667, ans=0.025
2024-09-24 10:28:21,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=480979.3333333333, ans=0.0
2024-09-24 10:28:59,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=481072.6666666667, ans=0.125
2024-09-24 10:29:02,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=481072.6666666667, ans=0.125
2024-09-24 10:29:08,830 INFO [train.py:1198] (0/4) Epoch 27, batch 1800, loss[loss=0.1814, ctc_loss=0.1193, cr_loss=0.3105, over 17098.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1349, cr_loss=0.3511, over 3341628.38 frames. ], batch size: 43, lr: 4.43e-03, grad_scale: 16.0
2024-09-24 10:29:18,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=481119.3333333333, ans=0.125
2024-09-24 10:29:35,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=481166.0, ans=0.1
2024-09-24 10:30:14,563 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.286e+02 1.379e+02 1.526e+02 2.061e+02, threshold=2.757e+02, percent-clipped=0.0
2024-09-24 10:30:18,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=481306.0, ans=0.07
2024-09-24 10:30:33,713 INFO [train.py:1198] (0/4) Epoch 27, batch 1850, loss[loss=0.1598, ctc_loss=0.1012, cr_loss=0.2932, over 17063.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1336, cr_loss=0.3493, over 3355147.76 frames. ], batch size: 39, lr: 4.43e-03, grad_scale: 16.0
2024-09-24 10:31:01,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=481399.3333333333, ans=0.0
2024-09-24 10:31:36,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481539.3333333333, ans=0.1
2024-09-24 10:31:53,596 INFO [train.py:1198] (0/4) Epoch 27, batch 1900, loss[loss=0.2547, ctc_loss=0.18, cr_loss=0.3735, over 11589.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1336, cr_loss=0.3492, over 3356237.49 frames. ], batch size: 123, lr: 4.43e-03, grad_scale: 16.0
2024-09-24 10:32:51,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0
2024-09-24 10:32:57,450 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.257e+02 1.350e+02 1.464e+02 2.291e+02, threshold=2.700e+02, percent-clipped=0.0
2024-09-24 10:33:15,800 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0
2024-09-24 10:33:16,475 INFO [train.py:1198] (0/4) Epoch 27, batch 1950, loss[loss=0.2429, ctc_loss=0.1609, cr_loss=0.4096, over 17060.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.1334, cr_loss=0.3487, over 3359012.63 frames. ], batch size: 52, lr: 4.42e-03, grad_scale: 16.0
2024-09-24 10:33:16,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=481819.3333333333, ans=0.125
2024-09-24 10:33:26,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481819.3333333333, ans=0.1
2024-09-24 10:33:42,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=481866.0, ans=0.125
2024-09-24 10:33:59,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=481912.6666666667, ans=0.125
2024-09-24 10:34:06,837 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2024-09-24 10:34:17,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=481959.3333333333, ans=0.0
2024-09-24 10:34:42,090 INFO [train.py:1198] (0/4) Epoch 27, batch 2000, loss[loss=0.1867, ctc_loss=0.1242, cr_loss=0.3123, over 17170.00 frames. ], tot_loss[loss=0.2043, ctc_loss=0.1342, cr_loss=0.3504, over 3355199.66 frames. ], batch size: 45, lr: 4.42e-03, grad_scale: 32.0
2024-09-24 10:35:35,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0
2024-09-24 10:35:46,434 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.275e+02 1.345e+02 1.450e+02 1.969e+02, threshold=2.691e+02, percent-clipped=0.0
2024-09-24 10:36:02,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=482286.0, ans=0.125
2024-09-24 10:36:04,146 INFO [train.py:1198] (0/4) Epoch 27, batch 2050, loss[loss=0.1912, ctc_loss=0.1257, cr_loss=0.3277, over 17086.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.1332, cr_loss=0.3495, over 3366592.63 frames. ], batch size: 43, lr: 4.42e-03, grad_scale: 16.0
2024-09-24 10:36:22,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=482332.6666666667, ans=0.0
2024-09-24 10:36:32,145 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0
2024-09-24 10:36:52,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=482426.0, ans=0.1
2024-09-24 10:37:27,022 INFO [train.py:1198] (0/4) Epoch 27, batch 2100, loss[loss=0.1619, ctc_loss=0.1046, cr_loss=0.2868, over 17160.00 frames. ], tot_loss[loss=0.2037, ctc_loss=0.1336, cr_loss=0.3503, over 3372625.20 frames. ], batch size: 41, lr: 4.42e-03, grad_scale: 16.0
], batch size: 41, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:37:37,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482519.3333333333, ans=0.1 2024-09-24 10:38:26,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=482659.3333333333, ans=0.0 2024-09-24 10:38:29,423 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.227e+02 1.308e+02 1.433e+02 1.787e+02, threshold=2.617e+02, percent-clipped=0.0 2024-09-24 10:38:36,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=482706.0, ans=0.125 2024-09-24 10:38:47,143 INFO [train.py:1198] (0/4) Epoch 27, batch 2150, loss[loss=0.1783, ctc_loss=0.1122, cr_loss=0.3309, over 16679.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1333, cr_loss=0.3503, over 3380672.21 frames. ], batch size: 37, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:38:53,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=482752.6666666667, ans=0.2 2024-09-24 10:38:57,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=482752.6666666667, ans=0.125 2024-09-24 10:39:24,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=15.0 2024-09-24 10:39:52,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=482892.6666666667, ans=0.2 2024-09-24 10:40:14,293 INFO [train.py:1198] (0/4) Epoch 27, batch 2200, loss[loss=0.2195, ctc_loss=0.148, cr_loss=0.3579, over 17065.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1329, cr_loss=0.3496, over 3375558.20 frames. ], batch size: 46, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:40:44,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=483079.3333333333, ans=0.0 2024-09-24 10:41:16,367 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.255e+02 1.303e+02 1.378e+02 1.644e+02, threshold=2.606e+02, percent-clipped=0.0 2024-09-24 10:41:34,085 INFO [train.py:1198] (0/4) Epoch 27, batch 2250, loss[loss=0.2193, ctc_loss=0.1465, cr_loss=0.3641, over 17009.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.1332, cr_loss=0.3497, over 3379267.91 frames. 
], batch size: 51, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:41:54,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=483266.0, ans=0.125 2024-09-24 10:42:09,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=483312.6666666667, ans=0.0 2024-09-24 10:42:13,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=483312.6666666667, ans=0.0 2024-09-24 10:42:22,203 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:42:36,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=483359.3333333333, ans=0.05 2024-09-24 10:42:41,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=483406.0, ans=0.125 2024-09-24 10:42:44,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2024-09-24 10:42:49,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=483406.0, ans=0.125 2024-09-24 10:42:56,653 INFO [train.py:1198] (0/4) Epoch 27, batch 2300, loss[loss=0.2275, ctc_loss=0.1511, cr_loss=0.3822, over 15163.00 frames. ], tot_loss[loss=0.2044, ctc_loss=0.1342, cr_loss=0.3513, over 3375601.07 frames. ], batch size: 89, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:43:40,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2024-09-24 10:43:52,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=483592.6666666667, ans=0.125 2024-09-24 10:43:58,587 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.254e+02 1.343e+02 1.462e+02 2.563e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-24 10:44:11,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=483639.3333333333, ans=0.0 2024-09-24 10:44:16,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2024-09-24 10:44:18,920 INFO [train.py:1198] (0/4) Epoch 27, batch 2350, loss[loss=0.182, ctc_loss=0.1157, cr_loss=0.3313, over 17209.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1348, cr_loss=0.352, over 3370261.17 frames. ], batch size: 41, lr: 4.42e-03, grad_scale: 16.0 2024-09-24 10:44:50,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=483732.6666666667, ans=0.0 2024-09-24 10:45:12,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.25 vs. limit=10.0 2024-09-24 10:45:18,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=483826.0, ans=0.0 2024-09-24 10:45:22,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. 
limit=12.0 2024-09-24 10:45:23,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=483826.0, ans=0.125 2024-09-24 10:45:43,653 INFO [train.py:1198] (0/4) Epoch 27, batch 2400, loss[loss=0.2223, ctc_loss=0.15, cr_loss=0.3617, over 16926.00 frames. ], tot_loss[loss=0.2049, ctc_loss=0.1345, cr_loss=0.352, over 3364856.37 frames. ], batch size: 58, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:45:48,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=483919.3333333333, ans=0.125 2024-09-24 10:46:01,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-09-24 10:46:07,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=483966.0, ans=0.125 2024-09-24 10:46:12,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=483966.0, ans=0.0 2024-09-24 10:46:31,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=484059.3333333333, ans=0.09899494936611666 2024-09-24 10:46:36,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484059.3333333333, ans=0.1 2024-09-24 10:46:45,619 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.266e+02 1.332e+02 1.424e+02 3.115e+02, threshold=2.664e+02, percent-clipped=1.0 2024-09-24 10:46:46,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=484106.0, ans=0.125 2024-09-24 10:47:03,208 INFO [train.py:1198] (0/4) Epoch 27, batch 2450, loss[loss=0.2089, ctc_loss=0.1381, cr_loss=0.3539, over 17357.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1343, cr_loss=0.3517, over 3364685.11 frames. ], batch size: 48, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:47:03,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=484152.6666666667, ans=0.125 2024-09-24 10:47:25,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=484199.3333333333, ans=0.2 2024-09-24 10:47:47,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=484246.0, ans=0.0 2024-09-24 10:48:25,695 INFO [train.py:1198] (0/4) Epoch 27, batch 2500, loss[loss=0.1861, ctc_loss=0.1187, cr_loss=0.3368, over 17270.00 frames. ], tot_loss[loss=0.2051, ctc_loss=0.1346, cr_loss=0.3526, over 3367789.67 frames. 
], batch size: 42, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:48:35,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=484386.0, ans=0.125 2024-09-24 10:48:39,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484386.0, ans=0.1 2024-09-24 10:49:05,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=484479.3333333333, ans=0.025 2024-09-24 10:49:30,488 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.304e+02 1.383e+02 1.470e+02 1.966e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-24 10:49:50,887 INFO [train.py:1198] (0/4) Epoch 27, batch 2550, loss[loss=0.1863, ctc_loss=0.1225, cr_loss=0.319, over 17298.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1337, cr_loss=0.3509, over 3364624.09 frames. ], batch size: 49, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:49:55,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-09-24 10:50:36,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=484712.6666666667, ans=0.125 2024-09-24 10:50:41,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=484759.3333333333, ans=0.0 2024-09-24 10:51:09,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.93 vs. limit=10.0 2024-09-24 10:51:13,299 INFO [train.py:1198] (0/4) Epoch 27, batch 2600, loss[loss=0.2104, ctc_loss=0.1405, cr_loss=0.3494, over 17027.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1339, cr_loss=0.351, over 3362516.51 frames. ], batch size: 53, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:52:15,857 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.258e+02 1.335e+02 1.464e+02 4.634e+02, threshold=2.669e+02, percent-clipped=1.0 2024-09-24 10:52:33,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=485039.3333333333, ans=0.0 2024-09-24 10:52:36,181 INFO [train.py:1198] (0/4) Epoch 27, batch 2650, loss[loss=0.1995, ctc_loss=0.1329, cr_loss=0.3331, over 17149.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1331, cr_loss=0.3494, over 3361760.27 frames. 
], batch size: 48, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:52:45,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=485086.0, ans=10.0 2024-09-24 10:52:52,728 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 10:52:59,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=485132.6666666667, ans=0.125 2024-09-24 10:53:08,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=485179.3333333333, ans=0.025 2024-09-24 10:53:52,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=485272.6666666667, ans=0.0 2024-09-24 10:53:55,758 INFO [train.py:1198] (0/4) Epoch 27, batch 2700, loss[loss=0.2127, ctc_loss=0.1432, cr_loss=0.3476, over 17141.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1331, cr_loss=0.3493, over 3357075.99 frames. ], batch size: 48, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:53:59,333 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-104000.pt 2024-09-24 10:54:26,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=485366.0, ans=0.5 2024-09-24 10:54:30,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-09-24 10:54:47,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=485412.6666666667, ans=0.125 2024-09-24 10:54:54,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=485459.3333333333, ans=0.1 2024-09-24 10:54:55,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=12.0 2024-09-24 10:55:08,213 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.258e+02 1.322e+02 1.410e+02 2.487e+02, threshold=2.644e+02, percent-clipped=0.0 2024-09-24 10:55:25,595 INFO [train.py:1198] (0/4) Epoch 27, batch 2750, loss[loss=0.1898, ctc_loss=0.1263, cr_loss=0.3174, over 17041.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.1325, cr_loss=0.3484, over 3358628.73 frames. 
], batch size: 56, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:55:29,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=485552.6666666667, ans=0.0 2024-09-24 10:55:47,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=485599.3333333333, ans=0.0 2024-09-24 10:56:00,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=485646.0, ans=0.0 2024-09-24 10:56:07,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=485646.0, ans=0.025 2024-09-24 10:56:15,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=485692.6666666667, ans=0.0 2024-09-24 10:56:40,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=485739.3333333333, ans=0.125 2024-09-24 10:56:45,245 INFO [train.py:1198] (0/4) Epoch 27, batch 2800, loss[loss=0.2125, ctc_loss=0.1435, cr_loss=0.3449, over 15983.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.133, cr_loss=0.3496, over 3366511.36 frames. ], batch size: 74, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:57:00,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=15.0 2024-09-24 10:57:11,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=485832.6666666667, ans=0.05 2024-09-24 10:57:32,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=485879.3333333333, ans=0.0 2024-09-24 10:57:35,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2024-09-24 10:57:49,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485926.0, ans=0.1 2024-09-24 10:57:50,314 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.262e+02 1.376e+02 1.500e+02 2.364e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-24 10:58:07,805 INFO [train.py:1198] (0/4) Epoch 27, batch 2850, loss[loss=0.1681, ctc_loss=0.1089, cr_loss=0.2959, over 16705.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1338, cr_loss=0.3507, over 3358892.25 frames. ], batch size: 37, lr: 4.41e-03, grad_scale: 32.0 2024-09-24 10:58:19,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=486019.3333333333, ans=0.025 2024-09-24 10:58:27,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0 2024-09-24 10:58:54,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=486159.3333333333, ans=0.125 2024-09-24 10:59:19,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=486206.0, ans=0.0 2024-09-24 10:59:32,995 INFO [train.py:1198] (0/4) Epoch 27, batch 2900, loss[loss=0.1775, ctc_loss=0.1147, cr_loss=0.3138, over 17074.00 frames. 
], tot_loss[loss=0.2027, ctc_loss=0.1329, cr_loss=0.3488, over 3348027.84 frames. ], batch size: 43, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 10:59:33,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=486252.6666666667, ans=0.125 2024-09-24 10:59:33,868 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2024-09-24 10:59:34,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486252.6666666667, ans=0.1 2024-09-24 10:59:38,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=486252.6666666667, ans=0.125 2024-09-24 11:00:09,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=486346.0, ans=0.025 2024-09-24 11:00:31,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=486392.6666666667, ans=0.125 2024-09-24 11:00:36,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=486392.6666666667, ans=0.0 2024-09-24 11:00:37,811 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.250e+02 1.339e+02 1.437e+02 2.331e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 11:00:44,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=486439.3333333333, ans=0.125 2024-09-24 11:00:55,786 INFO [train.py:1198] (0/4) Epoch 27, batch 2950, loss[loss=0.1939, ctc_loss=0.1262, cr_loss=0.3384, over 17263.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1331, cr_loss=0.3495, over 3345609.54 frames. ], batch size: 44, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:00:58,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-09-24 11:01:00,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=486486.0, ans=0.025 2024-09-24 11:01:06,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=22.5 2024-09-24 11:01:53,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=486626.0, ans=0.0 2024-09-24 11:01:53,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=486626.0, ans=0.125 2024-09-24 11:01:54,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=486626.0, ans=0.025 2024-09-24 11:02:14,942 INFO [train.py:1198] (0/4) Epoch 27, batch 3000, loss[loss=0.1948, ctc_loss=0.1275, cr_loss=0.3363, over 16937.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1339, cr_loss=0.3508, over 3332213.56 frames. ], batch size: 58, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:02:14,942 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 11:02:30,463 INFO [train.py:1230] (0/4) Epoch 27, validation: loss=0.03681, ctc_loss=0.03681, cr_loss=8.353e-15, over 944034.00 frames. 
2024-09-24 11:02:30,464 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 11:03:18,083 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 11:03:24,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=486859.3333333333, ans=0.125 2024-09-24 11:03:32,006 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.263e+02 1.339e+02 1.435e+02 2.051e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 11:03:47,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=486952.6666666667, ans=0.0 2024-09-24 11:03:49,184 INFO [train.py:1198] (0/4) Epoch 27, batch 3050, loss[loss=0.2323, ctc_loss=0.1519, cr_loss=0.4019, over 17025.00 frames. ], tot_loss[loss=0.2039, ctc_loss=0.1338, cr_loss=0.3508, over 3335707.67 frames. ], batch size: 52, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:03:52,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=486952.6666666667, ans=0.0 2024-09-24 11:04:06,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486999.3333333333, ans=0.1 2024-09-24 11:04:08,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2024-09-24 11:04:27,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487046.0, ans=0.1 2024-09-24 11:04:29,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2024-09-24 11:04:30,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=487046.0, ans=0.125 2024-09-24 11:04:38,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487092.6666666667, ans=0.1 2024-09-24 11:04:41,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=487092.6666666667, ans=0.125 2024-09-24 11:04:53,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=487139.3333333333, ans=0.125 2024-09-24 11:05:07,555 INFO [train.py:1198] (0/4) Epoch 27, batch 3100, loss[loss=0.2084, ctc_loss=0.1388, cr_loss=0.3478, over 16632.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1342, cr_loss=0.3515, over 3343891.35 frames. ], batch size: 61, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:05:13,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. 
limit=15.0 2024-09-24 11:05:37,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487232.6666666667, ans=0.1 2024-09-24 11:05:39,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=487232.6666666667, ans=0.1 2024-09-24 11:05:42,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=487279.3333333333, ans=0.125 2024-09-24 11:06:00,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=487326.0, ans=0.0 2024-09-24 11:06:04,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=487326.0, ans=0.07 2024-09-24 11:06:07,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487326.0, ans=0.1 2024-09-24 11:06:11,765 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.251e+02 1.335e+02 1.455e+02 2.261e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 11:06:24,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=487372.6666666667, ans=0.125 2024-09-24 11:06:28,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5 2024-09-24 11:06:28,981 INFO [train.py:1198] (0/4) Epoch 27, batch 3150, loss[loss=0.2078, ctc_loss=0.1367, cr_loss=0.3557, over 17197.00 frames. ], tot_loss[loss=0.2041, ctc_loss=0.1338, cr_loss=0.3514, over 3356552.88 frames. ], batch size: 55, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:06:41,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=487419.3333333333, ans=0.0 2024-09-24 11:06:55,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487466.0, ans=0.1 2024-09-24 11:07:16,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=487559.3333333333, ans=0.025 2024-09-24 11:07:33,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=487559.3333333333, ans=0.0 2024-09-24 11:07:36,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.59 vs. limit=15.0 2024-09-24 11:07:37,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=487606.0, ans=0.0 2024-09-24 11:07:51,272 INFO [train.py:1198] (0/4) Epoch 27, batch 3200, loss[loss=0.221, ctc_loss=0.1444, cr_loss=0.383, over 17037.00 frames. ], tot_loss[loss=0.2059, ctc_loss=0.1351, cr_loss=0.3538, over 3350829.06 frames. 
], batch size: 39, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:08:02,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=487652.6666666667, ans=0.125 2024-09-24 11:08:04,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=487652.6666666667, ans=0.0 2024-09-24 11:08:05,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=487699.3333333333, ans=0.125 2024-09-24 11:08:17,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=487699.3333333333, ans=0.025 2024-09-24 11:08:52,488 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.262e+02 1.380e+02 1.502e+02 1.892e+02, threshold=2.760e+02, percent-clipped=0.0 2024-09-24 11:09:09,786 INFO [train.py:1198] (0/4) Epoch 27, batch 3250, loss[loss=0.1972, ctc_loss=0.1269, cr_loss=0.3516, over 17063.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1347, cr_loss=0.3534, over 3352979.48 frames. ], batch size: 43, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:09:10,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=487886.0, ans=0.025 2024-09-24 11:09:38,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=22.5 2024-09-24 11:09:55,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=488026.0, ans=0.125 2024-09-24 11:10:12,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488072.6666666667, ans=0.1 2024-09-24 11:10:27,842 INFO [train.py:1198] (0/4) Epoch 27, batch 3300, loss[loss=0.2329, ctc_loss=0.1579, cr_loss=0.3753, over 17211.00 frames. ], tot_loss[loss=0.2064, ctc_loss=0.1356, cr_loss=0.3541, over 3336958.44 frames. 
], batch size: 50, lr: 4.40e-03, grad_scale: 32.0 2024-09-24 11:10:31,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=488119.3333333333, ans=0.125 2024-09-24 11:10:34,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=488119.3333333333, ans=0.2 2024-09-24 11:10:43,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488166.0, ans=0.1 2024-09-24 11:10:44,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=488166.0, ans=0.2 2024-09-24 11:10:54,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=488166.0, ans=0.125 2024-09-24 11:11:23,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=488259.3333333333, ans=0.125 2024-09-24 11:11:28,980 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.251e+02 1.338e+02 1.445e+02 2.480e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 11:11:34,771 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=15.0 2024-09-24 11:11:45,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=488352.6666666667, ans=0.04949747468305833 2024-09-24 11:11:46,443 INFO [train.py:1198] (0/4) Epoch 27, batch 3350, loss[loss=0.2016, ctc_loss=0.128, cr_loss=0.3678, over 17209.00 frames. ], tot_loss[loss=0.2061, ctc_loss=0.1353, cr_loss=0.354, over 3346974.40 frames. ], batch size: 47, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:11:53,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=488352.6666666667, ans=0.0 2024-09-24 11:11:55,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=488352.6666666667, ans=0.0 2024-09-24 11:11:57,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=488352.6666666667, ans=0.0 2024-09-24 11:12:13,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=22.5 2024-09-24 11:12:57,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=488539.3333333333, ans=0.125 2024-09-24 11:13:04,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=488539.3333333333, ans=0.125 2024-09-24 11:13:07,113 INFO [train.py:1198] (0/4) Epoch 27, batch 3400, loss[loss=0.2139, ctc_loss=0.1403, cr_loss=0.3683, over 17021.00 frames. ], tot_loss[loss=0.2052, ctc_loss=0.1346, cr_loss=0.353, over 3354110.52 frames. 
], batch size: 52, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:13:13,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488586.0, ans=0.1 2024-09-24 11:13:13,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=488586.0, ans=0.2 2024-09-24 11:13:43,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=488679.3333333333, ans=0.125 2024-09-24 11:13:49,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=488679.3333333333, ans=0.125 2024-09-24 11:14:08,003 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.290e+02 1.371e+02 1.502e+02 1.850e+02, threshold=2.743e+02, percent-clipped=0.0 2024-09-24 11:14:11,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=488772.6666666667, ans=0.0 2024-09-24 11:14:12,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=488772.6666666667, ans=0.0 2024-09-24 11:14:14,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0 2024-09-24 11:14:25,088 INFO [train.py:1198] (0/4) Epoch 27, batch 3450, loss[loss=0.2517, ctc_loss=0.1706, cr_loss=0.4055, over 17199.00 frames. ], tot_loss[loss=0.2053, ctc_loss=0.1346, cr_loss=0.3533, over 3359144.93 frames. ], batch size: 55, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:14:38,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=488819.3333333333, ans=0.05 2024-09-24 11:14:58,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=488912.6666666667, ans=0.125 2024-09-24 11:14:59,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=488912.6666666667, ans=0.0 2024-09-24 11:15:44,985 INFO [train.py:1198] (0/4) Epoch 27, batch 3500, loss[loss=0.2115, ctc_loss=0.1391, cr_loss=0.362, over 17020.00 frames. ], tot_loss[loss=0.2054, ctc_loss=0.1347, cr_loss=0.3536, over 3358030.94 frames. ], batch size: 44, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:15:55,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=489052.6666666667, ans=0.125 2024-09-24 11:16:06,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=489099.3333333333, ans=0.0 2024-09-24 11:16:06,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2024-09-24 11:16:38,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. 
limit=6.0 2024-09-24 11:16:46,472 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.271e+02 1.363e+02 1.500e+02 3.531e+02, threshold=2.727e+02, percent-clipped=1.0 2024-09-24 11:16:46,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=489239.3333333333, ans=0.125 2024-09-24 11:16:51,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=489239.3333333333, ans=0.2 2024-09-24 11:16:53,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=489239.3333333333, ans=10.0 2024-09-24 11:17:05,735 INFO [train.py:1198] (0/4) Epoch 27, batch 3550, loss[loss=0.2209, ctc_loss=0.1448, cr_loss=0.3808, over 17232.00 frames. ], tot_loss[loss=0.2058, ctc_loss=0.135, cr_loss=0.354, over 3350447.64 frames. ], batch size: 50, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:17:07,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=489286.0, ans=0.025 2024-09-24 11:17:09,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=489286.0, ans=0.125 2024-09-24 11:17:32,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2024-09-24 11:17:49,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=489379.3333333333, ans=0.125 2024-09-24 11:17:51,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=489379.3333333333, ans=0.125 2024-09-24 11:18:25,762 INFO [train.py:1198] (0/4) Epoch 27, batch 3600, loss[loss=0.2274, ctc_loss=0.1517, cr_loss=0.3786, over 16725.00 frames. ], tot_loss[loss=0.2048, ctc_loss=0.1343, cr_loss=0.3525, over 3358219.70 frames. ], batch size: 61, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:18:33,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=489519.3333333333, ans=0.025 2024-09-24 11:18:48,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=489566.0, ans=0.125 2024-09-24 11:19:00,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2024-09-24 11:19:08,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=489612.6666666667, ans=0.125 2024-09-24 11:19:08,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=489612.6666666667, ans=0.2 2024-09-24 11:19:10,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.40 vs. 
limit=15.0 2024-09-24 11:19:26,810 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.270e+02 1.354e+02 1.435e+02 1.974e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 11:19:39,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=489706.0, ans=0.0 2024-09-24 11:19:44,336 INFO [train.py:1198] (0/4) Epoch 27, batch 3650, loss[loss=0.1905, ctc_loss=0.1273, cr_loss=0.3158, over 17292.00 frames. ], tot_loss[loss=0.2047, ctc_loss=0.1342, cr_loss=0.3525, over 3362817.57 frames. ], batch size: 46, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:19:46,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=489752.6666666667, ans=0.0 2024-09-24 11:19:46,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2024-09-24 11:19:59,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=15.0 2024-09-24 11:20:16,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=489846.0, ans=0.1 2024-09-24 11:20:27,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-24 11:20:35,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=489892.6666666667, ans=0.125 2024-09-24 11:20:35,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=489892.6666666667, ans=0.0 2024-09-24 11:20:35,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=489892.6666666667, ans=0.04949747468305833 2024-09-24 11:20:38,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=489892.6666666667, ans=0.0 2024-09-24 11:20:42,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=489892.6666666667, ans=0.0 2024-09-24 11:20:44,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2024-09-24 11:20:59,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=489939.3333333333, ans=0.125 2024-09-24 11:21:03,532 INFO [train.py:1198] (0/4) Epoch 27, batch 3700, loss[loss=0.1938, ctc_loss=0.1228, cr_loss=0.3552, over 17294.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1344, cr_loss=0.353, over 3358249.42 frames. 
], batch size: 46, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:21:03,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=489986.0, ans=0.125 2024-09-24 11:21:17,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490032.6666666667, ans=0.1 2024-09-24 11:21:36,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=490079.3333333333, ans=0.0 2024-09-24 11:21:57,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=490126.0, ans=0.0 2024-09-24 11:22:05,031 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.302e+02 1.450e+02 1.561e+02 3.629e+02, threshold=2.900e+02, percent-clipped=2.0 2024-09-24 11:22:21,897 INFO [train.py:1198] (0/4) Epoch 27, batch 3750, loss[loss=0.236, ctc_loss=0.1553, cr_loss=0.4034, over 16126.00 frames. ], tot_loss[loss=0.2055, ctc_loss=0.1348, cr_loss=0.3534, over 3344839.54 frames. ], batch size: 74, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:22:41,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=490266.0, ans=0.0 2024-09-24 11:23:27,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490406.0, ans=0.1 2024-09-24 11:23:39,568 INFO [train.py:1198] (0/4) Epoch 27, batch 3800, loss[loss=0.2071, ctc_loss=0.1383, cr_loss=0.344, over 15228.00 frames. ], tot_loss[loss=0.2083, ctc_loss=0.1369, cr_loss=0.3568, over 3313563.70 frames. ], batch size: 89, lr: 4.39e-03, grad_scale: 32.0 2024-09-24 11:24:40,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=490592.6666666667, ans=0.125 2024-09-24 11:24:41,922 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.297e+02 1.423e+02 1.568e+02 2.554e+02, threshold=2.846e+02, percent-clipped=0.0 2024-09-24 11:24:59,149 INFO [train.py:1198] (0/4) Epoch 27, batch 3850, loss[loss=0.1851, ctc_loss=0.1177, cr_loss=0.337, over 16778.00 frames. ], tot_loss[loss=0.2109, ctc_loss=0.1391, cr_loss=0.3589, over 3261702.34 frames. ], batch size: 37, lr: 4.38e-03, grad_scale: 32.0 2024-09-24 11:25:55,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=490826.0, ans=0.125 2024-09-24 11:25:55,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=22.5 2024-09-24 11:26:04,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=490872.6666666667, ans=0.125 2024-09-24 11:26:09,860 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-27.pt 2024-09-24 11:27:00,315 INFO [train.py:1198] (0/4) Epoch 28, batch 0, loss[loss=0.19, ctc_loss=0.1234, cr_loss=0.333, over 17138.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1234, cr_loss=0.333, over 17138.00 frames. 
], batch size: 45, lr: 4.30e-03, grad_scale: 32.0 2024-09-24 11:27:00,316 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 11:27:15,946 INFO [train.py:1230] (0/4) Epoch 28, validation: loss=0.03666, ctc_loss=0.03666, cr_loss=9.126e-15, over 944034.00 frames. 2024-09-24 11:27:15,947 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 11:27:41,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=490947.3333333333, ans=10.0 2024-09-24 11:27:49,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=490994.0, ans=0.0 2024-09-24 11:27:49,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=490994.0, ans=0.125 2024-09-24 11:27:59,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=490994.0, ans=0.125 2024-09-24 11:28:11,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=491040.6666666667, ans=0.125 2024-09-24 11:28:27,521 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.316e+02 1.494e+02 1.657e+02 3.455e+02, threshold=2.987e+02, percent-clipped=1.0 2024-09-24 11:28:31,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=491087.3333333333, ans=0.0 2024-09-24 11:28:38,882 INFO [train.py:1198] (0/4) Epoch 28, batch 50, loss[loss=0.1992, ctc_loss=0.1292, cr_loss=0.3497, over 17235.00 frames. ], tot_loss[loss=0.2076, ctc_loss=0.1366, cr_loss=0.355, over 745509.42 frames. ], batch size: 55, lr: 4.30e-03, grad_scale: 32.0 2024-09-24 11:28:53,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=491180.6666666667, ans=0.2 2024-09-24 11:29:02,200 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-09-24 11:29:09,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2024-09-24 11:29:14,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491227.3333333333, ans=0.1 2024-09-24 11:29:26,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=491274.0, ans=0.125 2024-09-24 11:29:27,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=491274.0, ans=0.07 2024-09-24 11:29:33,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2024-09-24 11:29:58,610 INFO [train.py:1198] (0/4) Epoch 28, batch 100, loss[loss=0.1613, ctc_loss=0.1021, cr_loss=0.2963, over 17204.00 frames. ], tot_loss[loss=0.2069, ctc_loss=0.136, cr_loss=0.3548, over 1323847.01 frames. 
], batch size: 41, lr: 4.30e-03, grad_scale: 32.0 2024-09-24 11:30:16,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491414.0, ans=0.1 2024-09-24 11:30:22,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=491414.0, ans=0.125 2024-09-24 11:30:22,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=491414.0, ans=0.125 2024-09-24 11:30:24,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=491414.0, ans=0.07 2024-09-24 11:30:57,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=491507.3333333333, ans=0.125 2024-09-24 11:31:06,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491554.0, ans=0.1 2024-09-24 11:31:11,239 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.221e+02 1.303e+02 1.406e+02 2.036e+02, threshold=2.607e+02, percent-clipped=0.0 2024-09-24 11:31:20,725 INFO [train.py:1198] (0/4) Epoch 28, batch 150, loss[loss=0.1763, ctc_loss=0.1163, cr_loss=0.3001, over 17213.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1338, cr_loss=0.3493, over 1764846.59 frames. ], batch size: 47, lr: 4.30e-03, grad_scale: 32.0 2024-09-24 11:31:22,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=491600.6666666667, ans=0.125 2024-09-24 11:31:40,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=491647.3333333333, ans=0.0 2024-09-24 11:31:55,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=491694.0, ans=0.0 2024-09-24 11:31:59,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=491694.0, ans=0.025 2024-09-24 11:32:42,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=491787.3333333333, ans=0.0 2024-09-24 11:32:48,352 INFO [train.py:1198] (0/4) Epoch 28, batch 200, loss[loss=0.222, ctc_loss=0.1455, cr_loss=0.3826, over 16474.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1322, cr_loss=0.3471, over 2125961.60 frames. 
], batch size: 66, lr: 4.30e-03, grad_scale: 16.0 2024-09-24 11:32:50,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=491834.0, ans=0.0 2024-09-24 11:33:07,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=491880.6666666667, ans=0.125 2024-09-24 11:33:17,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=491880.6666666667, ans=0.025 2024-09-24 11:33:52,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=492020.6666666667, ans=0.2 2024-09-24 11:33:52,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=492020.6666666667, ans=0.0 2024-09-24 11:33:57,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2024-09-24 11:34:00,088 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.241e+02 1.331e+02 1.443e+02 1.741e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-24 11:34:02,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492020.6666666667, ans=0.1 2024-09-24 11:34:06,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=492067.3333333333, ans=0.0 2024-09-24 11:34:08,076 INFO [train.py:1198] (0/4) Epoch 28, batch 250, loss[loss=0.2132, ctc_loss=0.1401, cr_loss=0.3653, over 17078.00 frames. ], tot_loss[loss=0.2026, ctc_loss=0.1329, cr_loss=0.3486, over 2391026.13 frames. ], batch size: 46, lr: 4.30e-03, grad_scale: 16.0 2024-09-24 11:34:51,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-24 11:35:18,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=22.5 2024-09-24 11:35:26,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=492254.0, ans=0.02 2024-09-24 11:35:30,771 INFO [train.py:1198] (0/4) Epoch 28, batch 300, loss[loss=0.1709, ctc_loss=0.1115, cr_loss=0.2975, over 17204.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.1331, cr_loss=0.3498, over 2610666.95 frames. ], batch size: 41, lr: 4.30e-03, grad_scale: 16.0 2024-09-24 11:35:37,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=492300.6666666667, ans=0.125 2024-09-24 11:35:45,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=492347.3333333333, ans=0.025 2024-09-24 11:36:03,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-24 11:36:05,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.02 vs. 
limit=15.0 2024-09-24 11:36:07,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492394.0, ans=0.1 2024-09-24 11:36:38,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=492487.3333333333, ans=0.125 2024-09-24 11:36:42,616 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.243e+02 1.313e+02 1.459e+02 2.703e+02, threshold=2.626e+02, percent-clipped=1.0 2024-09-24 11:36:50,619 INFO [train.py:1198] (0/4) Epoch 28, batch 350, loss[loss=0.2114, ctc_loss=0.1408, cr_loss=0.3529, over 17262.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1334, cr_loss=0.3509, over 2764018.75 frames. ], batch size: 44, lr: 4.30e-03, grad_scale: 16.0 2024-09-24 11:37:09,971 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=15.0 2024-09-24 11:37:15,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2024-09-24 11:37:16,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=492580.6666666667, ans=0.125 2024-09-24 11:37:22,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=492580.6666666667, ans=0.125 2024-09-24 11:37:46,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=492674.0, ans=0.5 2024-09-24 11:37:46,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=492674.0, ans=0.125 2024-09-24 11:38:05,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=492720.6666666667, ans=0.125 2024-09-24 11:38:07,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=492720.6666666667, ans=0.1 2024-09-24 11:38:18,685 INFO [train.py:1198] (0/4) Epoch 28, batch 400, loss[loss=0.2146, ctc_loss=0.1413, cr_loss=0.3664, over 17299.00 frames. ], tot_loss[loss=0.2032, ctc_loss=0.1332, cr_loss=0.3504, over 2906097.59 frames. ], batch size: 49, lr: 4.29e-03, grad_scale: 32.0 2024-09-24 11:38:22,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=22.5 2024-09-24 11:38:33,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=12.0 2024-09-24 11:38:59,786 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.55 vs. 
limit=15.0 2024-09-24 11:39:00,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=492860.6666666667, ans=0.125 2024-09-24 11:39:10,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=492907.3333333333, ans=0.0 2024-09-24 11:39:16,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=492907.3333333333, ans=0.07 2024-09-24 11:39:30,617 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.218e+02 1.284e+02 1.430e+02 3.216e+02, threshold=2.568e+02, percent-clipped=1.0 2024-09-24 11:39:38,504 INFO [train.py:1198] (0/4) Epoch 28, batch 450, loss[loss=0.1899, ctc_loss=0.1225, cr_loss=0.3367, over 17333.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1332, cr_loss=0.3506, over 3009130.52 frames. ], batch size: 48, lr: 4.29e-03, grad_scale: 32.0 2024-09-24 11:39:53,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=493047.3333333333, ans=0.125 2024-09-24 11:40:08,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-09-24 11:40:32,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=493140.6666666667, ans=0.125 2024-09-24 11:40:37,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=12.0 2024-09-24 11:40:50,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493187.3333333333, ans=0.125 2024-09-24 11:41:00,830 INFO [train.py:1198] (0/4) Epoch 28, batch 500, loss[loss=0.1961, ctc_loss=0.1267, cr_loss=0.3472, over 17218.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1329, cr_loss=0.3502, over 3092394.60 frames. 
], batch size: 50, lr: 4.29e-03, grad_scale: 16.0 2024-09-24 11:41:07,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=493234.0, ans=0.2 2024-09-24 11:41:24,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=493280.6666666667, ans=0.125 2024-09-24 11:41:42,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=493327.3333333333, ans=0.125 2024-09-24 11:41:44,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=493327.3333333333, ans=0.125 2024-09-24 11:41:53,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=493374.0, ans=0.125 2024-09-24 11:42:06,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=493374.0, ans=0.0 2024-09-24 11:42:21,660 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.063e+02 1.292e+02 1.373e+02 1.533e+02 1.928e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-24 11:42:25,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493420.6666666667, ans=0.1 2024-09-24 11:42:27,936 INFO [train.py:1198] (0/4) Epoch 28, batch 550, loss[loss=0.2404, ctc_loss=0.1588, cr_loss=0.4084, over 17024.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.132, cr_loss=0.3489, over 3163463.36 frames. ], batch size: 56, lr: 4.29e-03, grad_scale: 16.0 2024-09-24 11:42:33,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=493467.3333333333, ans=0.125 2024-09-24 11:43:11,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=493560.6666666667, ans=0.1 2024-09-24 11:43:33,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=493654.0, ans=0.125 2024-09-24 11:43:39,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-09-24 11:43:47,659 INFO [train.py:1198] (0/4) Epoch 28, batch 600, loss[loss=0.2222, ctc_loss=0.1461, cr_loss=0.3808, over 17213.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1326, cr_loss=0.3496, over 3210406.15 frames. 
], batch size: 55, lr: 4.29e-03, grad_scale: 16.0 2024-09-24 11:43:54,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=493700.6666666667, ans=0.125 2024-09-24 11:44:05,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=493747.3333333333, ans=0.2 2024-09-24 11:44:31,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=493794.0, ans=0.025 2024-09-24 11:44:34,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=493840.6666666667, ans=0.95 2024-09-24 11:45:04,824 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.264e+02 1.373e+02 1.490e+02 2.458e+02, threshold=2.746e+02, percent-clipped=0.0 2024-09-24 11:45:06,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=493887.3333333333, ans=0.0 2024-09-24 11:45:11,170 INFO [train.py:1198] (0/4) Epoch 28, batch 650, loss[loss=0.2153, ctc_loss=0.1425, cr_loss=0.3643, over 17360.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1328, cr_loss=0.3501, over 3234836.19 frames. ], batch size: 48, lr: 4.29e-03, grad_scale: 16.0 2024-09-24 11:45:21,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493934.0, ans=0.125 2024-09-24 11:46:31,267 INFO [train.py:1198] (0/4) Epoch 28, batch 700, loss[loss=0.2061, ctc_loss=0.136, cr_loss=0.3503, over 17359.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.132, cr_loss=0.3484, over 3267489.20 frames. ], batch size: 48, lr: 4.29e-03, grad_scale: 16.0 2024-09-24 11:46:33,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=494167.3333333333, ans=0.04949747468305833 2024-09-24 11:46:47,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=494214.0, ans=10.0 2024-09-24 11:47:52,209 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.257e+02 1.354e+02 1.480e+02 2.179e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 11:47:58,828 INFO [train.py:1198] (0/4) Epoch 28, batch 750, loss[loss=0.2299, ctc_loss=0.1544, cr_loss=0.3775, over 16558.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.133, cr_loss=0.3504, over 3293691.01 frames. 
], batch size: 66, lr: 4.29e-03, grad_scale: 16.0 2024-09-24 11:48:02,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=494400.6666666667, ans=0.125 2024-09-24 11:48:03,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=494400.6666666667, ans=0.0 2024-09-24 11:48:13,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=494447.3333333333, ans=0.0 2024-09-24 11:48:32,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=494494.0, ans=0.025 2024-09-24 11:48:45,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=494540.6666666667, ans=0.0 2024-09-24 11:48:51,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494540.6666666667, ans=0.1 2024-09-24 11:49:18,698 INFO [train.py:1198] (0/4) Epoch 28, batch 800, loss[loss=0.1981, ctc_loss=0.1271, cr_loss=0.355, over 17030.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.1322, cr_loss=0.3487, over 3305679.62 frames. ], batch size: 39, lr: 4.29e-03, grad_scale: 32.0 2024-09-24 11:49:38,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=494680.6666666667, ans=0.125 2024-09-24 11:50:02,616 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 11:50:04,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=494727.3333333333, ans=0.0 2024-09-24 11:50:10,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=494774.0, ans=0.2 2024-09-24 11:50:21,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=494774.0, ans=0.125 2024-09-24 11:50:22,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=494774.0, ans=15.0 2024-09-24 11:50:33,711 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.239e+02 1.301e+02 1.424e+02 2.595e+02, threshold=2.602e+02, percent-clipped=0.0 2024-09-24 11:50:40,253 INFO [train.py:1198] (0/4) Epoch 28, batch 850, loss[loss=0.2034, ctc_loss=0.1335, cr_loss=0.3493, over 17061.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.1323, cr_loss=0.3491, over 3318061.60 frames. 
], batch size: 39, lr: 4.29e-03, grad_scale: 32.0 2024-09-24 11:50:48,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=494867.3333333333, ans=0.125 2024-09-24 11:51:09,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=494914.0, ans=0.125 2024-09-24 11:51:25,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=494960.6666666667, ans=0.0 2024-09-24 11:51:34,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=495007.3333333333, ans=0.0 2024-09-24 11:51:45,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=495054.0, ans=0.125 2024-09-24 11:52:04,922 INFO [train.py:1198] (0/4) Epoch 28, batch 900, loss[loss=0.176, ctc_loss=0.1143, cr_loss=0.3086, over 17083.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1324, cr_loss=0.3498, over 3333249.21 frames. ], batch size: 40, lr: 4.28e-03, grad_scale: 16.0 2024-09-24 11:52:18,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=495100.6666666667, ans=0.125 2024-09-24 11:52:42,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=495194.0, ans=0.125 2024-09-24 11:52:53,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=495240.6666666667, ans=0.125 2024-09-24 11:52:53,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=495240.6666666667, ans=0.0 2024-09-24 11:53:20,249 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.272e+02 1.349e+02 1.497e+02 2.522e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-24 11:53:25,044 INFO [train.py:1198] (0/4) Epoch 28, batch 950, loss[loss=0.2271, ctc_loss=0.1526, cr_loss=0.3723, over 16997.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1333, cr_loss=0.3513, over 3327097.43 frames. ], batch size: 56, lr: 4.28e-03, grad_scale: 16.0 2024-09-24 11:54:09,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0 2024-09-24 11:54:16,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=495474.0, ans=0.0 2024-09-24 11:54:37,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=495520.6666666667, ans=0.025 2024-09-24 11:54:40,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=495520.6666666667, ans=0.125 2024-09-24 11:54:47,631 INFO [train.py:1198] (0/4) Epoch 28, batch 1000, loss[loss=0.1839, ctc_loss=0.1169, cr_loss=0.3351, over 17302.00 frames. ], tot_loss[loss=0.2035, ctc_loss=0.1332, cr_loss=0.3516, over 3338386.99 frames. 
], batch size: 42, lr: 4.28e-03, grad_scale: 16.0 2024-09-24 11:55:05,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=495614.0, ans=0.125 2024-09-24 11:55:23,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=495660.6666666667, ans=0.0 2024-09-24 11:55:47,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=495707.3333333333, ans=0.125 2024-09-24 11:55:49,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=22.5 2024-09-24 11:55:53,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495754.0, ans=0.1 2024-09-24 11:56:02,812 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.229e+02 1.298e+02 1.405e+02 1.761e+02, threshold=2.595e+02, percent-clipped=0.0 2024-09-24 11:56:04,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-09-24 11:56:07,660 INFO [train.py:1198] (0/4) Epoch 28, batch 1050, loss[loss=0.2149, ctc_loss=0.142, cr_loss=0.3648, over 16655.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1331, cr_loss=0.3511, over 3339658.93 frames. ], batch size: 66, lr: 4.28e-03, grad_scale: 16.0 2024-09-24 11:56:20,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=495800.6666666667, ans=0.125 2024-09-24 11:56:50,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=495894.0, ans=0.025 2024-09-24 11:57:32,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=495987.3333333333, ans=0.125 2024-09-24 11:57:35,041 INFO [train.py:1198] (0/4) Epoch 28, batch 1100, loss[loss=0.1862, ctc_loss=0.1172, cr_loss=0.345, over 17269.00 frames. ], tot_loss[loss=0.2036, ctc_loss=0.1333, cr_loss=0.3516, over 3338061.05 frames. 
], batch size: 42, lr: 4.28e-03, grad_scale: 16.0 2024-09-24 11:57:46,556 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 11:58:05,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=496127.3333333333, ans=0.2 2024-09-24 11:58:11,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=496127.3333333333, ans=0.0 2024-09-24 11:58:14,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=496127.3333333333, ans=0.125 2024-09-24 11:58:24,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=496174.0, ans=0.125 2024-09-24 11:58:29,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=496174.0, ans=0.0 2024-09-24 11:58:50,154 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.267e+02 1.365e+02 1.472e+02 3.756e+02, threshold=2.729e+02, percent-clipped=1.0 2024-09-24 11:58:54,888 INFO [train.py:1198] (0/4) Epoch 28, batch 1150, loss[loss=0.1786, ctc_loss=0.1156, cr_loss=0.3148, over 17174.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1342, cr_loss=0.3537, over 3344518.81 frames. ], batch size: 41, lr: 4.28e-03, grad_scale: 16.0 2024-09-24 11:58:58,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=496267.3333333333, ans=0.1 2024-09-24 11:59:03,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=496267.3333333333, ans=0.0 2024-09-24 11:59:38,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496360.6666666667, ans=0.1 2024-09-24 11:59:45,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=496407.3333333333, ans=0.0 2024-09-24 12:00:14,881 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2024-09-24 12:00:17,103 INFO [train.py:1198] (0/4) Epoch 28, batch 1200, loss[loss=0.1497, ctc_loss=0.09328, cr_loss=0.2819, over 17275.00 frames. ], tot_loss[loss=0.205, ctc_loss=0.1343, cr_loss=0.3537, over 3352127.65 frames. ], batch size: 42, lr: 4.28e-03, grad_scale: 32.0 2024-09-24 12:00:23,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=496500.6666666667, ans=0.0 2024-09-24 12:00:30,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=496500.6666666667, ans=0.0 2024-09-24 12:00:32,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=496547.3333333333, ans=0.2 2024-09-24 12:00:38,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.68 vs. 
limit=22.5 2024-09-24 12:00:57,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=496594.0, ans=0.0 2024-09-24 12:00:58,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=496594.0, ans=0.125 2024-09-24 12:01:21,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.64 vs. limit=15.0 2024-09-24 12:01:31,411 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=22.5 2024-09-24 12:01:32,307 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.289e+02 1.355e+02 1.454e+02 3.122e+02, threshold=2.710e+02, percent-clipped=1.0 2024-09-24 12:01:37,160 INFO [train.py:1198] (0/4) Epoch 28, batch 1250, loss[loss=0.212, ctc_loss=0.1391, cr_loss=0.3645, over 17342.00 frames. ], tot_loss[loss=0.2053, ctc_loss=0.1345, cr_loss=0.354, over 3349469.43 frames. ], batch size: 48, lr: 4.28e-03, grad_scale: 32.0 2024-09-24 12:03:03,732 INFO [train.py:1198] (0/4) Epoch 28, batch 1300, loss[loss=0.2043, ctc_loss=0.134, cr_loss=0.3516, over 17148.00 frames. ], tot_loss[loss=0.2057, ctc_loss=0.1348, cr_loss=0.3544, over 3349678.68 frames. ], batch size: 48, lr: 4.28e-03, grad_scale: 32.0 2024-09-24 12:03:22,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=497014.0, ans=0.125 2024-09-24 12:03:29,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=497014.0, ans=0.125 2024-09-24 12:03:55,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0 2024-09-24 12:04:12,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=497154.0, ans=0.125 2024-09-24 12:04:17,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2024-09-24 12:04:18,679 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.268e+02 1.342e+02 1.439e+02 2.491e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-24 12:04:23,560 INFO [train.py:1198] (0/4) Epoch 28, batch 1350, loss[loss=0.1894, ctc_loss=0.1268, cr_loss=0.3129, over 16915.00 frames. ], tot_loss[loss=0.2045, ctc_loss=0.1341, cr_loss=0.3523, over 3353389.44 frames. ], batch size: 58, lr: 4.28e-03, grad_scale: 32.0 2024-09-24 12:04:27,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.20 vs. 
limit=22.5 2024-09-24 12:04:41,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=497247.3333333333, ans=0.1 2024-09-24 12:04:46,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497247.3333333333, ans=0.1 2024-09-24 12:04:58,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=497294.0, ans=0.09899494936611666 2024-09-24 12:05:41,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=497387.3333333333, ans=0.0 2024-09-24 12:05:44,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=497434.0, ans=0.125 2024-09-24 12:05:44,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=497434.0, ans=0.125 2024-09-24 12:05:45,900 INFO [train.py:1198] (0/4) Epoch 28, batch 1400, loss[loss=0.2107, ctc_loss=0.1402, cr_loss=0.3523, over 17011.00 frames. ], tot_loss[loss=0.2042, ctc_loss=0.1338, cr_loss=0.3521, over 3361818.98 frames. ], batch size: 53, lr: 4.27e-03, grad_scale: 16.0 2024-09-24 12:05:50,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=497434.0, ans=0.025 2024-09-24 12:06:07,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497480.6666666667, ans=0.1 2024-09-24 12:06:11,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=497480.6666666667, ans=0.125 2024-09-24 12:06:29,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=497527.3333333333, ans=0.125 2024-09-24 12:06:38,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=497574.0, ans=0.0 2024-09-24 12:06:52,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0 2024-09-24 12:07:00,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=497620.6666666667, ans=0.2 2024-09-24 12:07:08,058 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.287e+02 1.409e+02 1.543e+02 2.036e+02, threshold=2.818e+02, percent-clipped=0.0 2024-09-24 12:07:10,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=497667.3333333333, ans=0.125 2024-09-24 12:07:11,203 INFO [train.py:1198] (0/4) Epoch 28, batch 1450, loss[loss=0.1862, ctc_loss=0.1187, cr_loss=0.3375, over 17061.00 frames. ], tot_loss[loss=0.2027, ctc_loss=0.1327, cr_loss=0.3498, over 3368477.27 frames. ], batch size: 46, lr: 4.27e-03, grad_scale: 16.0 2024-09-24 12:07:16,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=497667.3333333333, ans=0.125 2024-09-24 12:07:31,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.48 vs. 
limit=15.0 2024-09-24 12:07:35,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=497714.0, ans=0.0 2024-09-24 12:07:40,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=497714.0, ans=10.0 2024-09-24 12:07:41,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=497760.6666666667, ans=0.0 2024-09-24 12:07:46,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=497760.6666666667, ans=0.125 2024-09-24 12:08:07,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=497807.3333333333, ans=0.0 2024-09-24 12:08:18,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=497854.0, ans=0.125 2024-09-24 12:08:21,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-09-24 12:08:26,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=497854.0, ans=0.125 2024-09-24 12:08:30,997 INFO [train.py:1198] (0/4) Epoch 28, batch 1500, loss[loss=0.2088, ctc_loss=0.1361, cr_loss=0.3634, over 17225.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1322, cr_loss=0.349, over 3370923.28 frames. ], batch size: 50, lr: 4.27e-03, grad_scale: 16.0 2024-09-24 12:09:01,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=497994.0, ans=0.125 2024-09-24 12:09:39,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=498087.3333333333, ans=0.125 2024-09-24 12:09:49,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=498087.3333333333, ans=0.125 2024-09-24 12:09:50,698 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.264e+02 1.369e+02 1.478e+02 2.054e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-24 12:09:50,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=498087.3333333333, ans=0.125 2024-09-24 12:09:53,890 INFO [train.py:1198] (0/4) Epoch 28, batch 1550, loss[loss=0.1806, ctc_loss=0.1151, cr_loss=0.3274, over 17060.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1304, cr_loss=0.3465, over 3380641.17 frames. ], batch size: 39, lr: 4.27e-03, grad_scale: 16.0 2024-09-24 12:10:25,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.21 vs. limit=12.0 2024-09-24 12:10:48,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=498274.0, ans=0.0 2024-09-24 12:10:50,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=498274.0, ans=0.125 2024-09-24 12:10:50,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.02 vs. 
limit=22.5 2024-09-24 12:10:53,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=498274.0, ans=0.125 2024-09-24 12:11:13,887 INFO [train.py:1198] (0/4) Epoch 28, batch 1600, loss[loss=0.2389, ctc_loss=0.1586, cr_loss=0.4012, over 14810.00 frames. ], tot_loss[loss=0.1993, ctc_loss=0.1302, cr_loss=0.3457, over 3375515.09 frames. ], batch size: 89, lr: 4.27e-03, grad_scale: 32.0 2024-09-24 12:11:28,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=498414.0, ans=0.0 2024-09-24 12:11:52,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2024-09-24 12:12:31,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2024-09-24 12:12:37,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2024-09-24 12:12:38,198 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.239e+02 1.329e+02 1.427e+02 2.041e+02, threshold=2.658e+02, percent-clipped=0.0 2024-09-24 12:12:41,512 INFO [train.py:1198] (0/4) Epoch 28, batch 1650, loss[loss=0.2395, ctc_loss=0.1568, cr_loss=0.4134, over 17024.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3479, over 3371353.07 frames. ], batch size: 53, lr: 4.27e-03, grad_scale: 32.0 2024-09-24 12:12:43,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=498600.6666666667, ans=0.125 2024-09-24 12:12:46,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=498600.6666666667, ans=0.125 2024-09-24 12:12:51,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=498600.6666666667, ans=0.125 2024-09-24 12:13:10,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2024-09-24 12:13:11,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=498694.0, ans=0.125 2024-09-24 12:13:13,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=498694.0, ans=0.125 2024-09-24 12:13:23,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.52 vs. 
limit=15.0 2024-09-24 12:13:39,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=498740.6666666667, ans=0.125 2024-09-24 12:13:57,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=498787.3333333333, ans=0.125 2024-09-24 12:13:57,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=498787.3333333333, ans=0.125 2024-09-24 12:14:00,724 INFO [train.py:1198] (0/4) Epoch 28, batch 1700, loss[loss=0.1818, ctc_loss=0.1161, cr_loss=0.3285, over 17303.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3483, over 3366681.43 frames. ], batch size: 49, lr: 4.27e-03, grad_scale: 32.0 2024-09-24 12:14:22,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=498880.6666666667, ans=0.0 2024-09-24 12:14:25,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=498880.6666666667, ans=0.2 2024-09-24 12:14:32,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2024-09-24 12:14:37,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=498927.3333333333, ans=0.125 2024-09-24 12:15:11,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=499020.6666666667, ans=0.035 2024-09-24 12:15:19,451 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.240e+02 1.314e+02 1.418e+02 1.778e+02, threshold=2.628e+02, percent-clipped=0.0 2024-09-24 12:15:20,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-09-24 12:15:22,616 INFO [train.py:1198] (0/4) Epoch 28, batch 1750, loss[loss=0.1916, ctc_loss=0.1255, cr_loss=0.3307, over 16926.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1305, cr_loss=0.3468, over 3373398.75 frames. ], batch size: 42, lr: 4.27e-03, grad_scale: 32.0 2024-09-24 12:15:27,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=499067.3333333333, ans=0.125 2024-09-24 12:15:29,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=499067.3333333333, ans=0.2 2024-09-24 12:15:32,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=499067.3333333333, ans=0.125 2024-09-24 12:15:43,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=499114.0, ans=0.125 2024-09-24 12:16:47,613 INFO [train.py:1198] (0/4) Epoch 28, batch 1800, loss[loss=0.1917, ctc_loss=0.1224, cr_loss=0.3466, over 17300.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1303, cr_loss=0.3465, over 3375288.46 frames. ], batch size: 46, lr: 4.27e-03, grad_scale: 32.0 2024-09-24 12:16:51,316 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.67 vs. 
limit=22.5 2024-09-24 12:17:26,685 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-09-24 12:17:41,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=499440.6666666667, ans=0.125 2024-09-24 12:18:03,604 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.272e+02 1.347e+02 1.424e+02 2.124e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-24 12:18:06,859 INFO [train.py:1198] (0/4) Epoch 28, batch 1850, loss[loss=0.2263, ctc_loss=0.1486, cr_loss=0.3886, over 16906.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1311, cr_loss=0.347, over 3370198.17 frames. ], batch size: 58, lr: 4.27e-03, grad_scale: 32.0 2024-09-24 12:18:14,130 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=12.0 2024-09-24 12:18:21,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=499580.6666666667, ans=0.125 2024-09-24 12:18:34,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=499580.6666666667, ans=0.0 2024-09-24 12:18:35,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499580.6666666667, ans=0.1 2024-09-24 12:18:47,029 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 12:18:48,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.35 vs. limit=10.0 2024-09-24 12:19:15,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=499720.6666666667, ans=0.125 2024-09-24 12:19:29,609 INFO [train.py:1198] (0/4) Epoch 28, batch 1900, loss[loss=0.2182, ctc_loss=0.1419, cr_loss=0.3817, over 17152.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3479, over 3376861.35 frames. ], batch size: 48, lr: 4.26e-03, grad_scale: 32.0 2024-09-24 12:19:47,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=499814.0, ans=0.0 2024-09-24 12:19:54,478 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5 2024-09-24 12:20:00,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499860.6666666667, ans=0.1 2024-09-24 12:20:46,567 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.256e+02 1.349e+02 1.450e+02 2.234e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-24 12:20:49,742 INFO [train.py:1198] (0/4) Epoch 28, batch 1950, loss[loss=0.2123, ctc_loss=0.1396, cr_loss=0.3636, over 17067.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1313, cr_loss=0.3479, over 3376417.24 frames. 
], batch size: 46, lr: 4.26e-03, grad_scale: 32.0 2024-09-24 12:20:50,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=500000.6666666667, ans=0.2 2024-09-24 12:20:53,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=500000.6666666667, ans=0.0 2024-09-24 12:21:14,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=500047.3333333333, ans=0.0 2024-09-24 12:21:36,553 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2024-09-24 12:21:37,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=500094.0, ans=0.125 2024-09-24 12:21:38,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2024-09-24 12:21:49,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=500140.6666666667, ans=0.125 2024-09-24 12:21:49,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=500140.6666666667, ans=0.125 2024-09-24 12:21:54,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=22.5 2024-09-24 12:22:02,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=500187.3333333333, ans=0.2 2024-09-24 12:22:11,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=500187.3333333333, ans=0.0 2024-09-24 12:22:17,928 INFO [train.py:1198] (0/4) Epoch 28, batch 2000, loss[loss=0.2188, ctc_loss=0.1432, cr_loss=0.3778, over 17365.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1317, cr_loss=0.3488, over 3373458.58 frames. ], batch size: 48, lr: 4.26e-03, grad_scale: 32.0 2024-09-24 12:23:01,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=500327.3333333333, ans=0.0 2024-09-24 12:23:05,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=500374.0, ans=0.025 2024-09-24 12:23:15,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0 2024-09-24 12:23:35,007 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.274e+02 1.350e+02 1.459e+02 2.226e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 12:23:38,217 INFO [train.py:1198] (0/4) Epoch 28, batch 2050, loss[loss=0.2293, ctc_loss=0.1512, cr_loss=0.3909, over 17049.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1311, cr_loss=0.3476, over 3368069.78 frames. ], batch size: 52, lr: 4.26e-03, grad_scale: 32.0 2024-09-24 12:23:39,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.82 vs. 
limit=15.0 2024-09-24 12:23:46,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=500467.3333333333, ans=0.125 2024-09-24 12:24:08,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=500560.6666666667, ans=0.1 2024-09-24 12:24:09,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=500560.6666666667, ans=0.125 2024-09-24 12:24:12,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2024-09-24 12:24:13,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=500560.6666666667, ans=0.2 2024-09-24 12:24:40,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500607.3333333333, ans=0.1 2024-09-24 12:24:44,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=500654.0, ans=0.125 2024-09-24 12:24:47,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=500654.0, ans=0.07 2024-09-24 12:24:49,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=500654.0, ans=0.0 2024-09-24 12:24:49,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=500654.0, ans=0.125 2024-09-24 12:25:00,544 INFO [train.py:1198] (0/4) Epoch 28, batch 2100, loss[loss=0.2231, ctc_loss=0.1457, cr_loss=0.3867, over 17041.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1307, cr_loss=0.3469, over 3379292.60 frames. ], batch size: 52, lr: 4.26e-03, grad_scale: 32.0 2024-09-24 12:25:05,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500700.6666666667, ans=0.1 2024-09-24 12:25:08,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=500700.6666666667, ans=0.0 2024-09-24 12:25:12,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=22.5 2024-09-24 12:25:48,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500840.6666666667, ans=0.1 2024-09-24 12:26:04,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=500887.3333333333, ans=0.0 2024-09-24 12:26:05,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=500887.3333333333, ans=0.125 2024-09-24 12:26:16,783 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.293e+02 1.395e+02 1.538e+02 2.200e+02, threshold=2.790e+02, percent-clipped=0.0 2024-09-24 12:26:25,165 INFO [train.py:1198] (0/4) Epoch 28, batch 2150, loss[loss=0.226, ctc_loss=0.151, cr_loss=0.3754, over 16455.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1314, cr_loss=0.3485, over 3371007.09 frames. 
], batch size: 66, lr: 4.26e-03, grad_scale: 32.0 2024-09-24 12:26:55,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=500980.6666666667, ans=10.0 2024-09-24 12:27:25,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=501074.0, ans=0.1 2024-09-24 12:27:28,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=501074.0, ans=0.04949747468305833 2024-09-24 12:27:47,642 INFO [train.py:1198] (0/4) Epoch 28, batch 2200, loss[loss=0.1983, ctc_loss=0.1315, cr_loss=0.3337, over 17346.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.132, cr_loss=0.3491, over 3370748.20 frames. ], batch size: 48, lr: 4.26e-03, grad_scale: 16.0 2024-09-24 12:27:51,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=501167.3333333333, ans=0.2 2024-09-24 12:28:10,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501214.0, ans=0.1 2024-09-24 12:28:24,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=501260.6666666667, ans=0.2 2024-09-24 12:28:29,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=501260.6666666667, ans=0.125 2024-09-24 12:28:58,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=501354.0, ans=0.0 2024-09-24 12:29:06,267 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.288e+02 1.410e+02 1.537e+02 2.040e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-24 12:29:10,565 INFO [train.py:1198] (0/4) Epoch 28, batch 2250, loss[loss=0.2507, ctc_loss=0.1673, cr_loss=0.4168, over 14992.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1321, cr_loss=0.3487, over 3366819.50 frames. ], batch size: 89, lr: 4.26e-03, grad_scale: 16.0 2024-09-24 12:29:52,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=501494.0, ans=0.0 2024-09-24 12:30:05,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=501540.6666666667, ans=0.0 2024-09-24 12:30:29,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=501634.0, ans=0.2 2024-09-24 12:30:30,753 INFO [train.py:1198] (0/4) Epoch 28, batch 2300, loss[loss=0.215, ctc_loss=0.1405, cr_loss=0.3724, over 17208.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1315, cr_loss=0.3477, over 3372075.60 frames. ], batch size: 50, lr: 4.26e-03, grad_scale: 16.0 2024-09-24 12:30:42,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=501634.0, ans=0.025 2024-09-24 12:31:49,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.19 vs. 
limit=15.0 2024-09-24 12:31:50,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=501820.6666666667, ans=0.025 2024-09-24 12:31:56,841 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.285e+02 1.365e+02 1.487e+02 3.397e+02, threshold=2.730e+02, percent-clipped=1.0 2024-09-24 12:31:58,416 INFO [train.py:1198] (0/4) Epoch 28, batch 2350, loss[loss=0.203, ctc_loss=0.1329, cr_loss=0.3506, over 17224.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1315, cr_loss=0.3478, over 3367465.87 frames. ], batch size: 47, lr: 4.26e-03, grad_scale: 16.0 2024-09-24 12:32:03,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=501867.3333333333, ans=0.125 2024-09-24 12:32:17,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=501914.0, ans=0.125 2024-09-24 12:32:24,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=501914.0, ans=0.5 2024-09-24 12:32:32,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=501960.6666666667, ans=0.0 2024-09-24 12:33:17,993 INFO [train.py:1198] (0/4) Epoch 28, batch 2400, loss[loss=0.2147, ctc_loss=0.1418, cr_loss=0.3646, over 17003.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1314, cr_loss=0.3479, over 3364380.36 frames. ], batch size: 53, lr: 4.25e-03, grad_scale: 32.0 2024-09-24 12:33:19,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502100.6666666667, ans=0.1 2024-09-24 12:34:39,437 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.025e+02 1.258e+02 1.381e+02 1.475e+02 2.172e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-24 12:34:41,086 INFO [train.py:1198] (0/4) Epoch 28, batch 2450, loss[loss=0.2434, ctc_loss=0.1685, cr_loss=0.3744, over 12172.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1316, cr_loss=0.3475, over 3345034.23 frames. ], batch size: 123, lr: 4.25e-03, grad_scale: 32.0 2024-09-24 12:34:54,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=502334.0, ans=0.0 2024-09-24 12:34:57,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2024-09-24 12:35:18,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=502427.3333333333, ans=0.125 2024-09-24 12:36:00,606 INFO [train.py:1198] (0/4) Epoch 28, batch 2500, loss[loss=0.1906, ctc_loss=0.1231, cr_loss=0.3374, over 17068.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1321, cr_loss=0.3485, over 3347960.20 frames. 
], batch size: 46, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:36:08,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=502567.3333333333, ans=0.125 2024-09-24 12:36:39,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=502660.6666666667, ans=0.09899494936611666 2024-09-24 12:37:28,345 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.269e+02 1.387e+02 1.478e+02 2.068e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-24 12:37:28,374 INFO [train.py:1198] (0/4) Epoch 28, batch 2550, loss[loss=0.1761, ctc_loss=0.1098, cr_loss=0.3312, over 17088.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1319, cr_loss=0.3487, over 3356293.79 frames. ], batch size: 43, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:37:48,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=502847.3333333333, ans=0.0 2024-09-24 12:37:57,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=502847.3333333333, ans=0.1 2024-09-24 12:38:02,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=502894.0, ans=0.125 2024-09-24 12:38:07,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=502894.0, ans=0.125 2024-09-24 12:38:21,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=502940.6666666667, ans=0.0 2024-09-24 12:38:32,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=502987.3333333333, ans=0.125 2024-09-24 12:38:48,320 INFO [train.py:1198] (0/4) Epoch 28, batch 2600, loss[loss=0.2396, ctc_loss=0.1607, cr_loss=0.3943, over 14881.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.132, cr_loss=0.3491, over 3351351.67 frames. ], batch size: 89, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:39:29,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=503127.3333333333, ans=0.0 2024-09-24 12:40:03,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=503220.6666666667, ans=0.125 2024-09-24 12:40:10,912 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.254e+02 1.337e+02 1.417e+02 2.017e+02, threshold=2.675e+02, percent-clipped=0.0 2024-09-24 12:40:10,936 INFO [train.py:1198] (0/4) Epoch 28, batch 2650, loss[loss=0.1953, ctc_loss=0.1279, cr_loss=0.3369, over 17228.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1325, cr_loss=0.3504, over 3353001.09 frames. 
], batch size: 50, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:40:11,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=503267.3333333333, ans=0.0 2024-09-24 12:40:12,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=503267.3333333333, ans=0.125 2024-09-24 12:40:14,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=503267.3333333333, ans=0.125 2024-09-24 12:40:15,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=503267.3333333333, ans=0.0 2024-09-24 12:40:20,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503267.3333333333, ans=0.1 2024-09-24 12:40:32,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=503314.0, ans=0.2 2024-09-24 12:40:36,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503314.0, ans=0.1 2024-09-24 12:40:44,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=503360.6666666667, ans=0.2 2024-09-24 12:40:53,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.38 vs. limit=5.0 2024-09-24 12:41:17,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=12.0 2024-09-24 12:41:37,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=503500.6666666667, ans=0.05 2024-09-24 12:41:38,426 INFO [train.py:1198] (0/4) Epoch 28, batch 2700, loss[loss=0.2174, ctc_loss=0.1445, cr_loss=0.3647, over 17228.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1327, cr_loss=0.351, over 3352306.68 frames. 
], batch size: 55, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:41:38,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=503500.6666666667, ans=0.125 2024-09-24 12:41:53,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=503547.3333333333, ans=0.09899494936611666 2024-09-24 12:41:54,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503547.3333333333, ans=0.1 2024-09-24 12:41:54,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503547.3333333333, ans=0.125 2024-09-24 12:41:59,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=503547.3333333333, ans=0.025 2024-09-24 12:42:04,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=503547.3333333333, ans=0.2 2024-09-24 12:42:17,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=503594.0, ans=0.125 2024-09-24 12:42:55,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=503687.3333333333, ans=0.0 2024-09-24 12:42:58,695 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.278e+02 1.354e+02 1.486e+02 3.287e+02, threshold=2.708e+02, percent-clipped=2.0 2024-09-24 12:42:58,720 INFO [train.py:1198] (0/4) Epoch 28, batch 2750, loss[loss=0.2427, ctc_loss=0.1643, cr_loss=0.3923, over 17009.00 frames. ], tot_loss[loss=0.204, ctc_loss=0.1336, cr_loss=0.3522, over 3341884.18 frames. ], batch size: 53, lr: 4.25e-03, grad_scale: 16.0 2024-09-24 12:43:08,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=503734.0, ans=0.025 2024-09-24 12:43:13,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=503780.6666666667, ans=0.125 2024-09-24 12:43:14,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=503780.6666666667, ans=0.2 2024-09-24 12:43:22,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=503780.6666666667, ans=0.1 2024-09-24 12:43:58,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=503874.0, ans=0.0 2024-09-24 12:44:02,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=503874.0, ans=0.125 2024-09-24 12:44:20,598 INFO [train.py:1198] (0/4) Epoch 28, batch 2800, loss[loss=0.213, ctc_loss=0.1388, cr_loss=0.3708, over 17361.00 frames. ], tot_loss[loss=0.2031, ctc_loss=0.133, cr_loss=0.3508, over 3353258.41 frames. ], batch size: 48, lr: 4.25e-03, grad_scale: 32.0 2024-09-24 12:44:30,546 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-108000.pt 2024-09-24 12:44:59,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.88 vs. 
2024-09-24 12:45:01,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=504060.6666666667, ans=0.125
2024-09-24 12:45:01,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=504060.6666666667, ans=0.125
2024-09-24 12:45:16,825 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0
2024-09-24 12:45:39,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=22.5
2024-09-24 12:45:43,156 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.267e+02 1.390e+02 1.555e+02 2.170e+02, threshold=2.781e+02, percent-clipped=0.0
2024-09-24 12:45:43,182 INFO [train.py:1198] (0/4) Epoch 28, batch 2850, loss[loss=0.2167, ctc_loss=0.139, cr_loss=0.3882, over 17229.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1335, cr_loss=0.3515, over 3343102.20 frames. ], batch size: 50, lr: 4.25e-03, grad_scale: 32.0
2024-09-24 12:45:45,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=504200.6666666667, ans=0.125
2024-09-24 12:45:49,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=504200.6666666667, ans=0.025
2024-09-24 12:45:56,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=504200.6666666667, ans=0.025
2024-09-24 12:45:57,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=504247.3333333333, ans=0.025
2024-09-24 12:46:03,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=12.0
2024-09-24 12:46:06,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504247.3333333333, ans=0.1
2024-09-24 12:46:16,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.57 vs. limit=5.0
2024-09-24 12:46:27,095 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 12:46:34,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=504294.0, ans=0.0
2024-09-24 12:46:52,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=504340.6666666667, ans=0.125
2024-09-24 12:46:53,725 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 12:47:10,934 INFO [train.py:1198] (0/4) Epoch 28, batch 2900, loss[loss=0.2161, ctc_loss=0.1393, cr_loss=0.3843, over 16795.00 frames. ], tot_loss[loss=0.2026, ctc_loss=0.1327, cr_loss=0.3495, over 3344792.38 frames. ], batch size: 61, lr: 4.24e-03, grad_scale: 32.0
2024-09-24 12:47:22,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0
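In every optim.py WARNING so far, the printed threshold equals Clipping_scale times the middle of the five grad-norm quartile statistics, i.e. the median (2.708e+02 = 2.0 * 1.354e+02; 2.781e+02 = 2.0 * 1.390e+02 just above). A simplified version of a clipping rule with that behaviour; icefall's optim.py is more involved, and the window length here is a guess:

    from collections import deque
    import torch

    def clip_with_running_median(parameters, history, clipping_scale=2.0):
        params = [p for p in parameters if p.grad is not None]
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])
        )
        history.append(total_norm.item())
        threshold = clipping_scale * sorted(history)[len(history) // 2]
        if total_norm > threshold:
            for p in params:
                p.grad.detach().mul_(threshold / total_norm)  # rescale, not zero
        return total_norm.item(), threshold

    history = deque(maxlen=200)  # assumed window of recent grad norms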
2024-09-24 12:47:25,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=504480.6666666667, ans=0.125
2024-09-24 12:47:27,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=504480.6666666667, ans=0.125
2024-09-24 12:47:32,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=504480.6666666667, ans=0.125
2024-09-24 12:47:59,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=504574.0, ans=0.0
2024-09-24 12:48:21,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=504620.6666666667, ans=0.125
2024-09-24 12:48:26,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504620.6666666667, ans=0.1
2024-09-24 12:48:28,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=504620.6666666667, ans=0.2
2024-09-24 12:48:31,272 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.289e+02 1.394e+02 1.570e+02 3.133e+02, threshold=2.787e+02, percent-clipped=1.0
2024-09-24 12:48:31,296 INFO [train.py:1198] (0/4) Epoch 28, batch 2950, loss[loss=0.2071, ctc_loss=0.1365, cr_loss=0.353, over 17106.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1329, cr_loss=0.3499, over 3352014.26 frames. ], batch size: 49, lr: 4.24e-03, grad_scale: 32.0
2024-09-24 12:48:38,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=504667.3333333333, ans=0.0
2024-09-24 12:49:03,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.28 vs. limit=10.0
2024-09-24 12:49:05,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=504760.6666666667, ans=0.125
2024-09-24 12:49:06,129 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.20 vs. limit=12.0
2024-09-24 12:49:14,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=22.5
2024-09-24 12:49:36,621 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0
2024-09-24 12:49:53,075 INFO [train.py:1198] (0/4) Epoch 28, batch 3000, loss[loss=0.2443, ctc_loss=0.1652, cr_loss=0.3956, over 16047.00 frames. ], tot_loss[loss=0.2027, ctc_loss=0.1327, cr_loss=0.35, over 3353045.03 frames. ], batch size: 74, lr: 4.24e-03, grad_scale: 32.0
2024-09-24 12:49:53,076 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-24 12:50:08,098 INFO [train.py:1230] (0/4) Epoch 28, validation: loss=0.03718, ctc_loss=0.03718, cr_loss=8.452e-15, over 944034.00 frames.
2024-09-24 12:50:08,099 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-24 12:50:10,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.87 vs. limit=12.0
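Each Whitening line compares a per-group "metric" of how far a module's output covariance is from a scaled identity against a scheduled "limit". The exact formula in scaling.py is not recoverable from the log; one natural metric with the right behaviour is mean-of-squared-eigenvalues over squared-mean-eigenvalue, which is 1.0 for perfectly white features and grows as the spectrum spreads. A single-group sketch under that assumption:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels); compare covariance to scaled identity
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        return (torch.trace(cov @ cov) / d / (torch.trace(cov) / d) ** 2).item()

    x = torch.randn(1000, 128)
    print(whitening_metric(x))                                  # near 1: white
    print(whitening_metric(x * torch.linspace(0.1, 3.0, 128)))  # > 1: less white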
2024-09-24 12:50:35,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=504947.3333333333, ans=0.125
2024-09-24 12:50:35,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=504947.3333333333, ans=15.0
2024-09-24 12:50:37,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=504947.3333333333, ans=0.025
2024-09-24 12:50:50,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=504994.0, ans=0.125
2024-09-24 12:51:06,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=505040.6666666667, ans=0.0
2024-09-24 12:51:08,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=505040.6666666667, ans=0.125
2024-09-24 12:51:12,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=505087.3333333333, ans=0.0
2024-09-24 12:51:14,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505087.3333333333, ans=0.1
2024-09-24 12:51:17,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=505087.3333333333, ans=0.125
2024-09-24 12:51:26,596 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.294e+02 1.362e+02 1.471e+02 2.139e+02, threshold=2.723e+02, percent-clipped=0.0
2024-09-24 12:51:26,623 INFO [train.py:1198] (0/4) Epoch 28, batch 3050, loss[loss=0.1996, ctc_loss=0.1335, cr_loss=0.3307, over 17058.00 frames. ], tot_loss[loss=0.2038, ctc_loss=0.1336, cr_loss=0.351, over 3348360.79 frames. ], batch size: 56, lr: 4.24e-03, grad_scale: 32.0
2024-09-24 12:51:34,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=505134.0, ans=0.0
2024-09-24 12:51:58,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=505227.3333333333, ans=0.125
2024-09-24 12:52:01,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=505227.3333333333, ans=0.0
2024-09-24 12:52:14,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0
2024-09-24 12:52:48,944 INFO [train.py:1198] (0/4) Epoch 28, batch 3100, loss[loss=0.1899, ctc_loss=0.1203, cr_loss=0.3478, over 17015.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1329, cr_loss=0.3499, over 3348741.47 frames. ], batch size: 44, lr: 4.24e-03, grad_scale: 16.0
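Across the batch summaries the total fits loss = ctc_loss + 0.2 * cr_loss to the printed precision, and 0.2 matches the cr-loss-scale-0.2 tag in the experiment directory name. The near-zero validation cr_loss (8.452e-15 above) is plausibly the consistency term vanishing when no second, differently time-masked view is computed at evaluation. A quick check:

    # tot_loss at batch 3100: ctc_loss=0.1329, cr_loss=0.3499, loss=0.2029
    def combined_loss(ctc_loss, cr_loss, cr_loss_scale=0.2):
        return ctc_loss + cr_loss_scale * cr_loss

    assert abs(combined_loss(0.1329, 0.3499) - 0.2029) < 5e-4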
2024-09-24 12:52:49,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505367.3333333333, ans=0.125
2024-09-24 12:52:52,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=505367.3333333333, ans=0.125
2024-09-24 12:53:24,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=505460.6666666667, ans=0.0
2024-09-24 12:54:09,364 INFO [train.py:1198] (0/4) Epoch 28, batch 3150, loss[loss=0.1948, ctc_loss=0.125, cr_loss=0.3488, over 17087.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.1322, cr_loss=0.3488, over 3349254.70 frames. ], batch size: 43, lr: 4.24e-03, grad_scale: 16.0
2024-09-24 12:54:10,888 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.260e+02 1.340e+02 1.441e+02 3.228e+02, threshold=2.680e+02, percent-clipped=2.0
2024-09-24 12:54:39,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505694.0, ans=0.1
2024-09-24 12:55:04,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505740.6666666667, ans=0.1
2024-09-24 12:55:18,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=505787.3333333333, ans=0.125
2024-09-24 12:55:27,146 INFO [train.py:1198] (0/4) Epoch 28, batch 3200, loss[loss=0.2106, ctc_loss=0.1381, cr_loss=0.3624, over 17303.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.133, cr_loss=0.3496, over 3347206.00 frames. ], batch size: 51, lr: 4.24e-03, grad_scale: 32.0
2024-09-24 12:55:39,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=505834.0, ans=0.125
2024-09-24 12:55:52,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=505880.6666666667, ans=0.2
2024-09-24 12:56:22,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=505974.0, ans=0.125
2024-09-24 12:56:23,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=505974.0, ans=0.125
2024-09-24 12:56:25,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=505974.0, ans=0.125
2024-09-24 12:56:41,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0
2024-09-24 12:56:45,211 INFO [train.py:1198] (0/4) Epoch 28, batch 3250, loss[loss=0.224, ctc_loss=0.1457, cr_loss=0.3911, over 17207.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1327, cr_loss=0.3494, over 3353093.46 frames. ], batch size: 55, lr: 4.24e-03, grad_scale: 32.0
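The tot_loss[...] figures aggregate over roughly 3.35M frames and move slowly while the per-batch loss jumps around, which suggests a frame-weighted running aggregate: each batch contributes in proportion to its frame count. The decay constant and exact windowing used by train.py are assumptions here:

    class FrameWeightedLoss:
        """Frame-weighted running average of batch losses (illustrative)."""
        def __init__(self, decay=0.999):
            self.decay = decay      # forget old batches slowly
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames, self.frames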
2024-09-24 12:56:46,876 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.036e+02 1.261e+02 1.336e+02 1.430e+02 2.422e+02, threshold=2.672e+02, percent-clipped=0.0
2024-09-24 12:56:50,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=506067.3333333333, ans=0.2
2024-09-24 12:56:58,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=506067.3333333333, ans=0.2
2024-09-24 12:57:23,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=506160.6666666667, ans=0.5
2024-09-24 12:57:37,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=506207.3333333333, ans=0.2
2024-09-24 12:57:53,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0
2024-09-24 12:58:04,253 INFO [train.py:1198] (0/4) Epoch 28, batch 3300, loss[loss=0.1973, ctc_loss=0.1287, cr_loss=0.3427, over 17101.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1319, cr_loss=0.3479, over 3354392.30 frames. ], batch size: 49, lr: 4.24e-03, grad_scale: 32.0
2024-09-24 12:58:06,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=506300.6666666667, ans=0.025
2024-09-24 12:58:21,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=506347.3333333333, ans=0.2
2024-09-24 12:58:49,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0
2024-09-24 12:58:56,806 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-24 12:59:15,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.02 vs. limit=22.5
2024-09-24 12:59:21,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=506487.3333333333, ans=0.125
2024-09-24 12:59:24,426 INFO [train.py:1198] (0/4) Epoch 28, batch 3350, loss[loss=0.1651, ctc_loss=0.1054, cr_loss=0.2987, over 17163.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1319, cr_loss=0.348, over 3355455.92 frames. ], batch size: 41, lr: 4.24e-03, grad_scale: 32.0
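grad_scale alternating between 16.0 and 32.0 in these summaries is the signature of dynamic loss scaling under fp16 AMP: the scale doubles after a stretch of overflow-free steps and halves when an inf/nan gradient is hit. A sketch using PyTorch's stock scaler (the surrounding model/criterion objects are placeholders; icefall wraps this differently):

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)

    def training_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()
        scaler.step(optimizer)       # skips the step on overflow
        scaler.update()              # grow or back off the scale
        return scaler.get_scale()    # the value logged as grad_scale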
2024-09-24 12:59:24,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=506534.0, ans=0.0
2024-09-24 12:59:25,944 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.246e+02 1.321e+02 1.387e+02 1.674e+02, threshold=2.642e+02, percent-clipped=0.0
2024-09-24 12:59:45,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=506580.6666666667, ans=0.125
2024-09-24 12:59:54,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=506627.3333333333, ans=0.2
2024-09-24 13:00:05,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=506627.3333333333, ans=0.0
2024-09-24 13:00:06,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=506627.3333333333, ans=0.125
2024-09-24 13:00:09,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=506674.0, ans=0.125
2024-09-24 13:00:17,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=506674.0, ans=0.05
2024-09-24 13:00:40,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.24 vs. limit=15.0
2024-09-24 13:00:41,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=506767.3333333333, ans=0.2
2024-09-24 13:00:42,397 INFO [train.py:1198] (0/4) Epoch 28, batch 3400, loss[loss=0.1519, ctc_loss=0.09788, cr_loss=0.2703, over 16971.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1321, cr_loss=0.3493, over 3355283.64 frames. ], batch size: 42, lr: 4.24e-03, grad_scale: 32.0
2024-09-24 13:00:54,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506767.3333333333, ans=0.1
2024-09-24 13:01:17,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=506860.6666666667, ans=0.025
2024-09-24 13:01:18,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=506860.6666666667, ans=0.025
2024-09-24 13:01:20,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=506860.6666666667, ans=0.0
2024-09-24 13:01:26,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=506860.6666666667, ans=0.2
2024-09-24 13:01:43,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=22.5
2024-09-24 13:02:00,540 INFO [train.py:1198] (0/4) Epoch 28, batch 3450, loss[loss=0.214, ctc_loss=0.1396, cr_loss=0.3717, over 17099.00 frames. ], tot_loss[loss=0.2029, ctc_loss=0.1328, cr_loss=0.3503, over 3358701.89 frames. ], batch size: 49, lr: 4.23e-03, grad_scale: 32.0
2024-09-24 13:02:02,018 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.326e+02 1.438e+02 1.586e+02 2.934e+02, threshold=2.877e+02, percent-clipped=1.0
2024-09-24 13:02:35,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=507094.0, ans=0.5
2024-09-24 13:02:36,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=507094.0, ans=0.125
2024-09-24 13:02:37,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=507094.0, ans=0.125
2024-09-24 13:02:49,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=507140.6666666667, ans=0.0
2024-09-24 13:02:53,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=507140.6666666667, ans=0.0
2024-09-24 13:03:25,066 INFO [train.py:1198] (0/4) Epoch 28, batch 3500, loss[loss=0.1872, ctc_loss=0.1205, cr_loss=0.3335, over 17169.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1323, cr_loss=0.3501, over 3355867.81 frames. ], batch size: 45, lr: 4.23e-03, grad_scale: 32.0
2024-09-24 13:03:25,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=507234.0, ans=0.125
2024-09-24 13:03:26,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=507234.0, ans=0.2
2024-09-24 13:03:39,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=507280.6666666667, ans=0.125
2024-09-24 13:03:44,153 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 13:03:47,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=507280.6666666667, ans=0.07
2024-09-24 13:03:47,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=507280.6666666667, ans=0.0
2024-09-24 13:03:53,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=507280.6666666667, ans=0.95
2024-09-24 13:03:58,187 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 13:03:59,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=507327.3333333333, ans=0.125
2024-09-24 13:04:32,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507420.6666666667, ans=0.1
2024-09-24 13:04:36,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=507420.6666666667, ans=0.125
2024-09-24 13:04:42,943 INFO [train.py:1198] (0/4) Epoch 28, batch 3550, loss[loss=0.1636, ctc_loss=0.1051, cr_loss=0.2927, over 17093.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1321, cr_loss=0.3495, over 3361089.88 frames. ], batch size: 43, lr: 4.23e-03, grad_scale: 32.0
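The learning rate decays smoothly within the epoch (4.25e-03 down to 4.23e-03 across these summaries) and steps down when epoch 29 begins below (4.15e-03). That two-factor shape matches a schedule such as icefall's Eden; the constants here are placeholders, and the printed lr may fold in other factors (e.g. warmup), so this only illustrates the form:

    # lr = base_lr * ((step^2 + S^2)/S^2)^-0.25 * ((epoch^2 + E^2)/E^2)^-0.25
    def eden_lr(base_lr, step, epoch, lr_steps=7500.0, lr_epochs=3.5):
        step_factor = ((step**2 + lr_steps**2) / lr_steps**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * step_factor * epoch_factor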
2024-09-24 13:04:44,467 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.292e+02 1.411e+02 1.555e+02 1.879e+02, threshold=2.822e+02, percent-clipped=0.0
2024-09-24 13:04:47,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=507467.3333333333, ans=0.0
2024-09-24 13:05:16,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507560.6666666667, ans=0.1
2024-09-24 13:06:00,780 INFO [train.py:1198] (0/4) Epoch 28, batch 3600, loss[loss=0.1858, ctc_loss=0.1189, cr_loss=0.3347, over 16961.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1318, cr_loss=0.3484, over 3369378.43 frames. ], batch size: 42, lr: 4.23e-03, grad_scale: 32.0
2024-09-24 13:06:12,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=507700.6666666667, ans=0.0
2024-09-24 13:06:17,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=507747.3333333333, ans=0.0
2024-09-24 13:06:29,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=507747.3333333333, ans=0.2
2024-09-24 13:06:37,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507794.0, ans=0.1
2024-09-24 13:06:42,991 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=12.0
2024-09-24 13:06:46,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=507840.6666666667, ans=15.0
2024-09-24 13:06:51,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=507840.6666666667, ans=0.125
2024-09-24 13:07:19,237 INFO [train.py:1198] (0/4) Epoch 28, batch 3650, loss[loss=0.189, ctc_loss=0.1228, cr_loss=0.331, over 17310.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1315, cr_loss=0.3472, over 3367420.07 frames. ], batch size: 46, lr: 4.23e-03, grad_scale: 32.0
2024-09-24 13:07:20,723 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.240e+02 1.329e+02 1.473e+02 2.274e+02, threshold=2.658e+02, percent-clipped=0.0
2024-09-24 13:08:37,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.58 vs. limit=6.0
2024-09-24 13:08:40,142 INFO [train.py:1198] (0/4) Epoch 28, batch 3700, loss[loss=0.1948, ctc_loss=0.1304, cr_loss=0.3221, over 17358.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1315, cr_loss=0.3474, over 3368774.71 frames. ], batch size: 48, lr: 4.23e-03, grad_scale: 32.0
2024-09-24 13:08:48,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=508167.3333333333, ans=0.125
2024-09-24 13:08:49,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=508167.3333333333, ans=0.025
2024-09-24 13:08:54,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508214.0, ans=0.1
2024-09-24 13:09:00,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=508214.0, ans=0.125
2024-09-24 13:09:05,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=508214.0, ans=0.125
2024-09-24 13:09:07,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=508214.0, ans=0.125
2024-09-24 13:09:13,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=508260.6666666667, ans=0.0
2024-09-24 13:09:19,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.71 vs. limit=5.0
2024-09-24 13:09:30,928 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-24 13:09:46,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=508354.0, ans=0.025
2024-09-24 13:09:53,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=508354.0, ans=0.0
2024-09-24 13:09:58,946 INFO [train.py:1198] (0/4) Epoch 28, batch 3750, loss[loss=0.2222, ctc_loss=0.1484, cr_loss=0.369, over 16515.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1321, cr_loss=0.3484, over 3348990.88 frames. ], batch size: 66, lr: 4.23e-03, grad_scale: 32.0
2024-09-24 13:10:00,434 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.248e+02 1.336e+02 1.422e+02 1.841e+02, threshold=2.673e+02, percent-clipped=0.0
2024-09-24 13:10:49,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=508540.6666666667, ans=0.0
2024-09-24 13:10:52,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=508540.6666666667, ans=0.125
2024-09-24 13:10:54,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=508540.6666666667, ans=0.125
2024-09-24 13:11:11,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=508587.3333333333, ans=0.125
2024-09-24 13:11:17,814 INFO [train.py:1198] (0/4) Epoch 28, batch 3800, loss[loss=0.1999, ctc_loss=0.1319, cr_loss=0.3402, over 16942.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1315, cr_loss=0.3462, over 3334661.39 frames. ], batch size: 42, lr: 4.23e-03, grad_scale: 16.0
2024-09-24 13:11:27,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=508634.0, ans=0.125
2024-09-24 13:11:31,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=12.0
2024-09-24 13:11:43,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=508680.6666666667, ans=0.125
2024-09-24 13:11:49,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=508727.3333333333, ans=0.125
2024-09-24 13:12:14,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0
2024-09-24 13:12:26,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. limit=10.0
2024-09-24 13:12:38,541 INFO [train.py:1198] (0/4) Epoch 28, batch 3850, loss[loss=0.1863, ctc_loss=0.1201, cr_loss=0.331, over 17318.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1319, cr_loss=0.3455, over 3295366.22 frames. ], batch size: 51, lr: 4.23e-03, grad_scale: 16.0
2024-09-24 13:12:42,163 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.266e+02 1.354e+02 1.491e+02 2.044e+02, threshold=2.707e+02, percent-clipped=0.0
2024-09-24 13:12:47,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=508867.3333333333, ans=0.125
2024-09-24 13:12:52,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=508867.3333333333, ans=0.125
2024-09-24 13:12:58,980 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0
2024-09-24 13:12:59,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=508914.0, ans=0.025
2024-09-24 13:13:03,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508914.0, ans=0.1
2024-09-24 13:13:04,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=508914.0, ans=0.125
2024-09-24 13:13:23,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=508960.6666666667, ans=0.125
2024-09-24 13:13:30,071 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0
2024-09-24 13:13:32,717 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0
2024-09-24 13:13:36,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=509007.3333333333, ans=0.125
2024-09-24 13:13:49,964 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-28.pt
2024-09-24 13:14:43,487 INFO [train.py:1198] (0/4) Epoch 29, batch 0, loss[loss=0.2227, ctc_loss=0.1438, cr_loss=0.3947, over 16447.00 frames. ], tot_loss[loss=0.2227, ctc_loss=0.1438, cr_loss=0.3947, over 16447.00 frames. ], batch size: 66, lr: 4.15e-03, grad_scale: 32.0
2024-09-24 13:14:43,488 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-24 13:14:55,923 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1864, 5.0078, 4.3053, 4.9563], device='cuda:0')
2024-09-24 13:14:58,955 INFO [train.py:1230] (0/4) Epoch 29, validation: loss=0.03615, ctc_loss=0.03615, cr_loss=9.405e-15, over 944034.00 frames.
2024-09-24 13:14:58,955 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-24 13:14:59,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509082.0, ans=0.1
2024-09-24 13:16:14,085 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=12.0
2024-09-24 13:16:21,071 INFO [train.py:1198] (0/4) Epoch 29, batch 50, loss[loss=0.1659, ctc_loss=0.104, cr_loss=0.3092, over 17269.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1286, cr_loss=0.344, over 765674.09 frames. ], batch size: 44, lr: 4.15e-03, grad_scale: 32.0
2024-09-24 13:16:24,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=509315.3333333333, ans=0.125
2024-09-24 13:16:30,903 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.316e+02 1.466e+02 1.618e+02 2.901e+02, threshold=2.933e+02, percent-clipped=1.0
2024-09-24 13:16:39,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0
2024-09-24 13:16:40,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=509362.0, ans=0.125
2024-09-24 13:16:58,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=509408.6666666667, ans=0.125
2024-09-24 13:17:16,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=509455.3333333333, ans=0.2
2024-09-24 13:17:44,489 INFO [train.py:1198] (0/4) Epoch 29, batch 100, loss[loss=0.1925, ctc_loss=0.1221, cr_loss=0.3521, over 17157.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1319, cr_loss=0.3491, over 1334420.15 frames. ], batch size: 48, lr: 4.15e-03, grad_scale: 32.0
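The validation pass at the start of epoch 29 also logs attn_weights_entropy, plausibly one value per attention head. Entropy of each head's attention rows (in nats) measures how diffuse the head is; values near 5 correspond to mass spread over roughly e^5, about 150 key positions. A sketch of that diagnostic:

    import torch

    def attn_weights_entropy(attn):
        # attn: (heads, queries, keys), rows sum to 1
        p = attn.clamp_min(1e-20)
        ent = -(p * p.log()).sum(dim=-1)   # (heads, queries)
        return ent.mean(dim=-1)            # per-head mean entropy

    attn = torch.softmax(torch.randn(4, 10, 300), dim=-1)
    print(attn_weights_entropy(attn))      # ~5 nats for fairly diffuse rows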
2024-09-24 13:18:09,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=509595.3333333333, ans=0.125
2024-09-24 13:18:18,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=509642.0, ans=0.125
2024-09-24 13:18:27,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=509642.0, ans=0.2
2024-09-24 13:18:47,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509688.6666666667, ans=0.1
2024-09-24 13:19:09,212 INFO [train.py:1198] (0/4) Epoch 29, batch 150, loss[loss=0.1966, ctc_loss=0.127, cr_loss=0.3479, over 17176.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1314, cr_loss=0.3477, over 1779424.27 frames. ], batch size: 45, lr: 4.15e-03, grad_scale: 32.0
2024-09-24 13:19:15,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=509782.0, ans=0.125
2024-09-24 13:19:18,664 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.009e+02 1.258e+02 1.349e+02 1.479e+02 2.103e+02, threshold=2.697e+02, percent-clipped=0.0
2024-09-24 13:19:41,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=509875.3333333333, ans=0.0
2024-09-24 13:20:17,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=509968.6666666667, ans=0.0
2024-09-24 13:20:31,475 INFO [train.py:1198] (0/4) Epoch 29, batch 200, loss[loss=0.195, ctc_loss=0.1246, cr_loss=0.352, over 17289.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1317, cr_loss=0.3491, over 2121882.11 frames. ], batch size: 51, lr: 4.15e-03, grad_scale: 32.0
2024-09-24 13:21:00,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=510062.0, ans=0.125
2024-09-24 13:21:09,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=510108.6666666667, ans=0.04949747468305833
2024-09-24 13:21:27,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=510155.3333333333, ans=0.125
2024-09-24 13:21:48,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=510202.0, ans=0.125
2024-09-24 13:21:51,165 INFO [train.py:1198] (0/4) Epoch 29, batch 250, loss[loss=0.1628, ctc_loss=0.103, cr_loss=0.2987, over 16188.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1311, cr_loss=0.3484, over 2399909.83 frames. ], batch size: 36, lr: 4.15e-03, grad_scale: 32.0
2024-09-24 13:21:52,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0
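Batch size swings between 36 and 123 utterances in these summaries while each batch's frame total stays roughly constant, which is what a duration-budgeted sampler produces: reading the frame counts as 25 output frames per second, 17228 frames is about 689 s, just under a 700 s budget (that frame-rate reading is an assumption). A much-simplified sketch of the idea; lhotse's DynamicBucketingSampler is considerably more elaborate:

    def duration_batches(utts, max_duration=700.0):
        """utts: iterable of (utt_id, duration_seconds), assumed pre-sorted
        into similar-length buckets. Emits batches under a duration budget,
        so batch size varies inversely with utterance length."""
        batch, total = [], 0.0
        for utt_id, dur in utts:
            if batch and total + dur > max_duration:
                yield batch
                batch, total = [], 0.0
            batch.append(utt_id)
            total += dur
        if batch:
            yield batch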
2024-09-24 13:22:00,734 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.248e+02 1.376e+02 1.477e+02 2.189e+02, threshold=2.751e+02, percent-clipped=0.0
2024-09-24 13:22:24,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=510342.0, ans=0.2
2024-09-24 13:22:54,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=510388.6666666667, ans=10.0
2024-09-24 13:23:01,563 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0
2024-09-24 13:23:06,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.72 vs. limit=6.0
2024-09-24 13:23:16,527 INFO [train.py:1198] (0/4) Epoch 29, batch 300, loss[loss=0.2137, ctc_loss=0.1408, cr_loss=0.3642, over 16415.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1316, cr_loss=0.3498, over 2612873.30 frames. ], batch size: 66, lr: 4.14e-03, grad_scale: 16.0
2024-09-24 13:23:19,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=510482.0, ans=0.0
2024-09-24 13:23:23,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=510482.0, ans=0.125
2024-09-24 13:23:42,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=510528.6666666667, ans=0.125
2024-09-24 13:23:49,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=510575.3333333333, ans=0.025
2024-09-24 13:23:58,245 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-24 13:24:01,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=510575.3333333333, ans=0.2
2024-09-24 13:24:07,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510622.0, ans=0.1
2024-09-24 13:24:09,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=510622.0, ans=0.125
2024-09-24 13:24:13,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=22.5
2024-09-24 13:24:17,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=510622.0, ans=0.125
2024-09-24 13:24:24,087 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0
2024-09-24 13:24:31,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=22.5
2024-09-24 13:24:39,068 INFO [train.py:1198] (0/4) Epoch 29, batch 350, loss[loss=0.1749, ctc_loss=0.1078, cr_loss=0.3355, over 17084.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1305, cr_loss=0.3476, over 2787024.76 frames. ], batch size: 40, lr: 4.14e-03, grad_scale: 16.0
2024-09-24 13:24:49,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=510715.3333333333, ans=0.025
2024-09-24 13:24:50,232 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.267e+02 1.359e+02 1.490e+02 2.133e+02, threshold=2.717e+02, percent-clipped=0.0
2024-09-24 13:25:32,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=510855.3333333333, ans=0.0
2024-09-24 13:25:34,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0
2024-09-24 13:25:40,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=510855.3333333333, ans=0.2
2024-09-24 13:25:48,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=510902.0, ans=0.125
2024-09-24 13:26:01,405 INFO [train.py:1198] (0/4) Epoch 29, batch 400, loss[loss=0.1668, ctc_loss=0.1051, cr_loss=0.3085, over 16260.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1309, cr_loss=0.3472, over 2904731.52 frames. ], batch size: 36, lr: 4.14e-03, grad_scale: 32.0
2024-09-24 13:26:30,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=510995.3333333333, ans=0.0
2024-09-24 13:26:38,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=511042.0, ans=0.125
2024-09-24 13:26:51,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=511088.6666666667, ans=0.015
2024-09-24 13:27:15,499 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0
2024-09-24 13:27:16,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=511135.3333333333, ans=0.125
2024-09-24 13:27:21,140 INFO [train.py:1198] (0/4) Epoch 29, batch 450, loss[loss=0.1959, ctc_loss=0.128, cr_loss=0.3395, over 17023.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1318, cr_loss=0.3488, over 3007175.53 frames. ], batch size: 39, lr: 4.14e-03, grad_scale: 32.0
2024-09-24 13:27:35,217 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.269e+02 1.351e+02 1.464e+02 1.902e+02, threshold=2.702e+02, percent-clipped=0.0
2024-09-24 13:27:38,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=511228.6666666667, ans=0.0
2024-09-24 13:27:48,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511228.6666666667, ans=0.1
2024-09-24 13:28:20,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.18 vs. limit=10.0
2024-09-24 13:28:46,770 INFO [train.py:1198] (0/4) Epoch 29, batch 500, loss[loss=0.2024, ctc_loss=0.1341, cr_loss=0.341, over 17194.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1324, cr_loss=0.35, over 3091800.19 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 32.0
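The balancer records (prob, min_positive, max_positive, min_abs, max_abs) name constraints on activation statistics that icefall enforces through gradient adjustments in the backward pass, applied with probability `prob`, rather than through hard clamps. A deliberately simplified toy that instead adds a small differentiable penalty when the per-channel fraction of positive activations leaves [min_positive, max_positive]; the real Balancer in scaling.py also bounds |x| and works differently in detail:

    import torch

    def balance_penalty(x, min_positive=0.05, max_positive=0.95, scale=0.01):
        # soft, differentiable proxy for "fraction of x that is positive"
        frac_pos = torch.sigmoid(20.0 * x).mean(dim=0)   # per-channel
        too_low = (min_positive - frac_pos).clamp_min(0.0)
        too_high = (frac_pos - max_positive).clamp_min(0.0)
        return scale * (too_low + too_high).sum()

    x = torch.randn(100, 8, requires_grad=True)
    loss = x.pow(2).mean() + balance_penalty(x)
    loss.backward()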
2024-09-24 13:29:56,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=511602.0, ans=0.125
2024-09-24 13:30:06,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=511602.0, ans=0.09899494936611666
2024-09-24 13:30:09,201 INFO [train.py:1198] (0/4) Epoch 29, batch 550, loss[loss=0.2336, ctc_loss=0.1583, cr_loss=0.3762, over 16889.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.132, cr_loss=0.3485, over 3149655.98 frames. ], batch size: 58, lr: 4.14e-03, grad_scale: 32.0
2024-09-24 13:30:20,288 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.249e+02 1.312e+02 1.438e+02 1.848e+02, threshold=2.623e+02, percent-clipped=0.0
2024-09-24 13:30:56,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=511742.0, ans=0.125
2024-09-24 13:30:57,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=511788.6666666667, ans=0.0
2024-09-24 13:31:02,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=511788.6666666667, ans=0.0
2024-09-24 13:31:15,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0
2024-09-24 13:31:30,728 INFO [train.py:1198] (0/4) Epoch 29, batch 600, loss[loss=0.2075, ctc_loss=0.1374, cr_loss=0.3507, over 17014.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1317, cr_loss=0.3487, over 3205472.01 frames. ], batch size: 44, lr: 4.14e-03, grad_scale: 32.0
2024-09-24 13:31:53,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.75 vs. limit=15.0
2024-09-24 13:31:54,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=511928.6666666667, ans=0.0
2024-09-24 13:32:09,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=511975.3333333333, ans=0.0
2024-09-24 13:32:20,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0
2024-09-24 13:32:43,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=512068.6666666667, ans=0.125
2024-09-24 13:32:46,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5
2024-09-24 13:32:47,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=512068.6666666667, ans=0.0
2024-09-24 13:32:53,244 INFO [train.py:1198] (0/4) Epoch 29, batch 650, loss[loss=0.2208, ctc_loss=0.148, cr_loss=0.3638, over 17213.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1314, cr_loss=0.3476, over 3238450.99 frames. ], batch size: 55, lr: 4.14e-03, grad_scale: 32.0
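The recurring bypass.scale_min and bypass_mid.scale_min records (ans=0.2) hint at a residual interpolation whose learned weight is clamped from below, so each sub-module always contributes at least a fixed fraction. A hedged sketch of that shape; the actual BypassModule in zipformer.py may differ in initialization and detail:

    import torch
    import torch.nn as nn

    class Bypass(nn.Module):
        """output = x + s * (y - x), with per-channel s clamped to
        [scale_min, scale_max]; scale_min=0.2 keeps at least 20% of the
        module's contribution (illustrative, not the exact implementation)."""
        def __init__(self, num_channels, scale_min=0.2, scale_max=1.0):
            super().__init__()
            self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
            self.scale_min, self.scale_max = scale_min, scale_max

        def forward(self, x, y):   # x: module input, y: module output
            s = self.scale.clamp(self.scale_min, self.scale_max)
            return x + s * (y - x)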
2024-09-24 13:33:04,451 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.266e+02 1.354e+02 1.448e+02 2.374e+02, threshold=2.709e+02, percent-clipped=0.0
2024-09-24 13:33:06,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=512115.3333333333, ans=0.025
2024-09-24 13:33:08,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2024-09-24 13:33:15,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=512162.0, ans=0.2
2024-09-24 13:33:24,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=512162.0, ans=0.125
2024-09-24 13:33:41,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=512208.6666666667, ans=0.2
2024-09-24 13:34:14,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=512302.0, ans=0.2
2024-09-24 13:34:19,041 INFO [train.py:1198] (0/4) Epoch 29, batch 700, loss[loss=0.201, ctc_loss=0.1308, cr_loss=0.3509, over 17317.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1312, cr_loss=0.3476, over 3260787.52 frames. ], batch size: 52, lr: 4.14e-03, grad_scale: 32.0
2024-09-24 13:34:25,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=512348.6666666667, ans=0.125
2024-09-24 13:35:08,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=512488.6666666667, ans=0.0
2024-09-24 13:35:41,604 INFO [train.py:1198] (0/4) Epoch 29, batch 750, loss[loss=0.2033, ctc_loss=0.1306, cr_loss=0.3635, over 17351.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1309, cr_loss=0.3471, over 3284497.36 frames. ], batch size: 48, lr: 4.14e-03, grad_scale: 32.0
2024-09-24 13:35:52,788 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.251e+02 1.333e+02 1.428e+02 1.733e+02, threshold=2.666e+02, percent-clipped=0.0
2024-09-24 13:36:39,616 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.27 vs. limit=10.0
2024-09-24 13:36:55,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=512768.6666666667, ans=0.125
2024-09-24 13:36:55,375 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-24 13:36:58,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=512768.6666666667, ans=0.0
2024-09-24 13:37:00,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0
2024-09-24 13:37:01,405 INFO [train.py:1198] (0/4) Epoch 29, batch 800, loss[loss=0.1816, ctc_loss=0.1188, cr_loss=0.3141, over 17301.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1307, cr_loss=0.3461, over 3306612.46 frames. ], batch size: 51, lr: 4.14e-03, grad_scale: 32.0
2024-09-24 13:37:31,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=512862.0, ans=0.125
2024-09-24 13:37:32,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0
2024-09-24 13:37:51,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=512955.3333333333, ans=0.125
2024-09-24 13:38:27,133 INFO [train.py:1198] (0/4) Epoch 29, batch 850, loss[loss=0.1918, ctc_loss=0.125, cr_loss=0.3339, over 17245.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1304, cr_loss=0.3467, over 3323208.70 frames. ], batch size: 44, lr: 4.13e-03, grad_scale: 32.0
2024-09-24 13:38:30,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513048.6666666667, ans=0.1
2024-09-24 13:38:38,336 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.286e+02 1.364e+02 1.497e+02 3.898e+02, threshold=2.729e+02, percent-clipped=1.0
2024-09-24 13:38:53,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=513095.3333333333, ans=0.0
2024-09-24 13:38:53,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=513095.3333333333, ans=0.025
2024-09-24 13:39:01,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=513142.0, ans=0.125
2024-09-24 13:39:03,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=513142.0, ans=0.125
2024-09-24 13:39:14,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=513142.0, ans=0.125
2024-09-24 13:39:17,194 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.39 vs. limit=15.0
2024-09-24 13:39:18,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0
2024-09-24 13:39:41,860 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-09-24 13:39:47,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=513282.0, ans=0.0
2024-09-24 13:39:49,254 INFO [train.py:1198] (0/4) Epoch 29, batch 900, loss[loss=0.1544, ctc_loss=0.09841, cr_loss=0.2799, over 16271.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1296, cr_loss=0.3456, over 3337424.00 frames. ], batch size: 36, lr: 4.13e-03, grad_scale: 32.0
2024-09-24 13:40:30,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=513375.3333333333, ans=0.125
2024-09-24 13:40:30,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=513375.3333333333, ans=0.125
2024-09-24 13:40:37,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=513375.3333333333, ans=0.0
2024-09-24 13:41:07,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=513468.6666666667, ans=0.025
2024-09-24 13:41:12,250 INFO [train.py:1198] (0/4) Epoch 29, batch 950, loss[loss=0.2017, ctc_loss=0.1315, cr_loss=0.351, over 17251.00 frames. ], tot_loss[loss=0.1993, ctc_loss=0.13, cr_loss=0.3467, over 3346645.95 frames. ], batch size: 44, lr: 4.13e-03, grad_scale: 32.0
2024-09-24 13:41:23,505 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.274e+02 1.388e+02 1.487e+02 2.628e+02, threshold=2.776e+02, percent-clipped=0.0
2024-09-24 13:42:33,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=513748.6666666667, ans=0.035
2024-09-24 13:42:35,066 INFO [train.py:1198] (0/4) Epoch 29, batch 1000, loss[loss=0.2132, ctc_loss=0.142, cr_loss=0.356, over 17304.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1303, cr_loss=0.347, over 3358836.42 frames. ], batch size: 51, lr: 4.13e-03, grad_scale: 16.0
2024-09-24 13:42:36,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513748.6666666667, ans=0.1
2024-09-24 13:42:52,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=513795.3333333333, ans=0.2
2024-09-24 13:43:06,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=513842.0, ans=0.125
2024-09-24 13:43:11,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0
2024-09-24 13:43:59,529 INFO [train.py:1198] (0/4) Epoch 29, batch 1050, loss[loss=0.212, ctc_loss=0.1392, cr_loss=0.3641, over 17008.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1305, cr_loss=0.3465, over 3365445.19 frames. ], batch size: 51, lr: 4.13e-03, grad_scale: 16.0
2024-09-24 13:44:12,020 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.285e+02 1.349e+02 1.436e+02 3.120e+02, threshold=2.698e+02, percent-clipped=1.0
2024-09-24 13:44:33,623 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0
2024-09-24 13:45:05,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0
2024-09-24 13:45:21,711 INFO [train.py:1198] (0/4) Epoch 29, batch 1100, loss[loss=0.2705, ctc_loss=0.1861, cr_loss=0.4219, over 12228.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1312, cr_loss=0.348, over 3359714.58 frames. ], batch size: 123, lr: 4.13e-03, grad_scale: 16.0
], batch size: 123, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:45:29,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=514215.3333333333, ans=0.125 2024-09-24 13:45:41,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=514262.0, ans=0.125 2024-09-24 13:45:44,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=514262.0, ans=0.0 2024-09-24 13:46:03,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2024-09-24 13:46:04,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.37 vs. limit=10.0 2024-09-24 13:46:13,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514355.3333333333, ans=0.1 2024-09-24 13:46:13,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=514355.3333333333, ans=0.0 2024-09-24 13:46:23,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=514355.3333333333, ans=0.1 2024-09-24 13:46:31,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=514402.0, ans=0.125 2024-09-24 13:46:35,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2024-09-24 13:46:42,099 INFO [train.py:1198] (0/4) Epoch 29, batch 1150, loss[loss=0.2111, ctc_loss=0.1383, cr_loss=0.3643, over 17070.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1311, cr_loss=0.3478, over 3361368.16 frames. ], batch size: 46, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:46:54,883 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.257e+02 1.322e+02 1.401e+02 2.069e+02, threshold=2.644e+02, percent-clipped=0.0 2024-09-24 13:46:59,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514495.3333333333, ans=0.1 2024-09-24 13:47:15,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=514542.0, ans=0.125 2024-09-24 13:47:30,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=514588.6666666667, ans=0.2 2024-09-24 13:47:42,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=514588.6666666667, ans=0.2 2024-09-24 13:48:04,034 INFO [train.py:1198] (0/4) Epoch 29, batch 1200, loss[loss=0.2136, ctc_loss=0.1408, cr_loss=0.3639, over 17156.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1316, cr_loss=0.3487, over 3357989.19 frames. ], batch size: 45, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:48:26,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.32 vs. 
limit=15.0 2024-09-24 13:48:41,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=514775.3333333333, ans=0.1 2024-09-24 13:48:45,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0 2024-09-24 13:49:11,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=514868.6666666667, ans=0.0 2024-09-24 13:49:28,542 INFO [train.py:1198] (0/4) Epoch 29, batch 1250, loss[loss=0.2261, ctc_loss=0.1461, cr_loss=0.3998, over 17016.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.131, cr_loss=0.3474, over 3357917.63 frames. ], batch size: 53, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:49:28,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=514915.3333333333, ans=0.0 2024-09-24 13:49:42,754 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.263e+02 1.323e+02 1.423e+02 2.408e+02, threshold=2.646e+02, percent-clipped=0.0 2024-09-24 13:49:49,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=514962.0, ans=0.025 2024-09-24 13:50:13,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2024-09-24 13:50:31,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515055.3333333333, ans=0.1 2024-09-24 13:50:46,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=515102.0, ans=0.125 2024-09-24 13:50:50,683 INFO [train.py:1198] (0/4) Epoch 29, batch 1300, loss[loss=0.1905, ctc_loss=0.1247, cr_loss=0.3292, over 17265.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1312, cr_loss=0.3477, over 3362435.91 frames. ], batch size: 44, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:51:02,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=515148.6666666667, ans=0.1 2024-09-24 13:51:02,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515148.6666666667, ans=0.1 2024-09-24 13:51:05,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=515195.3333333333, ans=0.0 2024-09-24 13:51:10,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=515195.3333333333, ans=0.05 2024-09-24 13:51:13,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=515195.3333333333, ans=0.05 2024-09-24 13:51:19,511 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. 
limit=15.0 2024-09-24 13:51:30,033 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 13:51:31,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=515242.0, ans=0.0 2024-09-24 13:52:10,988 INFO [train.py:1198] (0/4) Epoch 29, batch 1350, loss[loss=0.2421, ctc_loss=0.1619, cr_loss=0.4011, over 17018.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1311, cr_loss=0.3471, over 3356223.28 frames. ], batch size: 53, lr: 4.13e-03, grad_scale: 16.0 2024-09-24 13:52:23,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2024-09-24 13:52:25,431 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.241e+02 1.323e+02 1.445e+02 2.039e+02, threshold=2.645e+02, percent-clipped=0.0 2024-09-24 13:52:39,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=515428.6666666667, ans=0.125 2024-09-24 13:53:01,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=515522.0, ans=0.125 2024-09-24 13:53:10,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=22.5 2024-09-24 13:53:30,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.91 vs. limit=22.5 2024-09-24 13:53:35,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=515615.3333333333, ans=12.0 2024-09-24 13:53:35,818 INFO [train.py:1198] (0/4) Epoch 29, batch 1400, loss[loss=0.1745, ctc_loss=0.1153, cr_loss=0.296, over 16935.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1314, cr_loss=0.3479, over 3367989.60 frames. ], batch size: 42, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 13:53:39,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=515615.3333333333, ans=0.0 2024-09-24 13:54:21,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=515708.6666666667, ans=0.2 2024-09-24 13:54:26,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=515755.3333333333, ans=0.125 2024-09-24 13:54:57,873 INFO [train.py:1198] (0/4) Epoch 29, batch 1450, loss[loss=0.2002, ctc_loss=0.1298, cr_loss=0.3516, over 17137.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.1322, cr_loss=0.35, over 3364171.17 frames. 
], batch size: 48, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 13:55:10,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515848.6666666667, ans=0.125 2024-09-24 13:55:15,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=515895.3333333333, ans=0.0 2024-09-24 13:55:16,337 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.288e+02 1.365e+02 1.470e+02 2.158e+02, threshold=2.730e+02, percent-clipped=0.0 2024-09-24 13:55:19,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515895.3333333333, ans=0.1 2024-09-24 13:55:31,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=515942.0, ans=0.125 2024-09-24 13:56:19,906 INFO [train.py:1198] (0/4) Epoch 29, batch 1500, loss[loss=0.1678, ctc_loss=0.1062, cr_loss=0.3079, over 17178.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1319, cr_loss=0.3489, over 3353569.13 frames. ], batch size: 41, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 13:56:21,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=516082.0, ans=0.125 2024-09-24 13:56:36,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=516128.6666666667, ans=0.125 2024-09-24 13:57:00,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=516175.3333333333, ans=0.0 2024-09-24 13:57:21,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=516222.0, ans=0.125 2024-09-24 13:57:42,733 INFO [train.py:1198] (0/4) Epoch 29, batch 1550, loss[loss=0.1897, ctc_loss=0.125, cr_loss=0.3234, over 17057.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1322, cr_loss=0.3493, over 3357078.58 frames. ], batch size: 46, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 13:57:55,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=516315.3333333333, ans=0.125 2024-09-24 13:57:58,708 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.272e+02 1.340e+02 1.430e+02 4.940e+02, threshold=2.681e+02, percent-clipped=1.0 2024-09-24 13:58:38,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=516455.3333333333, ans=0.125 2024-09-24 13:58:51,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-24 13:59:07,817 INFO [train.py:1198] (0/4) Epoch 29, batch 1600, loss[loss=0.1975, ctc_loss=0.1293, cr_loss=0.3406, over 17081.00 frames. ], tot_loss[loss=0.2017, ctc_loss=0.1319, cr_loss=0.3489, over 3353923.85 frames. 
], batch size: 46, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 13:59:36,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=516595.3333333333, ans=0.05 2024-09-24 13:59:51,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=516642.0, ans=0.0 2024-09-24 13:59:55,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=516688.6666666667, ans=0.0 2024-09-24 14:00:24,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.82 vs. limit=22.5 2024-09-24 14:00:30,029 INFO [train.py:1198] (0/4) Epoch 29, batch 1650, loss[loss=0.1811, ctc_loss=0.1236, cr_loss=0.2875, over 17142.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.1324, cr_loss=0.3493, over 3350831.21 frames. ], batch size: 48, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 14:00:45,987 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.269e+02 1.357e+02 1.497e+02 2.178e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 14:01:10,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516875.3333333333, ans=0.1 2024-09-24 14:01:11,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=516875.3333333333, ans=0.07 2024-09-24 14:01:23,060 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:01:43,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=516968.6666666667, ans=0.125 2024-09-24 14:01:43,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=516968.6666666667, ans=0.125 2024-09-24 14:01:49,854 INFO [train.py:1198] (0/4) Epoch 29, batch 1700, loss[loss=0.204, ctc_loss=0.1364, cr_loss=0.3376, over 16176.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.1322, cr_loss=0.3497, over 3354258.53 frames. 
], batch size: 74, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 14:01:56,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=517015.3333333333, ans=0.125 2024-09-24 14:01:58,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=517015.3333333333, ans=0.2 2024-09-24 14:02:04,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=517062.0, ans=0.125 2024-09-24 14:02:40,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=517155.3333333333, ans=0.0 2024-09-24 14:02:41,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=517155.3333333333, ans=0.125 2024-09-24 14:03:01,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=517202.0, ans=0.0 2024-09-24 14:03:03,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517202.0, ans=0.1 2024-09-24 14:03:14,199 INFO [train.py:1198] (0/4) Epoch 29, batch 1750, loss[loss=0.1807, ctc_loss=0.1167, cr_loss=0.3197, over 17082.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.132, cr_loss=0.3497, over 3356679.86 frames. ], batch size: 43, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 14:03:30,185 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.263e+02 1.353e+02 1.492e+02 2.426e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 14:03:41,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=517295.3333333333, ans=0.125 2024-09-24 14:03:56,234 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=12.0 2024-09-24 14:04:01,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=517342.0, ans=0.125 2024-09-24 14:04:05,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2024-09-24 14:04:36,192 INFO [train.py:1198] (0/4) Epoch 29, batch 1800, loss[loss=0.2441, ctc_loss=0.1603, cr_loss=0.4188, over 17015.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1322, cr_loss=0.3502, over 3359994.78 frames. ], batch size: 56, lr: 4.12e-03, grad_scale: 16.0 2024-09-24 14:05:07,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517528.6666666667, ans=0.1 2024-09-24 14:05:16,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=517575.3333333333, ans=0.0 2024-09-24 14:05:25,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=517622.0, ans=0.0 2024-09-24 14:05:39,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=517622.0, ans=0.125 2024-09-24 14:05:58,951 INFO [train.py:1198] (0/4) Epoch 29, batch 1850, loss[loss=0.2364, ctc_loss=0.1568, cr_loss=0.3977, over 15873.00 frames. 
], tot_loss[loss=0.2016, ctc_loss=0.1318, cr_loss=0.3492, over 3367148.72 frames. ], batch size: 74, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 14:06:03,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=517715.3333333333, ans=0.125 2024-09-24 14:06:16,451 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.270e+02 1.382e+02 1.482e+02 2.420e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-24 14:06:16,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=517762.0, ans=0.125 2024-09-24 14:06:31,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=517808.6666666667, ans=0.0 2024-09-24 14:06:32,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=517808.6666666667, ans=0.07 2024-09-24 14:06:34,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=517808.6666666667, ans=0.035 2024-09-24 14:06:44,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=517808.6666666667, ans=0.125 2024-09-24 14:06:52,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=517855.3333333333, ans=0.0 2024-09-24 14:06:55,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=517855.3333333333, ans=0.0 2024-09-24 14:07:00,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=517855.3333333333, ans=0.125 2024-09-24 14:07:09,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=517902.0, ans=0.125 2024-09-24 14:07:17,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=517948.6666666667, ans=0.125 2024-09-24 14:07:21,465 INFO [train.py:1198] (0/4) Epoch 29, batch 1900, loss[loss=0.2273, ctc_loss=0.1527, cr_loss=0.3731, over 17022.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1311, cr_loss=0.3478, over 3370628.09 frames. ], batch size: 51, lr: 4.12e-03, grad_scale: 8.0 2024-09-24 14:07:21,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=517948.6666666667, ans=0.125 2024-09-24 14:07:28,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=517948.6666666667, ans=0.0 2024-09-24 14:08:07,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518042.0, ans=0.1 2024-09-24 14:08:20,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=518088.6666666667, ans=0.125 2024-09-24 14:08:43,830 INFO [train.py:1198] (0/4) Epoch 29, batch 1950, loss[loss=0.2016, ctc_loss=0.1288, cr_loss=0.364, over 17267.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1314, cr_loss=0.3481, over 3366772.74 frames. 
], batch size: 44, lr: 4.11e-03, grad_scale: 8.0 2024-09-24 14:08:59,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=518182.0, ans=0.0 2024-09-24 14:09:03,914 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.275e+02 1.365e+02 1.439e+02 2.080e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-24 14:09:06,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=518228.6666666667, ans=0.0 2024-09-24 14:09:23,861 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2024-09-24 14:09:56,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=518368.6666666667, ans=0.125 2024-09-24 14:10:08,973 INFO [train.py:1198] (0/4) Epoch 29, batch 2000, loss[loss=0.1539, ctc_loss=0.09555, cr_loss=0.2917, over 16343.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1316, cr_loss=0.3479, over 3353446.75 frames. ], batch size: 36, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:10:33,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=518462.0, ans=0.0 2024-09-24 14:10:58,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=518555.3333333333, ans=0.0 2024-09-24 14:11:16,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=518602.0, ans=0.2 2024-09-24 14:11:28,975 INFO [train.py:1198] (0/4) Epoch 29, batch 2050, loss[loss=0.213, ctc_loss=0.1402, cr_loss=0.3641, over 17047.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.131, cr_loss=0.3476, over 3360894.60 frames. ], batch size: 52, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:11:46,525 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.263e+02 1.339e+02 1.463e+02 3.835e+02, threshold=2.678e+02, percent-clipped=1.0 2024-09-24 14:11:54,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=518695.3333333333, ans=0.0 2024-09-24 14:12:18,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0 2024-09-24 14:12:54,003 INFO [train.py:1198] (0/4) Epoch 29, batch 2100, loss[loss=0.2036, ctc_loss=0.1328, cr_loss=0.3543, over 17101.00 frames. ], tot_loss[loss=0.2012, ctc_loss=0.1315, cr_loss=0.3488, over 3359198.13 frames. ], batch size: 49, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:13:32,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=518975.3333333333, ans=0.0 2024-09-24 14:13:34,208 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:13:34,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=518975.3333333333, ans=0.0 2024-09-24 14:14:15,828 INFO [train.py:1198] (0/4) Epoch 29, batch 2150, loss[loss=0.2198, ctc_loss=0.1451, cr_loss=0.3739, over 17163.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.131, cr_loss=0.3483, over 3364016.77 frames. 
], batch size: 45, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:14:22,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=519115.3333333333, ans=0.0 2024-09-24 14:14:26,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=519115.3333333333, ans=0.125 2024-09-24 14:14:30,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=519162.0, ans=0.2 2024-09-24 14:14:33,683 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.281e+02 1.376e+02 1.508e+02 1.841e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-24 14:14:45,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.35 vs. limit=10.0 2024-09-24 14:15:02,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=519208.6666666667, ans=0.125 2024-09-24 14:15:05,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=519255.3333333333, ans=0.125 2024-09-24 14:15:15,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=519255.3333333333, ans=0.125 2024-09-24 14:15:34,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=519302.0, ans=0.0 2024-09-24 14:15:38,899 INFO [train.py:1198] (0/4) Epoch 29, batch 2200, loss[loss=0.2095, ctc_loss=0.1382, cr_loss=0.3564, over 17293.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1315, cr_loss=0.3493, over 3363149.49 frames. ], batch size: 49, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:15:47,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=519348.6666666667, ans=0.125 2024-09-24 14:15:53,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=519395.3333333333, ans=0.1 2024-09-24 14:16:23,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.22 vs. limit=8.0 2024-09-24 14:16:32,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0 2024-09-24 14:16:38,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=519488.6666666667, ans=0.125 2024-09-24 14:16:59,445 INFO [train.py:1198] (0/4) Epoch 29, batch 2250, loss[loss=0.1634, ctc_loss=0.1033, cr_loss=0.3005, over 17216.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1318, cr_loss=0.35, over 3369874.59 frames. ], batch size: 41, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:17:09,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.85 vs. 
limit=15.0 2024-09-24 14:17:18,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=519628.6666666667, ans=0.0 2024-09-24 14:17:19,626 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.297e+02 1.378e+02 1.443e+02 1.753e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 14:18:05,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2024-09-24 14:18:23,698 INFO [train.py:1198] (0/4) Epoch 29, batch 2300, loss[loss=0.2006, ctc_loss=0.1296, cr_loss=0.3549, over 17184.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1321, cr_loss=0.3511, over 3375494.58 frames. ], batch size: 45, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:18:48,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=519862.0, ans=0.0 2024-09-24 14:19:27,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=519955.3333333333, ans=0.0 2024-09-24 14:19:40,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=520002.0, ans=0.125 2024-09-24 14:19:46,205 INFO [train.py:1198] (0/4) Epoch 29, batch 2350, loss[loss=0.1645, ctc_loss=0.1067, cr_loss=0.289, over 16734.00 frames. ], tot_loss[loss=0.2021, ctc_loss=0.1319, cr_loss=0.3508, over 3376471.24 frames. ], batch size: 37, lr: 4.11e-03, grad_scale: 16.0 2024-09-24 14:19:55,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=520048.6666666667, ans=0.125 2024-09-24 14:20:06,253 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.272e+02 1.348e+02 1.492e+02 2.219e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 14:20:12,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=520095.3333333333, ans=0.125 2024-09-24 14:20:30,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=520142.0, ans=0.025 2024-09-24 14:20:42,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.84 vs. limit=10.0 2024-09-24 14:20:49,253 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2024-09-24 14:20:50,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=520188.6666666667, ans=0.125 2024-09-24 14:21:05,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=520235.3333333333, ans=0.125 2024-09-24 14:21:08,866 INFO [train.py:1198] (0/4) Epoch 29, batch 2400, loss[loss=0.2262, ctc_loss=0.1487, cr_loss=0.3877, over 14729.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1319, cr_loss=0.3502, over 3364119.80 frames. 
], batch size: 89, lr: 4.11e-03, grad_scale: 32.0 2024-09-24 14:21:09,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=520282.0, ans=0.0 2024-09-24 14:21:15,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=520282.0, ans=0.025 2024-09-24 14:22:31,471 INFO [train.py:1198] (0/4) Epoch 29, batch 2450, loss[loss=0.2049, ctc_loss=0.1343, cr_loss=0.3531, over 16994.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.1319, cr_loss=0.3502, over 3364135.30 frames. ], batch size: 53, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:22:33,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=520515.3333333333, ans=0.0 2024-09-24 14:22:53,219 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.297e+02 1.387e+02 1.500e+02 2.305e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-24 14:22:53,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520562.0, ans=0.1 2024-09-24 14:23:03,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520562.0, ans=0.1 2024-09-24 14:23:08,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=520608.6666666667, ans=0.025 2024-09-24 14:23:35,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=520655.3333333333, ans=0.125 2024-09-24 14:23:56,405 INFO [train.py:1198] (0/4) Epoch 29, batch 2500, loss[loss=0.2332, ctc_loss=0.1568, cr_loss=0.382, over 15153.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1317, cr_loss=0.3495, over 3348838.89 frames. ], batch size: 89, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:23:57,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2024-09-24 14:24:19,670 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.71 vs. limit=10.0 2024-09-24 14:24:21,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2024-09-24 14:24:24,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=520795.3333333333, ans=0.125 2024-09-24 14:24:27,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=520842.0, ans=0.2 2024-09-24 14:24:30,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=520842.0, ans=10.0 2024-09-24 14:24:31,003 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.34 vs. 
limit=22.5 2024-09-24 14:24:33,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=520842.0, ans=0.0 2024-09-24 14:24:55,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=520888.6666666667, ans=0.0 2024-09-24 14:24:57,032 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:25:08,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=520935.3333333333, ans=0.2 2024-09-24 14:25:19,000 INFO [train.py:1198] (0/4) Epoch 29, batch 2550, loss[loss=0.2111, ctc_loss=0.1376, cr_loss=0.3676, over 17313.00 frames. ], tot_loss[loss=0.203, ctc_loss=0.1327, cr_loss=0.3512, over 3349869.93 frames. ], batch size: 49, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:25:19,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2024-09-24 14:25:24,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=520982.0, ans=0.125 2024-09-24 14:25:26,378 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.84 vs. limit=15.0 2024-09-24 14:25:38,172 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.280e+02 1.353e+02 1.438e+02 1.912e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 14:25:42,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0 2024-09-24 14:25:49,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=22.5 2024-09-24 14:25:50,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521075.3333333333, ans=0.1 2024-09-24 14:26:13,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=521122.0, ans=0.025 2024-09-24 14:26:17,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=521122.0, ans=0.125 2024-09-24 14:26:21,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521168.6666666667, ans=0.1 2024-09-24 14:26:21,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=521168.6666666667, ans=0.125 2024-09-24 14:26:36,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521215.3333333333, ans=0.125 2024-09-24 14:26:36,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521215.3333333333, ans=0.125 2024-09-24 14:26:38,267 INFO [train.py:1198] (0/4) Epoch 29, batch 2600, loss[loss=0.1727, ctc_loss=0.1089, cr_loss=0.3194, over 17357.00 frames. ], tot_loss[loss=0.2023, ctc_loss=0.1323, cr_loss=0.3498, over 3352113.96 frames. 
], batch size: 48, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:26:43,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=521215.3333333333, ans=0.0 2024-09-24 14:26:51,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=521215.3333333333, ans=0.125 2024-09-24 14:27:16,842 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2024-09-24 14:27:41,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=521355.3333333333, ans=0.0 2024-09-24 14:27:49,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=521402.0, ans=0.02 2024-09-24 14:27:54,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=521402.0, ans=0.0 2024-09-24 14:28:03,409 INFO [train.py:1198] (0/4) Epoch 29, batch 2650, loss[loss=0.2217, ctc_loss=0.145, cr_loss=0.3837, over 17217.00 frames. ], tot_loss[loss=0.2025, ctc_loss=0.1324, cr_loss=0.3503, over 3343781.63 frames. ], batch size: 47, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:28:22,581 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.267e+02 1.389e+02 1.480e+02 2.068e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-24 14:28:51,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=521542.0, ans=0.0 2024-09-24 14:29:13,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=521635.3333333333, ans=0.5 2024-09-24 14:29:17,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=521635.3333333333, ans=0.125 2024-09-24 14:29:24,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2024-09-24 14:29:25,644 INFO [train.py:1198] (0/4) Epoch 29, batch 2700, loss[loss=0.1942, ctc_loss=0.1241, cr_loss=0.3507, over 16312.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.132, cr_loss=0.3496, over 3348105.39 frames. ], batch size: 36, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:30:09,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=521775.3333333333, ans=0.0 2024-09-24 14:30:15,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521822.0, ans=0.125 2024-09-24 14:30:20,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=521822.0, ans=0.95 2024-09-24 14:30:23,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=521822.0, ans=0.0 2024-09-24 14:30:24,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-24 14:30:49,232 INFO [train.py:1198] (0/4) Epoch 29, batch 2750, loss[loss=0.1977, ctc_loss=0.1275, cr_loss=0.3512, over 17214.00 frames. 
], tot_loss[loss=0.2008, ctc_loss=0.1311, cr_loss=0.3485, over 3360590.17 frames. ], batch size: 55, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:31:02,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521915.3333333333, ans=0.125 2024-09-24 14:31:08,250 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.228e+02 1.306e+02 1.411e+02 3.014e+02, threshold=2.612e+02, percent-clipped=1.0 2024-09-24 14:31:18,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=22.5 2024-09-24 14:31:30,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=522008.6666666667, ans=0.125 2024-09-24 14:31:55,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=522102.0, ans=0.125 2024-09-24 14:31:57,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522102.0, ans=0.1 2024-09-24 14:32:11,739 INFO [train.py:1198] (0/4) Epoch 29, batch 2800, loss[loss=0.1953, ctc_loss=0.1286, cr_loss=0.3335, over 17361.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1312, cr_loss=0.349, over 3364185.86 frames. ], batch size: 48, lr: 4.10e-03, grad_scale: 32.0 2024-09-24 14:32:13,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=522148.6666666667, ans=0.1 2024-09-24 14:32:15,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=522148.6666666667, ans=0.125 2024-09-24 14:32:18,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=522148.6666666667, ans=0.125 2024-09-24 14:32:45,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=522242.0, ans=0.05 2024-09-24 14:32:46,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=522242.0, ans=0.0 2024-09-24 14:33:10,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522288.6666666667, ans=0.1 2024-09-24 14:33:12,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=22.5 2024-09-24 14:33:21,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=522335.3333333333, ans=0.0 2024-09-24 14:33:34,163 INFO [train.py:1198] (0/4) Epoch 29, batch 2850, loss[loss=0.1985, ctc_loss=0.1238, cr_loss=0.3731, over 17167.00 frames. ], tot_loss[loss=0.2012, ctc_loss=0.1314, cr_loss=0.3493, over 3352191.39 frames. ], batch size: 45, lr: 4.10e-03, grad_scale: 32.0 2024-09-24 14:33:45,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=522382.0, ans=0.125 2024-09-24 14:33:49,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. 
limit=10.0 2024-09-24 14:33:57,876 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.283e+02 1.399e+02 1.566e+02 1.810e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-24 14:34:01,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=522428.6666666667, ans=0.125 2024-09-24 14:34:17,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=522475.3333333333, ans=0.125 2024-09-24 14:34:18,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0 2024-09-24 14:34:57,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=522568.6666666667, ans=0.07 2024-09-24 14:35:00,063 INFO [train.py:1198] (0/4) Epoch 29, batch 2900, loss[loss=0.237, ctc_loss=0.1658, cr_loss=0.3562, over 11961.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1304, cr_loss=0.3472, over 3355323.32 frames. ], batch size: 123, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:35:04,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.22 vs. limit=15.0 2024-09-24 14:35:15,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-09-24 14:35:16,505 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-112000.pt 2024-09-24 14:35:41,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2024-09-24 14:35:58,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=522755.3333333333, ans=0.1 2024-09-24 14:36:08,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=522802.0, ans=0.0 2024-09-24 14:36:15,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2024-09-24 14:36:22,783 INFO [train.py:1198] (0/4) Epoch 29, batch 2950, loss[loss=0.2264, ctc_loss=0.1507, cr_loss=0.3783, over 15988.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1308, cr_loss=0.3484, over 3352633.79 frames. ], batch size: 74, lr: 4.10e-03, grad_scale: 16.0 2024-09-24 14:36:43,419 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.270e+02 1.367e+02 1.488e+02 1.985e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-24 14:37:14,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=522988.6666666667, ans=0.0 2024-09-24 14:37:20,230 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=12.0 2024-09-24 14:37:44,730 INFO [train.py:1198] (0/4) Epoch 29, batch 3000, loss[loss=0.2248, ctc_loss=0.148, cr_loss=0.3839, over 17051.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1311, cr_loss=0.3491, over 3361916.61 frames. 
], batch size: 52, lr: 4.09e-03, grad_scale: 16.0 2024-09-24 14:37:44,731 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 14:38:00,324 INFO [train.py:1230] (0/4) Epoch 29, validation: loss=0.03658, ctc_loss=0.03658, cr_loss=8.731e-15, over 944034.00 frames. 2024-09-24 14:38:00,325 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 14:38:24,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=523128.6666666667, ans=0.0 2024-09-24 14:38:24,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=523128.6666666667, ans=0.025 2024-09-24 14:38:30,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=523175.3333333333, ans=0.0 2024-09-24 14:38:46,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=523222.0, ans=0.125 2024-09-24 14:39:19,060 INFO [train.py:1198] (0/4) Epoch 29, batch 3050, loss[loss=0.1871, ctc_loss=0.1227, cr_loss=0.3223, over 17082.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1307, cr_loss=0.3483, over 3361118.95 frames. ], batch size: 43, lr: 4.09e-03, grad_scale: 16.0 2024-09-24 14:39:37,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2024-09-24 14:39:39,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=523362.0, ans=0.125 2024-09-24 14:39:42,288 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.250e+02 1.367e+02 1.506e+02 1.984e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-24 14:39:45,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=523362.0, ans=0.025 2024-09-24 14:39:53,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=523408.6666666667, ans=0.0 2024-09-24 14:39:55,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=523408.6666666667, ans=0.125 2024-09-24 14:39:59,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523408.6666666667, ans=0.1 2024-09-24 14:40:36,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=523502.0, ans=0.2 2024-09-24 14:40:40,603 INFO [train.py:1198] (0/4) Epoch 29, batch 3100, loss[loss=0.1871, ctc_loss=0.1202, cr_loss=0.3346, over 17023.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.131, cr_loss=0.349, over 3356376.97 frames. 
], batch size: 44, lr: 4.09e-03, grad_scale: 16.0 2024-09-24 14:40:45,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=523548.6666666667, ans=0.09899494936611666 2024-09-24 14:41:20,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=523642.0, ans=0.2 2024-09-24 14:41:20,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=523642.0, ans=0.0 2024-09-24 14:41:31,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=523688.6666666667, ans=0.125 2024-09-24 14:41:49,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=523735.3333333333, ans=0.0 2024-09-24 14:42:01,585 INFO [train.py:1198] (0/4) Epoch 29, batch 3150, loss[loss=0.1793, ctc_loss=0.1142, cr_loss=0.325, over 16960.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1311, cr_loss=0.3491, over 3355643.42 frames. ], batch size: 42, lr: 4.09e-03, grad_scale: 16.0 2024-09-24 14:42:06,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=523782.0, ans=0.125 2024-09-24 14:42:21,628 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.246e+02 1.357e+02 1.527e+02 1.959e+02, threshold=2.714e+02, percent-clipped=0.0 2024-09-24 14:42:55,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2024-09-24 14:43:14,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=523968.6666666667, ans=0.0 2024-09-24 14:43:19,448 INFO [train.py:1198] (0/4) Epoch 29, batch 3200, loss[loss=0.2068, ctc_loss=0.1341, cr_loss=0.3635, over 16001.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1309, cr_loss=0.3493, over 3357898.74 frames. 
], batch size: 74, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:43:42,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=524062.0, ans=0.125 2024-09-24 14:43:42,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524062.0, ans=0.0 2024-09-24 14:43:44,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=524062.0, ans=0.125 2024-09-24 14:43:53,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=524108.6666666667, ans=0.125 2024-09-24 14:44:12,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=524155.3333333333, ans=0.125 2024-09-24 14:44:14,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=524155.3333333333, ans=0.0 2024-09-24 14:44:15,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=524155.3333333333, ans=0.125 2024-09-24 14:44:25,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=524202.0, ans=0.025 2024-09-24 14:44:37,177 INFO [train.py:1198] (0/4) Epoch 29, batch 3250, loss[loss=0.21, ctc_loss=0.1382, cr_loss=0.3587, over 17148.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1317, cr_loss=0.3505, over 3351694.60 frames. ], batch size: 45, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:44:37,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=524248.6666666667, ans=0.125 2024-09-24 14:44:57,322 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.269e+02 1.350e+02 1.434e+02 1.655e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 14:45:06,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=22.5 2024-09-24 14:45:09,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=524342.0, ans=0.125 2024-09-24 14:45:13,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=524342.0, ans=0.95 2024-09-24 14:45:21,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=524342.0, ans=0.125 2024-09-24 14:45:33,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=524388.6666666666, ans=0.04949747468305833 2024-09-24 14:45:55,498 INFO [train.py:1198] (0/4) Epoch 29, batch 3300, loss[loss=0.2323, ctc_loss=0.1522, cr_loss=0.4007, over 16501.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1315, cr_loss=0.3503, over 3359214.73 frames. ], batch size: 66, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:46:22,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. 
limit=10.0 2024-09-24 14:46:32,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=524575.3333333334, ans=0.0 2024-09-24 14:46:33,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-09-24 14:46:34,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=524575.3333333334, ans=0.125 2024-09-24 14:46:38,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=524575.3333333334, ans=0.0 2024-09-24 14:46:58,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=524668.6666666666, ans=0.0 2024-09-24 14:47:15,323 INFO [train.py:1198] (0/4) Epoch 29, batch 3350, loss[loss=0.2138, ctc_loss=0.1432, cr_loss=0.353, over 17240.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1316, cr_loss=0.3499, over 3354415.46 frames. ], batch size: 55, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:47:32,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=524762.0, ans=0.025 2024-09-24 14:47:35,617 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.319e+02 1.369e+02 1.450e+02 1.942e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-24 14:47:41,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.35 vs. limit=15.0 2024-09-24 14:48:01,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=524808.6666666666, ans=0.0 2024-09-24 14:48:15,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=524855.3333333334, ans=0.125 2024-09-24 14:48:18,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=524902.0, ans=0.125 2024-09-24 14:48:21,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=524902.0, ans=0.0 2024-09-24 14:48:35,633 INFO [train.py:1198] (0/4) Epoch 29, batch 3400, loss[loss=0.2123, ctc_loss=0.1383, cr_loss=0.3701, over 16497.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1315, cr_loss=0.3492, over 3354781.97 frames. ], batch size: 66, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:48:48,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=524948.6666666666, ans=0.0 2024-09-24 14:48:55,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.16 vs. 
limit=15.0 2024-09-24 14:48:59,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=524995.3333333334, ans=0.0 2024-09-24 14:49:07,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525042.0, ans=0.1 2024-09-24 14:49:38,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=525135.3333333334, ans=0.025 2024-09-24 14:49:39,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=525135.3333333334, ans=0.125 2024-09-24 14:49:55,467 INFO [train.py:1198] (0/4) Epoch 29, batch 3450, loss[loss=0.1673, ctc_loss=0.1092, cr_loss=0.2901, over 17110.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1309, cr_loss=0.3483, over 3358912.91 frames. ], batch size: 40, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:50:15,930 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.258e+02 1.341e+02 1.460e+02 2.344e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-24 14:50:18,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.43 vs. limit=22.5 2024-09-24 14:50:36,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=525275.3333333334, ans=0.125 2024-09-24 14:51:08,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525368.6666666666, ans=0.1 2024-09-24 14:51:14,075 INFO [train.py:1198] (0/4) Epoch 29, batch 3500, loss[loss=0.2323, ctc_loss=0.1592, cr_loss=0.3653, over 11967.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.132, cr_loss=0.35, over 3356205.22 frames. ], batch size: 123, lr: 4.09e-03, grad_scale: 32.0 2024-09-24 14:51:14,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2024-09-24 14:51:19,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=525415.3333333334, ans=0.1 2024-09-24 14:51:19,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=525415.3333333334, ans=0.0 2024-09-24 14:52:34,351 INFO [train.py:1198] (0/4) Epoch 29, batch 3550, loss[loss=0.2009, ctc_loss=0.1335, cr_loss=0.3366, over 16842.00 frames. ], tot_loss[loss=0.2014, ctc_loss=0.1317, cr_loss=0.349, over 3355020.42 frames. ], batch size: 58, lr: 4.08e-03, grad_scale: 32.0 2024-09-24 14:52:54,489 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.066e+02 1.276e+02 1.353e+02 1.468e+02 1.866e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 14:53:10,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.60 vs. 
limit=15.0 2024-09-24 14:53:28,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=525788.6666666666, ans=0.025 2024-09-24 14:53:44,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=525835.3333333334, ans=0.125 2024-09-24 14:53:52,067 INFO [train.py:1198] (0/4) Epoch 29, batch 3600, loss[loss=0.185, ctc_loss=0.1167, cr_loss=0.3415, over 17091.00 frames. ], tot_loss[loss=0.2021, ctc_loss=0.1321, cr_loss=0.3498, over 3352921.91 frames. ], batch size: 49, lr: 4.08e-03, grad_scale: 32.0 2024-09-24 14:53:57,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525882.0, ans=0.1 2024-09-24 14:55:04,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=526068.6666666666, ans=0.125 2024-09-24 14:55:10,107 INFO [train.py:1198] (0/4) Epoch 29, batch 3650, loss[loss=0.1713, ctc_loss=0.1077, cr_loss=0.3183, over 17277.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1317, cr_loss=0.3494, over 3355050.27 frames. ], batch size: 42, lr: 4.08e-03, grad_scale: 32.0 2024-09-24 14:55:17,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.98 vs. limit=10.0 2024-09-24 14:55:22,889 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:55:31,934 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.251e+02 1.330e+02 1.491e+02 2.044e+02, threshold=2.661e+02, percent-clipped=0.0 2024-09-24 14:55:41,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=526208.6666666666, ans=0.1 2024-09-24 14:55:51,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=526208.6666666666, ans=0.0 2024-09-24 14:56:31,269 INFO [train.py:1198] (0/4) Epoch 29, batch 3700, loss[loss=0.2327, ctc_loss=0.1608, cr_loss=0.3595, over 11961.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1312, cr_loss=0.3486, over 3354821.10 frames. ], batch size: 124, lr: 4.08e-03, grad_scale: 16.0 2024-09-24 14:56:58,110 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:57:46,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=526535.3333333334, ans=0.125 2024-09-24 14:57:49,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=526582.0, ans=0.125 2024-09-24 14:57:51,172 INFO [train.py:1198] (0/4) Epoch 29, batch 3750, loss[loss=0.2206, ctc_loss=0.146, cr_loss=0.3731, over 16785.00 frames. ], tot_loss[loss=0.2012, ctc_loss=0.1315, cr_loss=0.3483, over 3344202.99 frames. 
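The recurring WARNING lines from optim.py report the min/25%/50%/75%/max of recent gradient norms, and in each one the printed threshold is Clipping_scale (2.0) times the reported median, up to display rounding: 2.0 x 1.353e+02 = 2.706e+02 in the warning just above, 2.0 x 1.357e+02 = 2.714e+02 earlier. percent-clipped then says how often recent batches exceeded that threshold. A rough sketch of this bookkeeping, assuming a sliding window of norms (an illustration of the idea, not icefall's optimizer code):

    import collections
    import torch

    class GradNormTracker:
        """Track recent gradient norms; derive the clipping threshold as
        clipping_scale * median, which is what the warnings print."""

        def __init__(self, clipping_scale=2.0, window=200):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=window)
            self.clipped = collections.deque(maxlen=window)

        def update(self, grad_norm):
            self.norms.append(grad_norm)
            t = torch.tensor(list(self.norms))
            q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # 2.0 * median
            self.clipped.append(grad_norm > threshold)
            print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
                  f"percent-clipped={100.0 * sum(self.clipped) / len(self.clipped)}")
            return threshold

percent-clipped stays at 0.0 throughout this section, so clipping is armed but essentially never firing at this point in training.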
], batch size: 61, lr: 4.08e-03, grad_scale: 16.0 2024-09-24 14:58:13,097 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.272e+02 1.350e+02 1.451e+02 2.081e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 14:58:27,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=526675.3333333334, ans=0.025 2024-09-24 14:58:52,462 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 14:58:55,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526768.6666666666, ans=0.1 2024-09-24 14:59:07,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=526768.6666666666, ans=0.125 2024-09-24 14:59:10,334 INFO [train.py:1198] (0/4) Epoch 29, batch 3800, loss[loss=0.2547, ctc_loss=0.1759, cr_loss=0.3944, over 14978.00 frames. ], tot_loss[loss=0.2019, ctc_loss=0.132, cr_loss=0.3493, over 3328500.51 frames. ], batch size: 89, lr: 4.08e-03, grad_scale: 16.0 2024-09-24 14:59:50,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=526908.6666666666, ans=0.125 2024-09-24 14:59:57,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=526955.3333333334, ans=0.2 2024-09-24 15:00:27,867 INFO [train.py:1198] (0/4) Epoch 29, batch 3850, loss[loss=0.2249, ctc_loss=0.1458, cr_loss=0.3956, over 16518.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1328, cr_loss=0.3498, over 3313818.03 frames. ], batch size: 66, lr: 4.08e-03, grad_scale: 16.0 2024-09-24 15:00:49,020 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.344e+02 1.460e+02 1.620e+02 2.302e+02, threshold=2.919e+02, percent-clipped=0.0 2024-09-24 15:00:55,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=527095.3333333334, ans=0.125 2024-09-24 15:01:03,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=527142.0, ans=0.0 2024-09-24 15:01:05,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=12.0 2024-09-24 15:01:11,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527142.0, ans=0.0 2024-09-24 15:01:37,752 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-29.pt 2024-09-24 15:02:28,431 INFO [train.py:1198] (0/4) Epoch 30, batch 0, loss[loss=0.2172, ctc_loss=0.1411, cr_loss=0.3806, over 17206.00 frames. ], tot_loss[loss=0.2172, ctc_loss=0.1411, cr_loss=0.3806, over 17206.00 frames. ], batch size: 47, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:02:28,432 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 15:02:37,933 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.9093, 4.0133, 3.9908, 4.4606, 3.9313, 3.8122, 4.4111, 4.6812], device='cuda:0') 2024-09-24 15:02:43,743 INFO [train.py:1230] (0/4) Epoch 30, validation: loss=0.0352, ctc_loss=0.0352, cr_loss=9.262e-15, over 944034.00 frames. 
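The validation pass at the top of epoch 30 is worth a note: it runs over the dev set (944034 frames), and its cr_loss=9.262e-15 is numerically zero, presumably because the consistency term compares two augmented forward passes and no augmentation is applied at evaluation time. A minimal sketch of such a frame-weighted validation loop (names hypothetical, assuming a compute_loss helper that returns a per-frame loss and the batch's frame count):

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, compute_loss):
        """Frame-weighted average loss over a fixed dev set, so the
        numbers are comparable from epoch to epoch."""
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)  # hypothetical helper
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames, tot_frames  # e.g. (0.0352, 944034.0)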
2024-09-24 15:02:43,744 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 15:02:50,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=527263.3333333334, ans=0.2 2024-09-24 15:03:00,776 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-09-24 15:03:06,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=527310.0, ans=0.125 2024-09-24 15:03:07,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2024-09-24 15:03:08,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=527310.0, ans=0.125 2024-09-24 15:03:14,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=527310.0, ans=0.0 2024-09-24 15:03:32,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=527356.6666666666, ans=0.2 2024-09-24 15:04:07,011 INFO [train.py:1198] (0/4) Epoch 30, batch 50, loss[loss=0.1969, ctc_loss=0.1277, cr_loss=0.3462, over 17029.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1318, cr_loss=0.3488, over 751982.40 frames. ], batch size: 44, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:04:14,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2024-09-24 15:04:35,822 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.261e+02 1.395e+02 1.583e+02 2.602e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-24 15:04:36,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=527543.3333333334, ans=0.125 2024-09-24 15:04:42,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-09-24 15:04:43,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=527590.0, ans=0.0 2024-09-24 15:05:06,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527636.6666666666, ans=0.1 2024-09-24 15:05:16,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=527683.3333333334, ans=0.0 2024-09-24 15:05:19,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=527683.3333333334, ans=0.125 2024-09-24 15:05:30,395 INFO [train.py:1198] (0/4) Epoch 30, batch 100, loss[loss=0.2146, ctc_loss=0.1398, cr_loss=0.374, over 16983.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1318, cr_loss=0.3501, over 1338103.45 frames. ], batch size: 56, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:05:46,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.70 vs. 
limit=12.0 2024-09-24 15:05:51,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=527776.6666666666, ans=0.05 2024-09-24 15:06:00,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=527823.3333333334, ans=0.125 2024-09-24 15:06:14,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=527823.3333333334, ans=0.125 2024-09-24 15:06:32,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=527916.6666666666, ans=0.04949747468305833 2024-09-24 15:06:43,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=527916.6666666666, ans=0.125 2024-09-24 15:06:55,915 INFO [train.py:1198] (0/4) Epoch 30, batch 150, loss[loss=0.2323, ctc_loss=0.1528, cr_loss=0.3975, over 17011.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1323, cr_loss=0.3507, over 1783510.12 frames. ], batch size: 52, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:06:56,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=527963.3333333334, ans=0.2 2024-09-24 15:07:02,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527963.3333333334, ans=0.1 2024-09-24 15:07:15,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=528010.0, ans=0.0 2024-09-24 15:07:23,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=528010.0, ans=0.05 2024-09-24 15:07:23,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=528010.0, ans=0.09899494936611666 2024-09-24 15:07:24,703 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.297e+02 1.402e+02 1.523e+02 2.631e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-24 15:07:33,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=528056.6666666666, ans=0.125 2024-09-24 15:07:34,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=528056.6666666666, ans=0.035 2024-09-24 15:08:15,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=15.0 2024-09-24 15:08:19,312 INFO [train.py:1198] (0/4) Epoch 30, batch 200, loss[loss=0.2056, ctc_loss=0.1359, cr_loss=0.3484, over 17020.00 frames. ], tot_loss[loss=0.2028, ctc_loss=0.1325, cr_loss=0.3513, over 2119202.74 frames. 
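Most lines in this log are ScheduledFloat entries: each prints the current value (ans=...) of a hyperparameter that is scheduled as a function of batch_count -- bypass skip rates, dropout probabilities, balancer probabilities, and so on. A minimal sketch of the idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (a hypothetical reimplementation, not scaling.py's class):

    class ScheduledValue:
        """Piecewise-linear schedule over training progress, defined by
        sorted (batch_count, value) breakpoints."""

        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # Breakpoints here are made up for illustration:
    skip_rate = ScheduledValue((0.0, 0.5), (4000.0, 0.025), (16000.0, 0.0))
    print(skip_rate(527776.0))  # far past the last breakpoint -> 0.0

At batch_count around 5.3e5 most schedules have long since reached their final values, which is why the ans printed for a given parameter barely changes across these entries.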
], batch size: 51, lr: 4.01e-03, grad_scale: 32.0 2024-09-24 15:08:20,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=528196.6666666666, ans=15.0 2024-09-24 15:08:53,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=528290.0, ans=0.125 2024-09-24 15:09:09,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=528336.6666666666, ans=0.2 2024-09-24 15:09:20,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=528336.6666666666, ans=0.5 2024-09-24 15:09:33,051 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-24 15:09:42,473 INFO [train.py:1198] (0/4) Epoch 30, batch 250, loss[loss=0.1958, ctc_loss=0.1269, cr_loss=0.3444, over 17142.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1316, cr_loss=0.3501, over 2398107.14 frames. ], batch size: 48, lr: 4.00e-03, grad_scale: 32.0 2024-09-24 15:09:47,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=528430.0, ans=0.1 2024-09-24 15:10:02,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=528476.6666666666, ans=0.0 2024-09-24 15:10:11,297 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.266e+02 1.350e+02 1.451e+02 1.821e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-24 15:10:21,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=528523.3333333334, ans=0.125 2024-09-24 15:10:21,360 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:10:31,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=528570.0, ans=0.2 2024-09-24 15:10:51,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=528616.6666666666, ans=0.025 2024-09-24 15:10:55,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=528616.6666666666, ans=0.125 2024-09-24 15:10:58,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2024-09-24 15:11:02,796 INFO [train.py:1198] (0/4) Epoch 30, batch 300, loss[loss=0.1871, ctc_loss=0.1187, cr_loss=0.3416, over 17291.00 frames. ], tot_loss[loss=0.202, ctc_loss=0.1319, cr_loss=0.3506, over 2613290.05 frames. ], batch size: 46, lr: 4.00e-03, grad_scale: 32.0 2024-09-24 15:11:25,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=528710.0, ans=0.2 2024-09-24 15:11:32,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=528710.0, ans=0.0 2024-09-24 15:11:48,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.59 vs. 
limit=15.0 2024-09-24 15:12:07,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2024-09-24 15:12:07,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=528803.3333333334, ans=0.125 2024-09-24 15:12:17,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=528850.0, ans=0.0 2024-09-24 15:12:18,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=12.0 2024-09-24 15:12:28,505 INFO [train.py:1198] (0/4) Epoch 30, batch 350, loss[loss=0.1678, ctc_loss=0.1078, cr_loss=0.2998, over 16211.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1308, cr_loss=0.3481, over 2773391.88 frames. ], batch size: 36, lr: 4.00e-03, grad_scale: 32.0 2024-09-24 15:12:28,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=528896.6666666666, ans=0.1 2024-09-24 15:13:00,084 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.261e+02 1.365e+02 1.554e+02 1.989e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-24 15:13:05,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.88 vs. limit=22.5 2024-09-24 15:13:35,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=12.0 2024-09-24 15:13:51,537 INFO [train.py:1198] (0/4) Epoch 30, batch 400, loss[loss=0.2101, ctc_loss=0.139, cr_loss=0.3552, over 17095.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1308, cr_loss=0.3475, over 2904781.45 frames. ], batch size: 49, lr: 4.00e-03, grad_scale: 32.0 2024-09-24 15:14:25,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=529223.3333333334, ans=0.0 2024-09-24 15:14:37,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=529223.3333333334, ans=0.2 2024-09-24 15:14:46,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2024-09-24 15:15:14,480 INFO [train.py:1198] (0/4) Epoch 30, batch 450, loss[loss=0.2339, ctc_loss=0.1568, cr_loss=0.3856, over 11942.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1318, cr_loss=0.349, over 2994060.08 frames. ], batch size: 124, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:15:20,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=529363.3333333334, ans=15.0 2024-09-24 15:15:29,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=529410.0, ans=0.125 2024-09-24 15:15:29,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.77 vs. 
limit=15.0 2024-09-24 15:15:38,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=529410.0, ans=0.2 2024-09-24 15:15:44,838 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.260e+02 1.335e+02 1.450e+02 2.256e+02, threshold=2.670e+02, percent-clipped=0.0 2024-09-24 15:15:57,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=529456.6666666666, ans=0.0 2024-09-24 15:16:15,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=529503.3333333334, ans=0.1 2024-09-24 15:16:34,285 INFO [train.py:1198] (0/4) Epoch 30, batch 500, loss[loss=0.1977, ctc_loss=0.1263, cr_loss=0.3569, over 17010.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3482, over 3083730.28 frames. ], batch size: 51, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:16:41,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.54 vs. limit=22.5 2024-09-24 15:17:00,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=529643.3333333334, ans=0.1 2024-09-24 15:17:10,944 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=22.5 2024-09-24 15:17:20,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=529690.0, ans=0.125 2024-09-24 15:17:54,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=529783.3333333334, ans=0.0 2024-09-24 15:18:02,947 INFO [train.py:1198] (0/4) Epoch 30, batch 550, loss[loss=0.2078, ctc_loss=0.1365, cr_loss=0.3563, over 17014.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1312, cr_loss=0.348, over 3140557.13 frames. ], batch size: 56, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:18:10,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=529830.0, ans=15.0 2024-09-24 15:18:14,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=529830.0, ans=0.125 2024-09-24 15:18:14,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=529830.0, ans=0.05 2024-09-24 15:18:33,366 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.259e+02 1.342e+02 1.453e+02 2.055e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-24 15:18:38,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=529923.3333333334, ans=0.125 2024-09-24 15:19:02,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=529970.0, ans=0.0 2024-09-24 15:19:04,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=529970.0, ans=0.125 2024-09-24 15:19:19,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.79 vs. 
limit=12.0 2024-09-24 15:19:20,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=530016.6666666666, ans=0.0 2024-09-24 15:19:23,320 INFO [train.py:1198] (0/4) Epoch 30, batch 600, loss[loss=0.1788, ctc_loss=0.117, cr_loss=0.3093, over 16943.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1305, cr_loss=0.3462, over 3200635.76 frames. ], batch size: 58, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:19:26,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=530063.3333333334, ans=0.0 2024-09-24 15:19:39,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=530110.0, ans=0.025 2024-09-24 15:19:48,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=530110.0, ans=0.2 2024-09-24 15:20:45,718 INFO [train.py:1198] (0/4) Epoch 30, batch 650, loss[loss=0.2106, ctc_loss=0.1388, cr_loss=0.3588, over 17314.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1304, cr_loss=0.3466, over 3244979.15 frames. ], batch size: 51, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:20:55,438 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:20:57,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=530296.6666666666, ans=0.0 2024-09-24 15:21:15,829 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.275e+02 1.371e+02 1.445e+02 2.518e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-24 15:22:09,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=530530.0, ans=0.125 2024-09-24 15:22:10,439 INFO [train.py:1198] (0/4) Epoch 30, batch 700, loss[loss=0.1944, ctc_loss=0.128, cr_loss=0.332, over 17020.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1302, cr_loss=0.346, over 3273630.69 frames. ], batch size: 51, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:22:18,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=530530.0, ans=0.2 2024-09-24 15:22:22,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=530530.0, ans=0.0 2024-09-24 15:22:24,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. limit=6.0 2024-09-24 15:22:58,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=530623.3333333334, ans=0.2 2024-09-24 15:23:19,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=530716.6666666666, ans=0.2 2024-09-24 15:23:25,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=530716.6666666666, ans=0.0 2024-09-24 15:23:32,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=530763.3333333334, ans=0.0 2024-09-24 15:23:33,187 INFO [train.py:1198] (0/4) Epoch 30, batch 750, loss[loss=0.2116, ctc_loss=0.137, cr_loss=0.3728, over 17313.00 frames. 
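The Whitening lines compare a per-module statistic against a limit (metric=4.79 vs. limit=12.0 and the like), and the module only pushes back on the activations while the metric exceeds its limit. One plausible way to build such a metric is from the eigenvalue spread of the feature covariance, which is 1.0 for perfectly "white" features and grows as energy concentrates in a few directions; the sketch below is a guess at the flavor of the statistic, not the scaling.py implementation:

    import torch

    def whitening_metric(x, num_groups=1):
        """x: (num_frames, num_channels). Returns a scalar >= 1 that equals
        1.0 when each group's covariance is proportional to the identity."""
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / n       # per-group covariance
        eigs = torch.linalg.eigvalsh(cov)     # symmetric, real spectrum
        return (eigs ** 2).mean() / eigs.mean() ** 2

    print(whitening_metric(torch.randn(4000, 192)))  # close to 1 for white noise

num_groups and num_channels in this sketch play the same role as the fields the log prints for each whitened module.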
], tot_loss[loss=0.1993, ctc_loss=0.1301, cr_loss=0.346, over 3291263.83 frames. ], batch size: 51, lr: 4.00e-03, grad_scale: 16.0 2024-09-24 15:23:38,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=12.0 2024-09-24 15:24:04,035 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.243e+02 1.346e+02 1.462e+02 2.306e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-24 15:24:04,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=530856.6666666666, ans=0.2 2024-09-24 15:24:12,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=530856.6666666666, ans=0.125 2024-09-24 15:24:52,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2024-09-24 15:24:56,210 INFO [train.py:1198] (0/4) Epoch 30, batch 800, loss[loss=0.2103, ctc_loss=0.1376, cr_loss=0.3632, over 17170.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1305, cr_loss=0.3474, over 3301092.24 frames. ], batch size: 45, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:25:04,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=530996.6666666666, ans=0.0 2024-09-24 15:25:25,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5 2024-09-24 15:25:36,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=531090.0, ans=0.1 2024-09-24 15:26:10,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2024-09-24 15:26:16,196 INFO [train.py:1198] (0/4) Epoch 30, batch 850, loss[loss=0.2486, ctc_loss=0.1727, cr_loss=0.3795, over 11616.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.1299, cr_loss=0.3468, over 3320561.53 frames. ], batch size: 123, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:26:53,452 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.016e+02 1.252e+02 1.339e+02 1.409e+02 2.350e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 15:27:30,117 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2024-09-24 15:27:37,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=531416.6666666666, ans=0.1 2024-09-24 15:27:41,877 INFO [train.py:1198] (0/4) Epoch 30, batch 900, loss[loss=0.1886, ctc_loss=0.1223, cr_loss=0.3315, over 17166.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.1291, cr_loss=0.3453, over 3333286.20 frames. ], batch size: 45, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:27:48,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=531463.3333333334, ans=0.0 2024-09-24 15:28:03,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.52 vs. 
limit=15.0 2024-09-24 15:28:08,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=531510.0, ans=0.0 2024-09-24 15:28:16,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=531556.6666666666, ans=0.125 2024-09-24 15:28:22,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=531556.6666666666, ans=0.0 2024-09-24 15:28:46,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=531650.0, ans=0.125 2024-09-24 15:28:53,394 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:29:04,216 INFO [train.py:1198] (0/4) Epoch 30, batch 950, loss[loss=0.2291, ctc_loss=0.1543, cr_loss=0.3742, over 17041.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.1291, cr_loss=0.3446, over 3341510.26 frames. ], batch size: 52, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:29:22,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0 2024-09-24 15:29:35,837 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.249e+02 1.333e+02 1.422e+02 1.714e+02, threshold=2.667e+02, percent-clipped=0.0 2024-09-24 15:29:37,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=531790.0, ans=0.125 2024-09-24 15:30:02,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=531836.6666666666, ans=0.125 2024-09-24 15:30:14,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=531883.3333333334, ans=0.1 2024-09-24 15:30:26,508 INFO [train.py:1198] (0/4) Epoch 30, batch 1000, loss[loss=0.1897, ctc_loss=0.1234, cr_loss=0.3315, over 17092.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1298, cr_loss=0.3455, over 3348263.02 frames. ], batch size: 43, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:30:28,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531930.0, ans=0.1 2024-09-24 15:31:01,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=532023.3333333334, ans=0.0 2024-09-24 15:31:01,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532023.3333333334, ans=0.1 2024-09-24 15:31:15,660 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=22.5 2024-09-24 15:31:23,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=532070.0, ans=0.05 2024-09-24 15:31:31,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. 
limit=12.0 2024-09-24 15:31:34,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=532116.6666666666, ans=0.0 2024-09-24 15:31:51,972 INFO [train.py:1198] (0/4) Epoch 30, batch 1050, loss[loss=0.2336, ctc_loss=0.1534, cr_loss=0.4011, over 17036.00 frames. ], tot_loss[loss=0.1985, ctc_loss=0.1296, cr_loss=0.3448, over 3355149.35 frames. ], batch size: 53, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:32:06,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=532210.0, ans=0.125 2024-09-24 15:32:23,922 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.266e+02 1.377e+02 1.519e+02 2.640e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-24 15:32:25,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=532256.6666666666, ans=0.07 2024-09-24 15:32:38,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=532303.3333333334, ans=0.0 2024-09-24 15:32:44,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=532303.3333333334, ans=0.05 2024-09-24 15:32:51,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2024-09-24 15:33:01,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=532350.0, ans=0.125 2024-09-24 15:33:06,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2024-09-24 15:33:14,103 INFO [train.py:1198] (0/4) Epoch 30, batch 1100, loss[loss=0.2161, ctc_loss=0.1401, cr_loss=0.3803, over 17058.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1307, cr_loss=0.3471, over 3357342.29 frames. ], batch size: 46, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:33:30,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=532443.3333333334, ans=0.1 2024-09-24 15:33:33,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=532443.3333333334, ans=0.025 2024-09-24 15:33:58,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=532490.0, ans=0.125 2024-09-24 15:34:06,895 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=532536.6666666666, ans=0.5 2024-09-24 15:34:10,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-09-24 15:34:33,774 INFO [train.py:1198] (0/4) Epoch 30, batch 1150, loss[loss=0.2155, ctc_loss=0.1425, cr_loss=0.365, over 17022.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.131, cr_loss=0.3476, over 3361614.61 frames. ], batch size: 52, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:34:42,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.88 vs. 
limit=15.0 2024-09-24 15:35:05,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-09-24 15:35:09,706 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.299e+02 1.402e+02 1.532e+02 2.058e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-24 15:35:21,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=532723.3333333334, ans=0.125 2024-09-24 15:35:26,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=532770.0, ans=10.0 2024-09-24 15:35:30,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=532770.0, ans=0.2 2024-09-24 15:35:40,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=532816.6666666666, ans=0.0 2024-09-24 15:35:53,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=532816.6666666666, ans=0.125 2024-09-24 15:35:56,126 INFO [train.py:1198] (0/4) Epoch 30, batch 1200, loss[loss=0.222, ctc_loss=0.1449, cr_loss=0.3856, over 17004.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1311, cr_loss=0.3475, over 3365912.40 frames. ], batch size: 53, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:36:19,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=532910.0, ans=0.125 2024-09-24 15:36:29,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2024-09-24 15:37:02,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533003.3333333334, ans=0.1 2024-09-24 15:37:07,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=533050.0, ans=0.0 2024-09-24 15:37:21,570 INFO [train.py:1198] (0/4) Epoch 30, batch 1250, loss[loss=0.1718, ctc_loss=0.1103, cr_loss=0.3076, over 16758.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1311, cr_loss=0.3473, over 3355098.30 frames. ], batch size: 37, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:37:21,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=533096.6666666666, ans=0.0 2024-09-24 15:37:42,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=533143.3333333334, ans=0.125 2024-09-24 15:37:44,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533143.3333333334, ans=0.1 2024-09-24 15:37:47,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2024-09-24 15:37:54,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.12 vs. 
limit=15.0 2024-09-24 15:37:57,235 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.068e+02 1.280e+02 1.378e+02 1.490e+02 1.846e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 15:38:16,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.85 vs. limit=22.5 2024-09-24 15:38:17,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=22.5 2024-09-24 15:38:24,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533236.6666666666, ans=0.1 2024-09-24 15:38:43,376 INFO [train.py:1198] (0/4) Epoch 30, batch 1300, loss[loss=0.2248, ctc_loss=0.1504, cr_loss=0.3721, over 17008.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1306, cr_loss=0.3461, over 3344366.78 frames. ], batch size: 53, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:39:26,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0 2024-09-24 15:39:39,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=22.5 2024-09-24 15:39:51,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=533516.6666666666, ans=0.125 2024-09-24 15:40:01,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=533516.6666666666, ans=0.125 2024-09-24 15:40:04,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=533563.3333333334, ans=0.2 2024-09-24 15:40:05,654 INFO [train.py:1198] (0/4) Epoch 30, batch 1350, loss[loss=0.1954, ctc_loss=0.1259, cr_loss=0.3475, over 17170.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.1311, cr_loss=0.347, over 3353963.30 frames. ], batch size: 41, lr: 3.99e-03, grad_scale: 16.0 2024-09-24 15:40:18,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=533563.3333333334, ans=0.125 2024-09-24 15:40:22,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=22.5 2024-09-24 15:40:38,859 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.284e+02 1.374e+02 1.506e+02 2.096e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-24 15:40:45,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.87 vs. limit=10.0 2024-09-24 15:40:48,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=533656.6666666666, ans=0.125 2024-09-24 15:40:57,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. 
limit=22.5 2024-09-24 15:41:05,424 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:41:11,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=533750.0, ans=0.025 2024-09-24 15:41:26,053 INFO [train.py:1198] (0/4) Epoch 30, batch 1400, loss[loss=0.1649, ctc_loss=0.1045, cr_loss=0.3023, over 17025.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1309, cr_loss=0.347, over 3348901.40 frames. ], batch size: 44, lr: 3.98e-03, grad_scale: 16.0 2024-09-24 15:41:26,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=533796.6666666666, ans=0.07 2024-09-24 15:41:28,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=533796.6666666666, ans=0.0 2024-09-24 15:41:41,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=533796.6666666666, ans=0.0 2024-09-24 15:41:47,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=533843.3333333334, ans=0.0 2024-09-24 15:41:56,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.85 vs. limit=6.0 2024-09-24 15:42:15,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=533890.0, ans=0.0 2024-09-24 15:42:35,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2024-09-24 15:42:54,636 INFO [train.py:1198] (0/4) Epoch 30, batch 1450, loss[loss=0.1665, ctc_loss=0.1046, cr_loss=0.3094, over 17069.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1309, cr_loss=0.347, over 3347624.47 frames. ], batch size: 46, lr: 3.98e-03, grad_scale: 16.0 2024-09-24 15:43:04,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=534030.0, ans=0.025 2024-09-24 15:43:28,224 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.340e+02 1.457e+02 1.567e+02 2.615e+02, threshold=2.914e+02, percent-clipped=0.0 2024-09-24 15:43:33,411 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:43:39,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=534123.3333333334, ans=0.125 2024-09-24 15:44:11,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=534216.6666666666, ans=0.0 2024-09-24 15:44:14,827 INFO [train.py:1198] (0/4) Epoch 30, batch 1500, loss[loss=0.2163, ctc_loss=0.1425, cr_loss=0.369, over 17302.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1309, cr_loss=0.3474, over 3354620.10 frames. 
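grad_scale (16.0 or 32.0 through this section) is the current value of the mixed-precision loss scaler: it is halved when a scaled gradient overflows in fp16 and periodically doubled after stretches without overflow, which is why it moves between powers of two. A generic sketch with PyTorch's stock GradScaler (standard AMP usage; the actual train.py loop has more moving parts):

    import torch

    model = torch.nn.Linear(80, 500).cuda()   # stand-in for the real model
    optimizer = torch.optim.AdamW(model.parameters(), lr=4e-3)
    scaler = torch.cuda.amp.GradScaler()      # owns the dynamic grad_scale

    def train_step(features, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = torch.nn.functional.cross_entropy(model(features), targets)
        scaler.scale(loss).backward()  # backprop on the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # halves on overflow, grows otherwise
        return scaler.get_scale()      # the number the log calls grad_scale

    features = torch.randn(8, 80, device="cuda")
    targets = torch.randint(0, 500, (8,), device="cuda")
    print(train_step(features, targets))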
], batch size: 49, lr: 3.98e-03, grad_scale: 16.0 2024-09-24 15:44:40,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=534310.0, ans=0.0 2024-09-24 15:44:46,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=534310.0, ans=0.125 2024-09-24 15:45:05,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=534403.3333333334, ans=0.2 2024-09-24 15:45:10,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=534403.3333333334, ans=0.125 2024-09-24 15:45:34,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=534450.0, ans=0.1 2024-09-24 15:45:37,369 INFO [train.py:1198] (0/4) Epoch 30, batch 1550, loss[loss=0.1829, ctc_loss=0.1186, cr_loss=0.3214, over 16967.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1316, cr_loss=0.3489, over 3359349.65 frames. ], batch size: 42, lr: 3.98e-03, grad_scale: 16.0 2024-09-24 15:46:05,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=534543.3333333334, ans=0.125 2024-09-24 15:46:11,088 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.255e+02 1.331e+02 1.418e+02 1.761e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-24 15:47:02,575 INFO [train.py:1198] (0/4) Epoch 30, batch 1600, loss[loss=0.1577, ctc_loss=0.1002, cr_loss=0.2875, over 17186.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1315, cr_loss=0.3492, over 3363842.37 frames. ], batch size: 41, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:47:06,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=534730.0, ans=0.2 2024-09-24 15:47:13,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=534730.0, ans=0.125 2024-09-24 15:47:22,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534776.6666666666, ans=0.1 2024-09-24 15:47:50,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=534823.3333333334, ans=0.125 2024-09-24 15:48:17,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=534916.6666666666, ans=0.025 2024-09-24 15:48:19,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=534916.6666666666, ans=0.125 2024-09-24 15:48:25,595 INFO [train.py:1198] (0/4) Epoch 30, batch 1650, loss[loss=0.2008, ctc_loss=0.1316, cr_loss=0.3461, over 17138.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1314, cr_loss=0.3489, over 3366301.47 frames. ], batch size: 48, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:48:37,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=22.5 2024-09-24 15:48:57,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.84 vs. 
limit=5.0 2024-09-24 15:48:59,210 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.272e+02 1.342e+02 1.418e+02 1.836e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-24 15:49:14,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.91 vs. limit=15.0 2024-09-24 15:49:45,318 INFO [train.py:1198] (0/4) Epoch 30, batch 1700, loss[loss=0.2454, ctc_loss=0.1683, cr_loss=0.3853, over 15029.00 frames. ], tot_loss[loss=0.2015, ctc_loss=0.1317, cr_loss=0.3494, over 3365965.90 frames. ], batch size: 89, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:50:17,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2024-09-24 15:51:02,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=535383.3333333334, ans=0.2 2024-09-24 15:51:08,584 INFO [train.py:1198] (0/4) Epoch 30, batch 1750, loss[loss=0.1824, ctc_loss=0.1155, cr_loss=0.3347, over 17053.00 frames. ], tot_loss[loss=0.2021, ctc_loss=0.1321, cr_loss=0.3499, over 3354374.96 frames. ], batch size: 39, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:51:42,518 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2024-09-24 15:51:47,335 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.310e+02 1.375e+02 1.459e+02 2.196e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-24 15:52:11,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=535570.0, ans=0.0 2024-09-24 15:52:33,548 INFO [train.py:1198] (0/4) Epoch 30, batch 1800, loss[loss=0.1957, ctc_loss=0.1274, cr_loss=0.3415, over 17107.00 frames. ], tot_loss[loss=0.2016, ctc_loss=0.1319, cr_loss=0.3485, over 3337102.28 frames. ], batch size: 49, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:53:42,348 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:53:53,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=535850.0, ans=0.125 2024-09-24 15:53:56,384 INFO [train.py:1198] (0/4) Epoch 30, batch 1850, loss[loss=0.1936, ctc_loss=0.1271, cr_loss=0.3328, over 17092.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1309, cr_loss=0.3465, over 3340162.86 frames. ], batch size: 49, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:53:59,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=535896.6666666666, ans=0.125 2024-09-24 15:54:30,248 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.250e+02 1.313e+02 1.396e+02 1.985e+02, threshold=2.627e+02, percent-clipped=0.0 2024-09-24 15:55:03,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=536083.3333333334, ans=0.2 2024-09-24 15:55:19,284 INFO [train.py:1198] (0/4) Epoch 30, batch 1900, loss[loss=0.1842, ctc_loss=0.1183, cr_loss=0.3294, over 17286.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1311, cr_loss=0.3479, over 3344496.93 frames. 
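Each tot_loss entry is a smoothed, frame-weighted average rather than a single-batch number: through epoch 30 the "over N frames" figure climbs from about 7.5e5 at batch 50 and then plateaus around 3.36e6, the signature of a decaying window over recent batches. A small sketch of that kind of tracker (the decay constant is chosen for illustration; the real one is not in the log):

    class FrameWeightedLoss:
        """Exponentially decaying, frame-weighted running loss; the
        effective frame count is what the log shows as 'over N frames'."""

        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames, self.frames

With roughly 1.7e4 frames per batch, decay=0.995 settles near 1.7e4 / 0.005 = 3.4e6 effective frames, the same ballpark these entries plateau at.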
], batch size: 49, lr: 3.98e-03, grad_scale: 32.0 2024-09-24 15:55:21,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-09-24 15:55:32,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=536130.0, ans=0.125 2024-09-24 15:55:50,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=536223.3333333334, ans=0.125 2024-09-24 15:55:59,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=536223.3333333334, ans=0.0 2024-09-24 15:56:11,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=536270.0, ans=0.125 2024-09-24 15:56:12,482 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 15:56:12,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=536270.0, ans=0.04949747468305833 2024-09-24 15:56:18,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=536270.0, ans=0.0 2024-09-24 15:56:25,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=536316.6666666666, ans=0.0 2024-09-24 15:56:37,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=536316.6666666666, ans=0.125 2024-09-24 15:56:44,476 INFO [train.py:1198] (0/4) Epoch 30, batch 1950, loss[loss=0.2081, ctc_loss=0.1342, cr_loss=0.3692, over 17208.00 frames. ], tot_loss[loss=0.2005, ctc_loss=0.131, cr_loss=0.3477, over 3339949.67 frames. ], batch size: 47, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 15:57:02,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=536410.0, ans=0.2 2024-09-24 15:57:10,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=536410.0, ans=0.1 2024-09-24 15:57:18,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=536456.6666666666, ans=0.125 2024-09-24 15:57:19,472 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.270e+02 1.356e+02 1.450e+02 2.095e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 15:57:34,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=536503.3333333334, ans=0.025 2024-09-24 15:57:49,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=536550.0, ans=0.125 2024-09-24 15:58:06,789 INFO [train.py:1198] (0/4) Epoch 30, batch 2000, loss[loss=0.1651, ctc_loss=0.1025, cr_loss=0.3132, over 17025.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1308, cr_loss=0.3477, over 3347941.96 frames. 
], batch size: 39, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 15:58:08,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=536596.6666666666, ans=0.2 2024-09-24 15:58:10,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=536596.6666666666, ans=0.0 2024-09-24 15:58:21,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-24 15:58:29,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=536643.3333333334, ans=0.025 2024-09-24 15:58:29,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=536643.3333333334, ans=0.125 2024-09-24 15:58:45,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=536690.0, ans=0.125 2024-09-24 15:58:53,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=536736.6666666666, ans=0.0 2024-09-24 15:59:15,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=536783.3333333334, ans=0.0 2024-09-24 15:59:25,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=536830.0, ans=0.1 2024-09-24 15:59:26,921 INFO [train.py:1198] (0/4) Epoch 30, batch 2050, loss[loss=0.1984, ctc_loss=0.1332, cr_loss=0.3258, over 17041.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1311, cr_loss=0.3479, over 3353690.53 frames. ], batch size: 52, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:00:00,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=536923.3333333334, ans=0.1 2024-09-24 16:00:04,534 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.272e+02 1.366e+02 1.463e+02 2.391e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 16:00:07,021 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-09-24 16:00:09,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=536923.3333333334, ans=0.2 2024-09-24 16:00:12,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=536923.3333333334, ans=0.025 2024-09-24 16:00:23,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2024-09-24 16:00:30,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=536970.0, ans=0.125 2024-09-24 16:00:49,762 INFO [train.py:1198] (0/4) Epoch 30, batch 2100, loss[loss=0.2234, ctc_loss=0.1501, cr_loss=0.3665, over 16079.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1311, cr_loss=0.3479, over 3346311.29 frames. 
], batch size: 74, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:00:55,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=12.0 2024-09-24 16:01:32,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=537156.6666666666, ans=0.025 2024-09-24 16:01:42,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=15.0 2024-09-24 16:01:57,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=537250.0, ans=0.0 2024-09-24 16:02:03,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=537250.0, ans=0.125 2024-09-24 16:02:14,608 INFO [train.py:1198] (0/4) Epoch 30, batch 2150, loss[loss=0.1824, ctc_loss=0.118, cr_loss=0.3221, over 17233.00 frames. ], tot_loss[loss=0.2011, ctc_loss=0.1313, cr_loss=0.349, over 3346429.53 frames. ], batch size: 47, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:02:44,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=537343.3333333334, ans=0.0 2024-09-24 16:02:52,199 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.245e+02 1.327e+02 1.449e+02 2.310e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-24 16:03:07,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2024-09-24 16:03:30,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=537483.3333333334, ans=0.5 2024-09-24 16:03:36,774 INFO [train.py:1198] (0/4) Epoch 30, batch 2200, loss[loss=0.175, ctc_loss=0.1148, cr_loss=0.3011, over 16967.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1313, cr_loss=0.3483, over 3339094.54 frames. ], batch size: 42, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:04:28,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=537670.0, ans=0.0 2024-09-24 16:04:33,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=537670.0, ans=0.0 2024-09-24 16:04:43,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2024-09-24 16:05:00,239 INFO [train.py:1198] (0/4) Epoch 30, batch 2250, loss[loss=0.2412, ctc_loss=0.1629, cr_loss=0.3915, over 17207.00 frames. ], tot_loss[loss=0.2009, ctc_loss=0.1311, cr_loss=0.3488, over 3354781.62 frames. 
], batch size: 50, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:05:02,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=537763.3333333334, ans=0.025 2024-09-24 16:05:08,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=537763.3333333334, ans=0.0 2024-09-24 16:05:15,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=537810.0, ans=0.125 2024-09-24 16:05:34,371 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 16:05:35,471 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.295e+02 1.394e+02 1.566e+02 2.386e+02, threshold=2.787e+02, percent-clipped=0.0 2024-09-24 16:06:06,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-09-24 16:06:09,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=537950.0, ans=0.2 2024-09-24 16:06:20,402 INFO [train.py:1198] (0/4) Epoch 30, batch 2300, loss[loss=0.1754, ctc_loss=0.1096, cr_loss=0.3287, over 17024.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1308, cr_loss=0.3479, over 3353150.62 frames. ], batch size: 39, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:07:02,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=538090.0, ans=0.0 2024-09-24 16:07:31,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=538183.3333333334, ans=0.125 2024-09-24 16:07:31,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=538183.3333333334, ans=0.0 2024-09-24 16:07:47,755 INFO [train.py:1198] (0/4) Epoch 30, batch 2350, loss[loss=0.205, ctc_loss=0.1317, cr_loss=0.3664, over 17253.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.131, cr_loss=0.3488, over 3341554.89 frames. ], batch size: 44, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:08:05,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=538276.6666666666, ans=0.125 2024-09-24 16:08:17,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=538276.6666666666, ans=0.125 2024-09-24 16:08:21,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=538323.3333333334, ans=0.1 2024-09-24 16:08:23,164 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.275e+02 1.344e+02 1.471e+02 2.396e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-24 16:08:26,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=538323.3333333334, ans=0.125 2024-09-24 16:08:39,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=538370.0, ans=0.0 2024-09-24 16:09:08,043 INFO [train.py:1198] (0/4) Epoch 30, batch 2400, loss[loss=0.204, ctc_loss=0.1331, cr_loss=0.3546, over 17213.00 frames. 
], tot_loss[loss=0.2001, ctc_loss=0.1306, cr_loss=0.3474, over 3346597.58 frames. ], batch size: 47, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:09:09,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=538463.3333333334, ans=0.125 2024-09-24 16:09:46,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=538556.6666666666, ans=0.125 2024-09-24 16:10:30,246 INFO [train.py:1198] (0/4) Epoch 30, batch 2450, loss[loss=0.2193, ctc_loss=0.1446, cr_loss=0.3737, over 16693.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1307, cr_loss=0.3473, over 3358171.97 frames. ], batch size: 61, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:10:41,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=538696.6666666666, ans=0.125 2024-09-24 16:10:57,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=538743.3333333334, ans=0.125 2024-09-24 16:11:05,483 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.261e+02 1.331e+02 1.438e+02 1.772e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-24 16:11:17,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=538836.6666666666, ans=0.0 2024-09-24 16:11:27,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538836.6666666666, ans=0.1 2024-09-24 16:11:40,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=12.0 2024-09-24 16:11:55,430 INFO [train.py:1198] (0/4) Epoch 30, batch 2500, loss[loss=0.1832, ctc_loss=0.1164, cr_loss=0.3341, over 17274.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1305, cr_loss=0.3478, over 3361154.36 frames. ], batch size: 42, lr: 3.97e-03, grad_scale: 32.0 2024-09-24 16:11:57,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=538930.0, ans=0.0 2024-09-24 16:11:57,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=538930.0, ans=0.2 2024-09-24 16:12:06,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=538930.0, ans=0.2 2024-09-24 16:12:57,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=539070.0, ans=0.0 2024-09-24 16:13:17,977 INFO [train.py:1198] (0/4) Epoch 30, batch 2550, loss[loss=0.1981, ctc_loss=0.1299, cr_loss=0.3412, over 17210.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1297, cr_loss=0.3463, over 3374129.99 frames. 
], batch size: 47, lr: 3.96e-03, grad_scale: 32.0 2024-09-24 16:13:37,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=539210.0, ans=0.125 2024-09-24 16:13:48,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=539256.6666666666, ans=0.1 2024-09-24 16:13:50,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=539256.6666666666, ans=0.0 2024-09-24 16:13:53,195 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.247e+02 1.346e+02 1.482e+02 2.162e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-24 16:13:53,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=539256.6666666666, ans=0.0 2024-09-24 16:14:29,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=539350.0, ans=0.2 2024-09-24 16:14:40,189 INFO [train.py:1198] (0/4) Epoch 30, batch 2600, loss[loss=0.1995, ctc_loss=0.1286, cr_loss=0.3548, over 17299.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1293, cr_loss=0.3449, over 3362130.74 frames. ], batch size: 51, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:14:54,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539443.3333333334, ans=0.125 2024-09-24 16:14:58,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.15 vs. limit=10.0 2024-09-24 16:15:13,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=539490.0, ans=0.125 2024-09-24 16:15:34,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=539536.6666666666, ans=0.0 2024-09-24 16:15:41,147 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 16:16:00,286 INFO [train.py:1198] (0/4) Epoch 30, batch 2650, loss[loss=0.1824, ctc_loss=0.1175, cr_loss=0.3247, over 16280.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.1292, cr_loss=0.3446, over 3359290.32 frames. ], batch size: 36, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:16:02,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=539630.0, ans=0.125 2024-09-24 16:16:09,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=12.0 2024-09-24 16:16:41,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. 
limit=15.0 2024-09-24 16:16:42,933 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.284e+02 1.354e+02 1.501e+02 2.171e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-24 16:16:51,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=539723.3333333334, ans=0.125 2024-09-24 16:17:05,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=539770.0, ans=0.125 2024-09-24 16:17:05,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539770.0, ans=0.125 2024-09-24 16:17:07,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2024-09-24 16:17:28,544 INFO [train.py:1198] (0/4) Epoch 30, batch 2700, loss[loss=0.2402, ctc_loss=0.1591, cr_loss=0.4057, over 16459.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1297, cr_loss=0.3456, over 3360782.46 frames. ], batch size: 66, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:17:57,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=539910.0, ans=0.125 2024-09-24 16:18:14,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=540003.3333333334, ans=0.0 2024-09-24 16:18:21,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=540003.3333333334, ans=0.2 2024-09-24 16:18:32,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=540050.0, ans=0.125 2024-09-24 16:18:35,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. limit=10.0 2024-09-24 16:18:37,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=540050.0, ans=0.025 2024-09-24 16:18:48,020 INFO [train.py:1198] (0/4) Epoch 30, batch 2750, loss[loss=0.1837, ctc_loss=0.1172, cr_loss=0.3324, over 17333.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.13, cr_loss=0.3463, over 3361874.47 frames. ], batch size: 48, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:18:49,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=540096.6666666666, ans=0.125 2024-09-24 16:19:04,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=540143.3333333334, ans=0.125 2024-09-24 16:19:09,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=540143.3333333334, ans=0.125 2024-09-24 16:19:26,140 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.254e+02 1.332e+02 1.454e+02 2.287e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-24 16:19:32,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=540190.0, ans=0.125 2024-09-24 16:19:47,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. 
limit=15.0 2024-09-24 16:20:00,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=540283.3333333334, ans=0.0 2024-09-24 16:20:04,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=540283.3333333334, ans=0.0 2024-09-24 16:20:10,881 INFO [train.py:1198] (0/4) Epoch 30, batch 2800, loss[loss=0.2178, ctc_loss=0.1446, cr_loss=0.366, over 17203.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1301, cr_loss=0.3472, over 3356495.16 frames. ], batch size: 55, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:20:30,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=540376.6666666666, ans=0.125 2024-09-24 16:20:32,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=540376.6666666666, ans=0.125 2024-09-24 16:20:34,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=15.0 2024-09-24 16:20:38,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540376.6666666666, ans=0.1 2024-09-24 16:20:39,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.67 vs. limit=15.0 2024-09-24 16:20:50,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.12 vs. limit=15.0 2024-09-24 16:20:51,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=540423.3333333334, ans=0.2 2024-09-24 16:20:53,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0 2024-09-24 16:21:22,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=22.5 2024-09-24 16:21:26,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=540516.6666666666, ans=0.1 2024-09-24 16:21:28,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=540516.6666666666, ans=0.04949747468305833 2024-09-24 16:21:35,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=540563.3333333334, ans=0.125 2024-09-24 16:21:36,742 INFO [train.py:1198] (0/4) Epoch 30, batch 2850, loss[loss=0.2052, ctc_loss=0.1339, cr_loss=0.3565, over 17256.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1305, cr_loss=0.3473, over 3351749.57 frames. 
], batch size: 42, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:21:37,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=540563.3333333334, ans=0.125 2024-09-24 16:21:37,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=540563.3333333334, ans=0.0 2024-09-24 16:21:40,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=540563.3333333334, ans=0.125 2024-09-24 16:21:47,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=540563.3333333334, ans=0.025 2024-09-24 16:22:08,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2024-09-24 16:22:15,509 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.298e+02 1.387e+02 1.478e+02 2.436e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-24 16:22:17,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=540656.6666666666, ans=0.1 2024-09-24 16:22:41,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-09-24 16:22:47,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=540750.0, ans=0.0 2024-09-24 16:22:56,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2024-09-24 16:23:00,144 INFO [train.py:1198] (0/4) Epoch 30, batch 2900, loss[loss=0.1765, ctc_loss=0.1162, cr_loss=0.3012, over 17288.00 frames. ], tot_loss[loss=0.1993, ctc_loss=0.13, cr_loss=0.3465, over 3356488.29 frames. ], batch size: 46, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:23:03,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=540796.6666666666, ans=0.0 2024-09-24 16:23:37,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=540890.0, ans=0.125 2024-09-24 16:24:00,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=540936.6666666666, ans=0.0 2024-09-24 16:24:06,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=540983.3333333334, ans=0.125 2024-09-24 16:24:20,583 INFO [train.py:1198] (0/4) Epoch 30, batch 2950, loss[loss=0.2331, ctc_loss=0.155, cr_loss=0.3905, over 17223.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1297, cr_loss=0.3464, over 3354818.71 frames. 
], batch size: 50, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:24:28,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=541030.0, ans=0.0 2024-09-24 16:24:44,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=541076.6666666666, ans=0.0 2024-09-24 16:24:49,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=541076.6666666666, ans=0.125 2024-09-24 16:24:55,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0 2024-09-24 16:25:00,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=541123.3333333334, ans=0.05 2024-09-24 16:25:01,512 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.276e+02 1.368e+02 1.488e+02 1.810e+02, threshold=2.735e+02, percent-clipped=0.0 2024-09-24 16:25:11,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=541170.0, ans=0.125 2024-09-24 16:25:32,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2024-09-24 16:25:42,518 INFO [train.py:1198] (0/4) Epoch 30, batch 3000, loss[loss=0.1673, ctc_loss=0.107, cr_loss=0.3015, over 17036.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.3461, over 3354583.83 frames. ], batch size: 39, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:25:42,519 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 16:25:57,425 INFO [train.py:1230] (0/4) Epoch 30, validation: loss=0.03649, ctc_loss=0.03649, cr_loss=8.522e-15, over 944034.00 frames. 2024-09-24 16:25:57,426 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 16:25:59,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=541263.3333333334, ans=0.125 2024-09-24 16:26:07,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=541263.3333333334, ans=0.125 2024-09-24 16:26:10,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=22.5 2024-09-24 16:26:22,368 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-116000.pt 2024-09-24 16:26:34,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=541356.6666666666, ans=0.0 2024-09-24 16:26:41,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=541356.6666666666, ans=0.07 2024-09-24 16:26:54,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541403.3333333334, ans=0.1 2024-09-24 16:27:06,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.70 vs. 
limit=15.0 2024-09-24 16:27:10,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2024-09-24 16:27:22,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541496.6666666666, ans=0.1 2024-09-24 16:27:23,552 INFO [train.py:1198] (0/4) Epoch 30, batch 3050, loss[loss=0.2433, ctc_loss=0.1698, cr_loss=0.3679, over 12277.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1302, cr_loss=0.3471, over 3357115.36 frames. ], batch size: 123, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:27:36,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2024-09-24 16:27:37,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=541543.3333333334, ans=0.125 2024-09-24 16:27:37,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=541543.3333333334, ans=0.0 2024-09-24 16:27:51,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=541543.3333333334, ans=0.2 2024-09-24 16:28:00,734 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.275e+02 1.345e+02 1.423e+02 1.790e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-24 16:28:02,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=541590.0, ans=0.0 2024-09-24 16:28:05,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=541590.0, ans=0.0 2024-09-24 16:28:12,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2024-09-24 16:28:25,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=541683.3333333334, ans=0.125 2024-09-24 16:28:35,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=541683.3333333334, ans=0.0 2024-09-24 16:28:41,111 INFO [train.py:1198] (0/4) Epoch 30, batch 3100, loss[loss=0.2084, ctc_loss=0.1352, cr_loss=0.3658, over 17060.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1304, cr_loss=0.3477, over 3368596.10 frames. ], batch size: 46, lr: 3.96e-03, grad_scale: 16.0 2024-09-24 16:28:58,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=22.5 2024-09-24 16:29:07,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=541776.6666666666, ans=0.2 2024-09-24 16:29:07,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=541776.6666666666, ans=0.1 2024-09-24 16:29:18,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. 
limit=6.0 2024-09-24 16:29:19,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=541823.3333333334, ans=0.2 2024-09-24 16:29:27,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=541823.3333333334, ans=0.0 2024-09-24 16:29:31,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0 2024-09-24 16:29:35,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=541870.0, ans=0.2 2024-09-24 16:29:57,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541916.6666666666, ans=0.1 2024-09-24 16:30:01,866 INFO [train.py:1198] (0/4) Epoch 30, batch 3150, loss[loss=0.1736, ctc_loss=0.1125, cr_loss=0.3055, over 17290.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.13, cr_loss=0.3468, over 3368319.32 frames. ], batch size: 46, lr: 3.95e-03, grad_scale: 16.0 2024-09-24 16:30:16,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=542010.0, ans=0.125 2024-09-24 16:30:19,023 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 16:30:33,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=542056.6666666666, ans=0.125 2024-09-24 16:30:39,238 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.276e+02 1.352e+02 1.455e+02 1.845e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-24 16:30:52,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=542103.3333333334, ans=0.0 2024-09-24 16:31:01,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=542103.3333333334, ans=10.0 2024-09-24 16:31:14,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=542150.0, ans=0.125 2024-09-24 16:31:20,116 INFO [train.py:1198] (0/4) Epoch 30, batch 3200, loss[loss=0.2266, ctc_loss=0.1491, cr_loss=0.3871, over 15934.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1304, cr_loss=0.347, over 3349076.16 frames. ], batch size: 74, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:31:37,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=542243.3333333334, ans=0.95 2024-09-24 16:32:01,558 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2024-09-24 16:32:05,800 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 16:32:30,422 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.43 vs. 
limit=15.0 2024-09-24 16:32:31,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=542383.3333333334, ans=0.0 2024-09-24 16:32:38,560 INFO [train.py:1198] (0/4) Epoch 30, batch 3250, loss[loss=0.2106, ctc_loss=0.137, cr_loss=0.368, over 17019.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1308, cr_loss=0.3478, over 3347939.83 frames. ], batch size: 51, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:32:45,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=22.5 2024-09-24 16:32:57,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=542476.6666666666, ans=0.125 2024-09-24 16:33:10,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=542523.3333333334, ans=0.0 2024-09-24 16:33:12,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=542523.3333333334, ans=0.0 2024-09-24 16:33:16,568 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.257e+02 1.346e+02 1.457e+02 3.617e+02, threshold=2.692e+02, percent-clipped=1.0 2024-09-24 16:33:55,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=542616.6666666666, ans=0.125 2024-09-24 16:33:59,505 INFO [train.py:1198] (0/4) Epoch 30, batch 3300, loss[loss=0.1658, ctc_loss=0.1053, cr_loss=0.3022, over 16743.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1308, cr_loss=0.3478, over 3348085.39 frames. ], batch size: 37, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:34:18,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=542710.0, ans=0.0 2024-09-24 16:34:39,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-09-24 16:35:17,844 INFO [train.py:1198] (0/4) Epoch 30, batch 3350, loss[loss=0.1725, ctc_loss=0.1105, cr_loss=0.3101, over 17246.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1305, cr_loss=0.3472, over 3358251.69 frames. 
], batch size: 42, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:35:30,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=542896.6666666666, ans=0.125 2024-09-24 16:35:46,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=542943.3333333334, ans=0.125 2024-09-24 16:35:55,302 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.300e+02 1.382e+02 1.466e+02 2.312e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-24 16:35:55,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=542990.0, ans=0.0 2024-09-24 16:35:55,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=542990.0, ans=0.125 2024-09-24 16:36:25,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=543083.3333333334, ans=0.125 2024-09-24 16:36:25,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=543083.3333333334, ans=0.125 2024-09-24 16:36:36,356 INFO [train.py:1198] (0/4) Epoch 30, batch 3400, loss[loss=0.1727, ctc_loss=0.1126, cr_loss=0.3005, over 17039.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1302, cr_loss=0.347, over 3366794.99 frames. ], batch size: 39, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:37:42,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=543316.6666666666, ans=0.125 2024-09-24 16:37:50,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=543316.6666666666, ans=0.04949747468305833 2024-09-24 16:37:55,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=543363.3333333334, ans=0.0 2024-09-24 16:37:56,612 INFO [train.py:1198] (0/4) Epoch 30, batch 3450, loss[loss=0.2144, ctc_loss=0.1418, cr_loss=0.363, over 17215.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1298, cr_loss=0.346, over 3372876.20 frames. ], batch size: 55, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:38:00,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. 
limit=15.0 2024-09-24 16:38:35,837 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.271e+02 1.366e+02 1.468e+02 2.089e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-24 16:38:47,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=543503.3333333334, ans=0.125 2024-09-24 16:38:53,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=543503.3333333334, ans=0.125 2024-09-24 16:38:59,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=543550.0, ans=0.125 2024-09-24 16:39:13,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=543550.0, ans=0.0 2024-09-24 16:39:15,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=543596.6666666666, ans=0.2 2024-09-24 16:39:16,717 INFO [train.py:1198] (0/4) Epoch 30, batch 3500, loss[loss=0.262, ctc_loss=0.1779, cr_loss=0.4203, over 14996.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1301, cr_loss=0.3471, over 3365497.67 frames. ], batch size: 89, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:39:37,523 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 16:39:40,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=543643.3333333334, ans=0.0 2024-09-24 16:40:36,963 INFO [train.py:1198] (0/4) Epoch 30, batch 3550, loss[loss=0.167, ctc_loss=0.1045, cr_loss=0.3126, over 17200.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.13, cr_loss=0.3468, over 3370235.23 frames. ], batch size: 41, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:40:40,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=543830.0, ans=0.025 2024-09-24 16:40:57,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=543876.6666666666, ans=0.1 2024-09-24 16:41:14,201 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.268e+02 1.349e+02 1.434e+02 2.153e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 16:41:19,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=543923.3333333334, ans=15.0 2024-09-24 16:41:25,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=543970.0, ans=0.125 2024-09-24 16:41:26,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2024-09-24 16:41:39,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=544016.6666666666, ans=0.0 2024-09-24 16:41:54,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=544063.3333333334, ans=0.0 2024-09-24 16:41:55,341 INFO [train.py:1198] (0/4) Epoch 30, batch 3600, loss[loss=0.2097, ctc_loss=0.1389, cr_loss=0.354, over 14787.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1302, cr_loss=0.3465, over 3364029.77 frames. 
], batch size: 89, lr: 3.95e-03, grad_scale: 32.0 2024-09-24 16:42:12,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=544110.0, ans=0.025 2024-09-24 16:42:12,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=544110.0, ans=0.125 2024-09-24 16:42:33,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2024-09-24 16:42:56,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=544250.0, ans=15.0 2024-09-24 16:43:07,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=544250.0, ans=0.07 2024-09-24 16:43:09,019 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=15.0 2024-09-24 16:43:13,174 INFO [train.py:1198] (0/4) Epoch 30, batch 3650, loss[loss=0.1945, ctc_loss=0.1279, cr_loss=0.3332, over 17069.00 frames. ], tot_loss[loss=0.1995, ctc_loss=0.1301, cr_loss=0.3467, over 3357252.95 frames. ], batch size: 46, lr: 3.95e-03, grad_scale: 16.0 2024-09-24 16:43:13,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=544296.6666666666, ans=0.125 2024-09-24 16:43:16,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=544296.6666666666, ans=0.0 2024-09-24 16:43:33,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=544343.3333333334, ans=10.0 2024-09-24 16:43:34,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-09-24 16:43:39,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-24 16:43:53,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=544390.0, ans=0.125 2024-09-24 16:43:54,825 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.034e+02 1.277e+02 1.338e+02 1.462e+02 2.145e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 16:44:01,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=544436.6666666666, ans=0.05 2024-09-24 16:44:33,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=544530.0, ans=0.1 2024-09-24 16:44:35,114 INFO [train.py:1198] (0/4) Epoch 30, batch 3700, loss[loss=0.1675, ctc_loss=0.108, cr_loss=0.2974, over 17123.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1305, cr_loss=0.3465, over 3346406.68 frames. 
], batch size: 40, lr: 3.95e-03, grad_scale: 8.0 2024-09-24 16:45:05,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=544623.3333333334, ans=0.125 2024-09-24 16:45:27,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=544670.0, ans=0.1 2024-09-24 16:45:34,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=544670.0, ans=0.0 2024-09-24 16:45:53,706 INFO [train.py:1198] (0/4) Epoch 30, batch 3750, loss[loss=0.1773, ctc_loss=0.1118, cr_loss=0.3274, over 16302.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1297, cr_loss=0.3455, over 3351056.50 frames. ], batch size: 36, lr: 3.94e-03, grad_scale: 8.0 2024-09-24 16:46:35,276 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.265e+02 1.357e+02 1.446e+02 2.507e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 16:46:41,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=544903.3333333334, ans=0.125 2024-09-24 16:46:47,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-09-24 16:46:49,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-09-24 16:47:08,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2024-09-24 16:47:12,372 INFO [train.py:1198] (0/4) Epoch 30, batch 3800, loss[loss=0.2134, ctc_loss=0.1402, cr_loss=0.366, over 16935.00 frames. ], tot_loss[loss=0.1991, ctc_loss=0.1299, cr_loss=0.3459, over 3347055.11 frames. ], batch size: 58, lr: 3.94e-03, grad_scale: 8.0 2024-09-24 16:47:33,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=545043.3333333334, ans=0.125 2024-09-24 16:47:42,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=545043.3333333334, ans=0.0 2024-09-24 16:48:05,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=545136.6666666666, ans=0.125 2024-09-24 16:48:17,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=545183.3333333334, ans=0.125 2024-09-24 16:48:19,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=545183.3333333334, ans=0.02 2024-09-24 16:48:31,294 INFO [train.py:1198] (0/4) Epoch 30, batch 3850, loss[loss=0.215, ctc_loss=0.1445, cr_loss=0.3522, over 15200.00 frames. ], tot_loss[loss=0.2034, ctc_loss=0.1333, cr_loss=0.3506, over 3281820.70 frames. 
], batch size: 89, lr: 3.94e-03, grad_scale: 8.0 2024-09-24 16:48:42,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=545230.0, ans=0.2 2024-09-24 16:48:57,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=545276.6666666666, ans=0.07 2024-09-24 16:49:11,170 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.380e+02 1.551e+02 1.671e+02 2.623e+02, threshold=3.102e+02, percent-clipped=0.0 2024-09-24 16:49:30,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=545416.6666666666, ans=0.125 2024-09-24 16:49:41,070 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-30.pt 2024-09-24 16:50:31,770 INFO [train.py:1198] (0/4) Epoch 31, batch 0, loss[loss=0.1946, ctc_loss=0.1249, cr_loss=0.3485, over 17214.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1249, cr_loss=0.3485, over 17214.00 frames. ], batch size: 47, lr: 3.88e-03, grad_scale: 16.0 2024-09-24 16:50:31,771 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 16:50:43,837 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.3704, 3.4878, 3.8521, 3.8320], device='cuda:0') 2024-09-24 16:50:45,539 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.5037, 5.3075, 4.6632, 5.2587], device='cuda:0') 2024-09-24 16:50:47,108 INFO [train.py:1230] (0/4) Epoch 31, validation: loss=0.03594, ctc_loss=0.03594, cr_loss=9.065e-15, over 944034.00 frames. 2024-09-24 16:50:47,108 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 16:50:55,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=545444.6666666666, ans=0.125 2024-09-24 16:51:12,433 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=15.0 2024-09-24 16:51:27,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=545538.0, ans=0.125 2024-09-24 16:51:50,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=545584.6666666666, ans=0.02 2024-09-24 16:51:54,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=545631.3333333334, ans=0.125 2024-09-24 16:52:01,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=545631.3333333334, ans=0.07 2024-09-24 16:52:12,149 INFO [train.py:1198] (0/4) Epoch 31, batch 50, loss[loss=0.2375, ctc_loss=0.1628, cr_loss=0.3736, over 11530.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1314, cr_loss=0.348, over 751909.89 frames. ], batch size: 123, lr: 3.88e-03, grad_scale: 16.0 2024-09-24 16:52:31,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.85 vs. 
limit=15.0 2024-09-24 16:52:34,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=545724.6666666666, ans=0.125 2024-09-24 16:52:34,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=545724.6666666666, ans=0.0 2024-09-24 16:52:39,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0 2024-09-24 16:52:41,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=545724.6666666666, ans=0.125 2024-09-24 16:52:43,342 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 16:52:44,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=545771.3333333334, ans=0.0 2024-09-24 16:52:55,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=545771.3333333334, ans=0.125 2024-09-24 16:53:01,963 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.275e+02 1.374e+02 1.534e+02 1.966e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-24 16:53:06,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2024-09-24 16:53:24,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=545864.6666666666, ans=0.07 2024-09-24 16:53:34,046 INFO [train.py:1198] (0/4) Epoch 31, batch 100, loss[loss=0.1855, ctc_loss=0.1205, cr_loss=0.3249, over 17248.00 frames. ], tot_loss[loss=0.2018, ctc_loss=0.1319, cr_loss=0.3497, over 1332470.04 frames. ], batch size: 50, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:54:25,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=546051.3333333334, ans=0.025 2024-09-24 16:54:31,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=546051.3333333334, ans=0.04949747468305833 2024-09-24 16:54:53,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=546098.0, ans=0.125 2024-09-24 16:54:56,692 INFO [train.py:1198] (0/4) Epoch 31, batch 150, loss[loss=0.1897, ctc_loss=0.1224, cr_loss=0.3365, over 17245.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1312, cr_loss=0.3481, over 1776652.32 frames. 
], batch size: 44, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:55:00,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=546144.6666666666, ans=0.0 2024-09-24 16:55:18,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=546191.3333333334, ans=0.0 2024-09-24 16:55:41,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=546238.0, ans=0.0 2024-09-24 16:55:44,680 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.256e+02 1.336e+02 1.452e+02 1.976e+02, threshold=2.673e+02, percent-clipped=0.0 2024-09-24 16:56:16,760 INFO [train.py:1198] (0/4) Epoch 31, batch 200, loss[loss=0.1796, ctc_loss=0.117, cr_loss=0.3128, over 17208.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1313, cr_loss=0.3485, over 2134351.95 frames. ], batch size: 47, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:56:29,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=546378.0, ans=0.0 2024-09-24 16:56:44,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=546424.6666666666, ans=0.125 2024-09-24 16:56:44,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=546424.6666666666, ans=0.0 2024-09-24 16:56:50,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2024-09-24 16:56:53,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546471.3333333334, ans=0.1 2024-09-24 16:57:01,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=546471.3333333334, ans=0.125 2024-09-24 16:57:16,705 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2024-09-24 16:57:44,193 INFO [train.py:1198] (0/4) Epoch 31, batch 250, loss[loss=0.2425, ctc_loss=0.1611, cr_loss=0.4071, over 17007.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1304, cr_loss=0.3464, over 2407605.76 frames. ], batch size: 51, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:57:46,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. 
limit=15.0 2024-09-24 16:57:47,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=546611.3333333334, ans=0.2 2024-09-24 16:58:16,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=546704.6666666666, ans=0.0 2024-09-24 16:58:20,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=546704.6666666666, ans=0.07 2024-09-24 16:58:31,700 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.278e+02 1.364e+02 1.509e+02 2.036e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-24 16:58:38,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=546751.3333333334, ans=0.125 2024-09-24 16:58:46,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=546798.0, ans=0.1 2024-09-24 16:59:03,343 INFO [train.py:1198] (0/4) Epoch 31, batch 300, loss[loss=0.226, ctc_loss=0.1501, cr_loss=0.3795, over 17014.00 frames. ], tot_loss[loss=0.2003, ctc_loss=0.1309, cr_loss=0.3471, over 2600298.40 frames. ], batch size: 52, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 16:59:03,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=546844.6666666666, ans=0.0 2024-09-24 16:59:22,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=546891.3333333334, ans=0.125 2024-09-24 16:59:33,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=546891.3333333334, ans=0.025 2024-09-24 17:00:21,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=547031.3333333334, ans=0.09899494936611666 2024-09-24 17:00:26,014 INFO [train.py:1198] (0/4) Epoch 31, batch 350, loss[loss=0.1967, ctc_loss=0.1313, cr_loss=0.3273, over 17020.00 frames. ], tot_loss[loss=0.2013, ctc_loss=0.1316, cr_loss=0.3486, over 2776151.93 frames. ], batch size: 51, lr: 3.87e-03, grad_scale: 16.0 2024-09-24 17:00:37,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=547078.0, ans=0.05 2024-09-24 17:00:52,198 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:00:58,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=547171.3333333334, ans=0.0 2024-09-24 17:01:02,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2024-09-24 17:01:14,098 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.224e+02 1.318e+02 1.423e+02 1.795e+02, threshold=2.635e+02, percent-clipped=0.0 2024-09-24 17:01:25,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=547218.0, ans=0.0 2024-09-24 17:01:39,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.27 vs. 
limit=15.0 2024-09-24 17:01:51,999 INFO [train.py:1198] (0/4) Epoch 31, batch 400, loss[loss=0.1854, ctc_loss=0.1182, cr_loss=0.3362, over 17169.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1305, cr_loss=0.3478, over 2913326.54 frames. ], batch size: 45, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:01:52,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=547311.3333333334, ans=0.0 2024-09-24 17:02:00,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=547311.3333333334, ans=0.125 2024-09-24 17:02:06,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=547358.0, ans=0.2 2024-09-24 17:03:00,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=547498.0, ans=0.0 2024-09-24 17:03:14,684 INFO [train.py:1198] (0/4) Epoch 31, batch 450, loss[loss=0.1884, ctc_loss=0.1242, cr_loss=0.3212, over 17082.00 frames. ], tot_loss[loss=0.2012, ctc_loss=0.1314, cr_loss=0.3492, over 3001980.07 frames. ], batch size: 46, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:03:19,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=547544.6666666666, ans=0.125 2024-09-24 17:03:28,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=547544.6666666666, ans=0.1 2024-09-24 17:03:58,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=547638.0, ans=0.04949747468305833 2024-09-24 17:04:02,909 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.314e+02 1.420e+02 1.522e+02 1.945e+02, threshold=2.840e+02, percent-clipped=0.0 2024-09-24 17:04:07,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=547684.6666666666, ans=0.2 2024-09-24 17:04:15,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=547684.6666666666, ans=0.125 2024-09-24 17:04:15,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=547684.6666666666, ans=0.125 2024-09-24 17:04:17,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=547731.3333333334, ans=0.125 2024-09-24 17:04:37,644 INFO [train.py:1198] (0/4) Epoch 31, batch 500, loss[loss=0.1557, ctc_loss=0.101, cr_loss=0.2736, over 17088.00 frames. ], tot_loss[loss=0.1993, ctc_loss=0.13, cr_loss=0.3465, over 3084245.14 frames. 
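In each `optim.py` WARNING the reported threshold is exactly `Clipping_scale` times the middle quartile (here 2.0 x 1.420e+02 = 2.840e+02), so the clipping threshold evidently tracks the median of recent gradient norms rather than being a fixed constant. A hedged sketch of that policy; `recent_norms` and `clip_with_median_threshold` are illustrative names, not the optimizer's actual API:

```python
import torch

def clip_with_median_threshold(parameters, recent_norms, clipping_scale=2.0):
    """Clip the global grad norm at clipping_scale * median(recent norms).

    recent_norms: recent global gradient norms (floats). Returns the
    quartiles, the threshold, and whether this step was clipped, matching
    the fields printed in the WARNING lines above.
    """
    norms = sorted(recent_norms)
    n = len(norms) - 1
    quartiles = [norms[round(q * n)] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * quartiles[2]  # 2.0 x median, as logged
    total = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
    return quartiles, threshold, float(total) > threshold
```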
], batch size: 43, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:05:08,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=547871.3333333334, ans=0.2 2024-09-24 17:05:25,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=547918.0, ans=0.0 2024-09-24 17:05:57,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=548011.3333333334, ans=0.2 2024-09-24 17:05:58,746 INFO [train.py:1198] (0/4) Epoch 31, batch 550, loss[loss=0.1925, ctc_loss=0.1249, cr_loss=0.3376, over 17262.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.346, over 3154441.80 frames. ], batch size: 44, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:06:22,094 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=22.5 2024-09-24 17:06:52,446 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.269e+02 1.346e+02 1.471e+02 2.079e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-24 17:07:03,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=548151.3333333334, ans=0.0 2024-09-24 17:07:09,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=548198.0, ans=0.125 2024-09-24 17:07:19,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=548198.0, ans=0.125 2024-09-24 17:07:23,921 INFO [train.py:1198] (0/4) Epoch 31, batch 600, loss[loss=0.2317, ctc_loss=0.1501, cr_loss=0.4081, over 17218.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1297, cr_loss=0.3461, over 3203134.57 frames. ], batch size: 50, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:07:35,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2024-09-24 17:07:50,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=548291.3333333334, ans=0.0 2024-09-24 17:07:57,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=22.5 2024-09-24 17:08:03,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=548338.0, ans=0.2 2024-09-24 17:08:30,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=548431.3333333334, ans=0.125 2024-09-24 17:08:32,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=548431.3333333334, ans=0.125 2024-09-24 17:08:36,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=548431.3333333334, ans=0.09899494936611666 2024-09-24 17:08:43,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=548431.3333333334, ans=0.125 2024-09-24 17:08:46,119 INFO [train.py:1198] (0/4) Epoch 31, batch 650, loss[loss=0.1758, ctc_loss=0.1128, cr_loss=0.3152, over 17270.00 frames. 
], tot_loss[loss=0.1987, ctc_loss=0.1296, cr_loss=0.3456, over 3245178.55 frames. ], batch size: 42, lr: 3.87e-03, grad_scale: 32.0 2024-09-24 17:09:27,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=548571.3333333334, ans=0.125 2024-09-24 17:09:30,708 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:09:33,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548571.3333333334, ans=0.1 2024-09-24 17:09:36,768 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.059e+02 1.261e+02 1.352e+02 1.465e+02 2.749e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-24 17:09:45,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548618.0, ans=0.1 2024-09-24 17:09:46,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=548618.0, ans=0.125 2024-09-24 17:09:49,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=548618.0, ans=0.1 2024-09-24 17:10:08,669 INFO [train.py:1198] (0/4) Epoch 31, batch 700, loss[loss=0.17, ctc_loss=0.1085, cr_loss=0.3075, over 17088.00 frames. ], tot_loss[loss=0.1985, ctc_loss=0.1295, cr_loss=0.3451, over 3268559.01 frames. ], batch size: 43, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:11:31,809 INFO [train.py:1198] (0/4) Epoch 31, batch 750, loss[loss=0.1771, ctc_loss=0.1154, cr_loss=0.3083, over 17206.00 frames. ], tot_loss[loss=0.1984, ctc_loss=0.1294, cr_loss=0.345, over 3296719.60 frames. ], batch size: 47, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:11:39,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548944.6666666666, ans=0.1 2024-09-24 17:11:55,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548991.3333333334, ans=0.1 2024-09-24 17:12:21,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=549084.6666666666, ans=0.0 2024-09-24 17:12:25,357 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.270e+02 1.339e+02 1.427e+02 2.075e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 17:12:37,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=549084.6666666666, ans=0.125 2024-09-24 17:12:57,350 INFO [train.py:1198] (0/4) Epoch 31, batch 800, loss[loss=0.2373, ctc_loss=0.1536, cr_loss=0.4183, over 17227.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1294, cr_loss=0.3443, over 3306542.55 frames. 
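Every ScheduledFloat line prints a hyperparameter (a skip rate, a balancer probability, a bypass scale floor) together with the `batch_count` at which it was evaluated: these values follow schedules over training rather than being constants. A piecewise-linear sketch of what such a schedule could look like, under the assumption that ScheduledFloat interpolates between (batch_count, value) breakpoints; the real class in scaling.py may differ in detail:

```python
class PiecewiseSchedule:
    """Piecewise-linear schedule over batch_count, e.g.
    PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1)) decays 0.3 -> 0.1
    over the first 20k batches and stays at 0.1 afterwards."""

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) breakpoints

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]

# By batch_count ~5.5e5 the schedules above sit at their final breakpoints,
# which is why the same names keep printing the same ans= values.
```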
], batch size: 55, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:12:59,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=549178.0, ans=0.125 2024-09-24 17:13:07,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549178.0, ans=0.1 2024-09-24 17:13:18,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549224.6666666666, ans=0.1 2024-09-24 17:13:52,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=549318.0, ans=0.1 2024-09-24 17:13:55,321 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:14:11,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-09-24 17:14:14,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=549364.6666666666, ans=0.0 2024-09-24 17:14:16,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=549411.3333333334, ans=0.125 2024-09-24 17:14:17,354 INFO [train.py:1198] (0/4) Epoch 31, batch 850, loss[loss=0.2227, ctc_loss=0.1486, cr_loss=0.3704, over 16939.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.129, cr_loss=0.3438, over 3320772.68 frames. ], batch size: 58, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:14:42,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=549458.0, ans=0.1 2024-09-24 17:14:49,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549458.0, ans=0.1 2024-09-24 17:15:09,435 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.257e+02 1.348e+02 1.447e+02 2.367e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 17:15:15,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=549551.3333333334, ans=0.0 2024-09-24 17:15:15,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=549551.3333333334, ans=0.025 2024-09-24 17:15:22,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=549598.0, ans=0.0 2024-09-24 17:15:27,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=549598.0, ans=0.125 2024-09-24 17:15:39,452 INFO [train.py:1198] (0/4) Epoch 31, batch 900, loss[loss=0.2106, ctc_loss=0.1351, cr_loss=0.3773, over 16998.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1298, cr_loss=0.3458, over 3333211.97 frames. 
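The `grad_scale` field rises by powers of two (8.0 -> 16.0 -> 32.0) and occasionally halves again, as it does between batch 850 above and batch 900 just below: the signature of dynamic loss scaling for fp16 training, which grows the scale after a run of clean steps and backs off when gradients overflow. A minimal sketch using PyTorch's stock scaler; init_scale and growth_interval here are assumptions, only the doubling/halving pattern is taken from the log:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0,        # matches the smallest grad_scale seen above
    growth_factor=2.0,     # produces the 8 -> 16 -> 32 doublings
    backoff_factor=0.5,    # produces the 32 -> 16 drop after an overflow
    growth_interval=2000,  # assumed; not recoverable from the log
)

def fp16_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # silently skipped if grads contain inf/NaN
    scaler.update()         # grows or backs off the scale
    return loss.detach()
```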
], batch size: 56, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:15:41,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=549644.6666666666, ans=0.07 2024-09-24 17:15:51,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=549644.6666666666, ans=0.125 2024-09-24 17:16:05,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=549691.3333333334, ans=0.125 2024-09-24 17:16:52,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=549831.3333333334, ans=0.025 2024-09-24 17:16:55,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549831.3333333334, ans=0.1 2024-09-24 17:16:59,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=549831.3333333334, ans=0.0 2024-09-24 17:17:02,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=549831.3333333334, ans=0.2 2024-09-24 17:17:05,204 INFO [train.py:1198] (0/4) Epoch 31, batch 950, loss[loss=0.2182, ctc_loss=0.1448, cr_loss=0.3673, over 16991.00 frames. ], tot_loss[loss=0.1985, ctc_loss=0.1292, cr_loss=0.3463, over 3350053.06 frames. ], batch size: 53, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:17:18,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=549878.0, ans=0.125 2024-09-24 17:17:24,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=549924.6666666666, ans=0.95 2024-09-24 17:17:33,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=549924.6666666666, ans=0.125 2024-09-24 17:17:57,028 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.281e+02 1.359e+02 1.441e+02 1.895e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-24 17:18:16,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2024-09-24 17:18:27,479 INFO [train.py:1198] (0/4) Epoch 31, batch 1000, loss[loss=0.1799, ctc_loss=0.1139, cr_loss=0.33, over 17074.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1294, cr_loss=0.3466, over 3362760.93 frames. ], batch size: 46, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:18:40,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=550111.3333333334, ans=0.0 2024-09-24 17:18:47,686 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-09-24 17:18:57,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. 
limit=6.0 2024-09-24 17:19:04,543 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=550204.6666666666, ans=0.125 2024-09-24 17:19:19,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=22.5 2024-09-24 17:19:47,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=550298.0, ans=0.0 2024-09-24 17:19:50,156 INFO [train.py:1198] (0/4) Epoch 31, batch 1050, loss[loss=0.1985, ctc_loss=0.1285, cr_loss=0.3501, over 17066.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1294, cr_loss=0.3469, over 3369216.76 frames. ], batch size: 46, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:20:14,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=550391.3333333334, ans=0.125 2024-09-24 17:20:33,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=550438.0, ans=0.125 2024-09-24 17:20:38,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550484.6666666666, ans=0.1 2024-09-24 17:20:38,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=550484.6666666666, ans=0.125 2024-09-24 17:20:39,886 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.258e+02 1.325e+02 1.421e+02 1.846e+02, threshold=2.651e+02, percent-clipped=0.0 2024-09-24 17:21:02,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=550531.3333333334, ans=0.0 2024-09-24 17:21:07,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=550531.3333333334, ans=0.0 2024-09-24 17:21:10,257 INFO [train.py:1198] (0/4) Epoch 31, batch 1100, loss[loss=0.2095, ctc_loss=0.1388, cr_loss=0.3533, over 16545.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.129, cr_loss=0.3463, over 3376682.98 frames. ], batch size: 66, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:21:18,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=550578.0, ans=0.0 2024-09-24 17:21:19,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.51 vs. limit=10.0 2024-09-24 17:21:31,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-09-24 17:21:33,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=550624.6666666666, ans=0.05 2024-09-24 17:22:09,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=550718.0, ans=0.0 2024-09-24 17:22:39,183 INFO [train.py:1198] (0/4) Epoch 31, batch 1150, loss[loss=0.1927, ctc_loss=0.124, cr_loss=0.3432, over 17138.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1304, cr_loss=0.3479, over 3360537.43 frames. 
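The Whitening lines compare a `metric` against a scheduled `limit` for a named activation (whiten_keys, out_whiten, ...). A plausible reading is that the metric measures how far the channel covariance of that activation is from a multiple of the identity: 1.0 for perfectly white features, up to the channel count when one direction dominates. The sketch below is a reconstruction of that idea, not necessarily scaling.py's exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """mean(eigenvalue^2) / mean(eigenvalue)^2 of the channel covariance.

    Equals 1.0 when the covariance is a multiple of the identity and at
    most num_channels when a single direction dominates. Hypothetical
    reconstruction of the metric printed in the Whitening lines above.
    """
    x = x.reshape(-1, x.shape[-1]).to(torch.float32)
    cov = (x.t() @ x) / x.shape[0]  # (C, C) channel covariance
    num_channels = cov.shape[0]
    return ((cov ** 2).sum() * num_channels / cov.diag().sum() ** 2).item()

# The INFO lines sample this diagnostic against its limit; values under the
# limit (e.g. metric=2.72 vs. limit=6.0 above) incur no penalty.
```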
], batch size: 48, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:22:42,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=550811.3333333334, ans=0.125 2024-09-24 17:22:44,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550811.3333333334, ans=0.125 2024-09-24 17:22:46,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=550811.3333333334, ans=0.0 2024-09-24 17:23:04,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.21 vs. limit=15.0 2024-09-24 17:23:09,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=550904.6666666666, ans=0.025 2024-09-24 17:23:28,709 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.270e+02 1.354e+02 1.493e+02 2.055e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-24 17:23:41,118 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2024-09-24 17:23:44,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2024-09-24 17:23:50,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.25 vs. limit=15.0 2024-09-24 17:23:59,055 INFO [train.py:1198] (0/4) Epoch 31, batch 1200, loss[loss=0.1797, ctc_loss=0.1165, cr_loss=0.3156, over 17284.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1302, cr_loss=0.3483, over 3364588.67 frames. ], batch size: 46, lr: 3.86e-03, grad_scale: 32.0 2024-09-24 17:24:20,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=551091.3333333334, ans=0.125 2024-09-24 17:24:32,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=551138.0, ans=0.025 2024-09-24 17:25:01,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=551184.6666666666, ans=0.125 2024-09-24 17:25:20,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=551278.0, ans=0.2 2024-09-24 17:25:21,965 INFO [train.py:1198] (0/4) Epoch 31, batch 1250, loss[loss=0.1944, ctc_loss=0.1236, cr_loss=0.3539, over 17243.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1309, cr_loss=0.3487, over 3353520.43 frames. ], batch size: 44, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:25:35,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.86 vs. 
limit=10.0 2024-09-24 17:25:44,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=551324.6666666666, ans=10.0 2024-09-24 17:25:46,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=551324.6666666666, ans=0.0 2024-09-24 17:26:00,833 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:26:13,266 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.296e+02 1.398e+02 1.491e+02 2.236e+02, threshold=2.796e+02, percent-clipped=0.0 2024-09-24 17:26:15,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=551418.0, ans=0.2 2024-09-24 17:26:27,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=551464.6666666666, ans=0.0 2024-09-24 17:26:42,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=551464.6666666666, ans=0.125 2024-09-24 17:26:46,960 INFO [train.py:1198] (0/4) Epoch 31, batch 1300, loss[loss=0.2097, ctc_loss=0.1363, cr_loss=0.3673, over 17153.00 frames. ], tot_loss[loss=0.2006, ctc_loss=0.1308, cr_loss=0.349, over 3358932.26 frames. ], batch size: 45, lr: 3.86e-03, grad_scale: 16.0 2024-09-24 17:27:35,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2024-09-24 17:27:36,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=551651.3333333334, ans=0.125 2024-09-24 17:28:00,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=551698.0, ans=0.125 2024-09-24 17:28:09,626 INFO [train.py:1198] (0/4) Epoch 31, batch 1350, loss[loss=0.1958, ctc_loss=0.1252, cr_loss=0.3529, over 17083.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1304, cr_loss=0.3487, over 3361855.61 frames. ], batch size: 49, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:28:09,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=551744.6666666666, ans=0.125 2024-09-24 17:28:10,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-09-24 17:28:51,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=551838.0, ans=0.04949747468305833 2024-09-24 17:29:00,719 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.265e+02 1.360e+02 1.447e+02 1.964e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-24 17:29:12,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=551931.3333333334, ans=0.125 2024-09-24 17:29:15,890 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0 2024-09-24 17:29:26,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. 
limit=12.0 2024-09-24 17:29:32,592 INFO [train.py:1198] (0/4) Epoch 31, batch 1400, loss[loss=0.1612, ctc_loss=0.1026, cr_loss=0.2926, over 17046.00 frames. ], tot_loss[loss=0.2001, ctc_loss=0.1304, cr_loss=0.3483, over 3354544.89 frames. ], batch size: 39, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:29:39,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=551978.0, ans=15.0 2024-09-24 17:29:47,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=552024.6666666666, ans=0.125 2024-09-24 17:29:50,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=552024.6666666666, ans=0.0 2024-09-24 17:29:54,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=12.0 2024-09-24 17:30:24,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=552118.0, ans=0.125 2024-09-24 17:30:25,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=552118.0, ans=0.125 2024-09-24 17:30:27,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=552118.0, ans=0.125 2024-09-24 17:30:35,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=552164.6666666666, ans=0.035 2024-09-24 17:30:38,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=552164.6666666666, ans=0.125 2024-09-24 17:30:41,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs. limit=10.0 2024-09-24 17:30:45,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552164.6666666666, ans=0.1 2024-09-24 17:30:48,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2024-09-24 17:30:49,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=552164.6666666666, ans=0.0 2024-09-24 17:30:52,825 INFO [train.py:1198] (0/4) Epoch 31, batch 1450, loss[loss=0.2063, ctc_loss=0.1336, cr_loss=0.3634, over 17153.00 frames. ], tot_loss[loss=0.2008, ctc_loss=0.1309, cr_loss=0.3492, over 3352181.47 frames. ], batch size: 45, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:30:54,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552211.3333333334, ans=0.1 2024-09-24 17:30:56,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.46 vs. 
limit=22.5 2024-09-24 17:31:04,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=552211.3333333334, ans=0.125 2024-09-24 17:31:12,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=552258.0, ans=0.025 2024-09-24 17:31:27,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=552304.6666666666, ans=0.125 2024-09-24 17:31:41,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=552304.6666666666, ans=0.025 2024-09-24 17:31:48,822 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.229e+02 1.325e+02 1.469e+02 3.189e+02, threshold=2.649e+02, percent-clipped=1.0 2024-09-24 17:31:57,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=552351.3333333334, ans=0.0 2024-09-24 17:32:19,916 INFO [train.py:1198] (0/4) Epoch 31, batch 1500, loss[loss=0.1762, ctc_loss=0.1109, cr_loss=0.3265, over 17097.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1303, cr_loss=0.348, over 3352956.34 frames. ], batch size: 43, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:32:20,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=552444.6666666666, ans=0.025 2024-09-24 17:33:28,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552631.3333333334, ans=0.1 2024-09-24 17:33:39,658 INFO [train.py:1198] (0/4) Epoch 31, batch 1550, loss[loss=0.1569, ctc_loss=0.09667, cr_loss=0.3012, over 17179.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.3464, over 3359770.52 frames. ], batch size: 41, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:33:54,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552724.6666666666, ans=0.1 2024-09-24 17:34:02,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=552724.6666666666, ans=0.125 2024-09-24 17:34:22,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=552771.3333333334, ans=0.09899494936611666 2024-09-24 17:34:22,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=552771.3333333334, ans=0.1 2024-09-24 17:34:33,295 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.274e+02 1.344e+02 1.453e+02 2.165e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-24 17:34:46,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=552864.6666666666, ans=0.07 2024-09-24 17:35:02,239 INFO [train.py:1198] (0/4) Epoch 31, batch 1600, loss[loss=0.2091, ctc_loss=0.135, cr_loss=0.3708, over 17077.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1302, cr_loss=0.3476, over 3358601.19 frames. 
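`cr_loss` is the consistency term: the same batch is passed through the model under two different time-maskings and the two sets of frame-level posteriors are pulled together. That also explains the validation entries, where cr_loss collapses to float noise (~1e-14): with augmentation disabled the two views coincide and the divergence vanishes. A sketch of a symmetric-KL consistency loss under those assumptions; the recipe's exact formulation may differ:

```python
import torch.nn.functional as F

def consistency_loss(log_probs_a, log_probs_b):
    """Symmetric KL between the CTC posteriors of two augmented views.

    log_probs_*: (T, N, V) log-softmax outputs of the same batch under two
    different time-maskings. Identical inputs give (numerically) zero loss,
    matching the ~1e-14 validation cr_loss logged above.
    """
    kl_ab = F.kl_div(log_probs_b, log_probs_a, reduction="none",
                     log_target=True).sum(-1)
    kl_ba = F.kl_div(log_probs_a, log_probs_b, reduction="none",
                     log_target=True).sum(-1)
    return 0.5 * (kl_ab + kl_ba).mean()
```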
], batch size: 46, lr: 3.85e-03, grad_scale: 32.0 2024-09-24 17:35:02,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=552911.3333333334, ans=0.125 2024-09-24 17:35:09,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-09-24 17:35:58,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=553051.3333333334, ans=0.2 2024-09-24 17:36:03,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=553051.3333333334, ans=0.125 2024-09-24 17:36:05,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=553098.0, ans=0.125 2024-09-24 17:36:25,120 INFO [train.py:1198] (0/4) Epoch 31, batch 1650, loss[loss=0.183, ctc_loss=0.1162, cr_loss=0.3341, over 17347.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1301, cr_loss=0.3481, over 3362920.67 frames. ], batch size: 48, lr: 3.85e-03, grad_scale: 32.0 2024-09-24 17:36:39,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2024-09-24 17:37:21,452 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.255e+02 1.333e+02 1.430e+02 2.111e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-24 17:37:24,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=553284.6666666666, ans=0.125 2024-09-24 17:37:24,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=553284.6666666666, ans=0.025 2024-09-24 17:37:44,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=553331.3333333334, ans=0.025 2024-09-24 17:37:50,372 INFO [train.py:1198] (0/4) Epoch 31, batch 1700, loss[loss=0.2183, ctc_loss=0.1475, cr_loss=0.3542, over 17236.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1303, cr_loss=0.3485, over 3359474.23 frames. ], batch size: 50, lr: 3.85e-03, grad_scale: 32.0 2024-09-24 17:37:54,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.35 vs. 
limit=15.0 2024-09-24 17:37:57,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=553378.0, ans=0.2 2024-09-24 17:38:20,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=553424.6666666666, ans=0.2 2024-09-24 17:38:25,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=553471.3333333334, ans=0.125 2024-09-24 17:38:39,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=553518.0, ans=0.1 2024-09-24 17:38:40,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=553518.0, ans=0.125 2024-09-24 17:38:50,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=553518.0, ans=0.125 2024-09-24 17:39:13,900 INFO [train.py:1198] (0/4) Epoch 31, batch 1750, loss[loss=0.1928, ctc_loss=0.1257, cr_loss=0.3353, over 17158.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1304, cr_loss=0.3479, over 3346495.47 frames. ], batch size: 45, lr: 3.85e-03, grad_scale: 32.0 2024-09-24 17:39:28,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=553658.0, ans=0.125 2024-09-24 17:39:34,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=553658.0, ans=0.125 2024-09-24 17:39:44,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=553704.6666666666, ans=0.125 2024-09-24 17:39:46,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=553704.6666666666, ans=0.1 2024-09-24 17:39:46,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2024-09-24 17:39:54,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=553704.6666666666, ans=0.0 2024-09-24 17:39:57,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=553704.6666666666, ans=0.0 2024-09-24 17:40:04,937 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.305e+02 1.399e+02 1.556e+02 3.053e+02, threshold=2.798e+02, percent-clipped=1.0 2024-09-24 17:40:08,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=553751.3333333334, ans=0.125 2024-09-24 17:40:29,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=553798.0, ans=0.0 2024-09-24 17:40:30,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=553798.0, ans=0.0 2024-09-24 17:40:33,711 INFO [train.py:1198] (0/4) Epoch 31, batch 1800, loss[loss=0.203, ctc_loss=0.1308, cr_loss=0.3611, over 17052.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1301, cr_loss=0.3469, over 3331478.80 frames. 
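The balancer entries scattered through these logs (prob=0.125, min_positive, max_positive, min_abs, max_abs) describe constraints on per-channel activation statistics: with some probability a batch is measured, and channels whose positive fraction or mean magnitude drifts outside the given bounds receive a small corrective gradient. The sketch below implements only the measurement half under that reading; the names are illustrative and the penalty mechanics are elided:

```python
import torch

def balancer_stats(x: torch.Tensor):
    """Per-channel statistics a Balancer-style module would constrain.

    x: (..., C). Returns each channel's fraction of positive entries and
    mean absolute value; a real balancer nudges these into
    [min_positive, max_positive] and [min_abs, max_abs] via a gradient
    penalty applied with probability prob (0.125 in the entries above).
    """
    flat = x.reshape(-1, x.shape[-1])
    frac_positive = (flat > 0).float().mean(dim=0)
    mean_abs = flat.abs().mean(dim=0)
    return frac_positive, mean_abs
```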
], batch size: 52, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:40:35,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=553844.6666666666, ans=0.1 2024-09-24 17:40:46,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=553844.6666666666, ans=0.125 2024-09-24 17:41:19,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0 2024-09-24 17:41:20,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=553938.0, ans=0.0 2024-09-24 17:41:50,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=554031.3333333334, ans=0.125 2024-09-24 17:42:01,015 INFO [train.py:1198] (0/4) Epoch 31, batch 1850, loss[loss=0.2286, ctc_loss=0.1515, cr_loss=0.3854, over 16709.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1298, cr_loss=0.3461, over 3339249.57 frames. ], batch size: 61, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:42:11,092 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2024-09-24 17:42:12,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=554078.0, ans=6.0 2024-09-24 17:42:41,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=554171.3333333334, ans=0.0 2024-09-24 17:42:53,413 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.087e+02 1.284e+02 1.348e+02 1.466e+02 2.241e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 17:42:58,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=554218.0, ans=0.0 2024-09-24 17:43:20,558 INFO [train.py:1198] (0/4) Epoch 31, batch 1900, loss[loss=0.1996, ctc_loss=0.1295, cr_loss=0.3508, over 16516.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1302, cr_loss=0.3477, over 3346524.61 frames. ], batch size: 66, lr: 3.85e-03, grad_scale: 16.0 2024-09-24 17:43:24,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=554311.3333333334, ans=0.125 2024-09-24 17:43:51,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=554404.6666666666, ans=0.125 2024-09-24 17:44:42,949 INFO [train.py:1198] (0/4) Epoch 31, batch 1950, loss[loss=0.1708, ctc_loss=0.1064, cr_loss=0.3222, over 16295.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1299, cr_loss=0.3473, over 3357152.70 frames. 
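The frame counts attached to `tot_loss` climb for the first few hundred batches of the epoch (751,909 -> 1,332,470 -> 1,776,652 -> ...) and then plateau around 3.35M, so tot_loss is a decayed running sum over roughly the last couple hundred batches rather than a whole-epoch average. A sketch of that bookkeeping; the window length is an assumption chosen to reproduce the observed plateau (about 200 batches x ~17k frames):

```python
class RunningMetrics:
    """Decayed sums of per-batch loss and frame counts.

    Logged values are ratios of these sums, e.g. loss = sums['loss'] /
    sums['frames']; at steady state sums['frames'] settles near
    window * mean_frames_per_batch, matching the ~3.35M plateau above.
    """

    def __init__(self, window: int = 200):  # assumed, not read from the log
        self.decay = 1.0 - 1.0 / window
        self.sums = {"loss": 0.0, "ctc_loss": 0.0, "cr_loss": 0.0,
                     "frames": 0.0}

    def update(self, batch_sums: dict):
        # batch_sums holds frame-weighted loss sums plus the frame count
        for k in self.sums:
            self.sums[k] = self.sums[k] * self.decay + batch_sums[k]
```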
], batch size: 36, lr: 3.84e-03, grad_scale: 16.0 2024-09-24 17:45:09,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=554591.3333333334, ans=0.0 2024-09-24 17:45:35,728 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.290e+02 1.379e+02 1.473e+02 2.110e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-24 17:45:39,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=554684.6666666666, ans=0.2 2024-09-24 17:45:40,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=554684.6666666666, ans=0.125 2024-09-24 17:45:46,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2024-09-24 17:45:55,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.53 vs. limit=15.0 2024-09-24 17:45:56,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=554731.3333333334, ans=0.125 2024-09-24 17:46:01,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=554731.3333333334, ans=0.0 2024-09-24 17:46:01,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=554731.3333333334, ans=0.025 2024-09-24 17:46:05,600 INFO [train.py:1198] (0/4) Epoch 31, batch 2000, loss[loss=0.2233, ctc_loss=0.1466, cr_loss=0.3833, over 17013.00 frames. ], tot_loss[loss=0.1991, ctc_loss=0.1298, cr_loss=0.3466, over 3355955.73 frames. ], batch size: 56, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:46:13,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=554778.0, ans=0.0 2024-09-24 17:46:14,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2024-09-24 17:46:17,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=554778.0, ans=0.0 2024-09-24 17:46:17,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. 
limit=22.5 2024-09-24 17:46:37,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=554824.6666666666, ans=0.125 2024-09-24 17:46:42,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=554871.3333333334, ans=0.125 2024-09-24 17:46:45,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=554871.3333333334, ans=0.2 2024-09-24 17:46:45,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=554871.3333333334, ans=0.2 2024-09-24 17:47:28,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=554964.6666666666, ans=0.125 2024-09-24 17:47:31,372 INFO [train.py:1198] (0/4) Epoch 31, batch 2050, loss[loss=0.1799, ctc_loss=0.1172, cr_loss=0.3137, over 17154.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1298, cr_loss=0.3463, over 3354935.96 frames. ], batch size: 45, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:48:24,150 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.242e+02 1.319e+02 1.408e+02 1.614e+02, threshold=2.639e+02, percent-clipped=0.0 2024-09-24 17:48:33,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-09-24 17:48:51,333 INFO [train.py:1198] (0/4) Epoch 31, batch 2100, loss[loss=0.2137, ctc_loss=0.1407, cr_loss=0.3651, over 17148.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1292, cr_loss=0.3454, over 3358037.38 frames. ], batch size: 48, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:49:26,703 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:49:54,302 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2024-09-24 17:50:14,775 INFO [train.py:1198] (0/4) Epoch 31, batch 2150, loss[loss=0.2155, ctc_loss=0.1409, cr_loss=0.3733, over 17336.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1296, cr_loss=0.3463, over 3366834.10 frames. ], batch size: 48, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:50:18,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-09-24 17:50:19,931 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:50:43,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.92 vs. 
limit=22.5 2024-09-24 17:50:49,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=555571.3333333334, ans=0.2 2024-09-24 17:50:54,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=555571.3333333334, ans=0.125 2024-09-24 17:50:59,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555571.3333333334, ans=0.1 2024-09-24 17:51:01,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2024-09-24 17:51:10,125 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.047e+02 1.261e+02 1.344e+02 1.456e+02 2.277e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-24 17:51:25,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=555664.6666666666, ans=0.05 2024-09-24 17:51:34,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0 2024-09-24 17:51:39,714 INFO [train.py:1198] (0/4) Epoch 31, batch 2200, loss[loss=0.2285, ctc_loss=0.1513, cr_loss=0.3857, over 17222.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.1298, cr_loss=0.3473, over 3365636.07 frames. ], batch size: 47, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:52:15,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=555804.6666666666, ans=0.125 2024-09-24 17:52:22,326 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 17:52:45,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2024-09-24 17:53:02,095 INFO [train.py:1198] (0/4) Epoch 31, batch 2250, loss[loss=0.2187, ctc_loss=0.1432, cr_loss=0.3772, over 17042.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1295, cr_loss=0.3465, over 3370520.95 frames. ], batch size: 52, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:53:18,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=555991.3333333334, ans=0.125 2024-09-24 17:53:42,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.27 vs. limit=15.0 2024-09-24 17:53:57,561 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.242e+02 1.333e+02 1.428e+02 2.130e+02, threshold=2.666e+02, percent-clipped=0.0 2024-09-24 17:54:04,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2024-09-24 17:54:13,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=556131.3333333334, ans=0.0 2024-09-24 17:54:14,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.59 vs. 
limit=22.5 2024-09-24 17:54:17,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=556131.3333333334, ans=0.025 2024-09-24 17:54:24,673 INFO [train.py:1198] (0/4) Epoch 31, batch 2300, loss[loss=0.1833, ctc_loss=0.1196, cr_loss=0.3186, over 17317.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1291, cr_loss=0.3459, over 3365164.85 frames. ], batch size: 49, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:54:56,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-09-24 17:55:11,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556318.0, ans=0.1 2024-09-24 17:55:33,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=556364.6666666666, ans=0.125 2024-09-24 17:55:47,364 INFO [train.py:1198] (0/4) Epoch 31, batch 2350, loss[loss=0.1809, ctc_loss=0.1157, cr_loss=0.3258, over 17084.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1285, cr_loss=0.3453, over 3361331.34 frames. ], batch size: 43, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:56:00,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=556411.3333333334, ans=0.125 2024-09-24 17:56:43,027 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.265e+02 1.349e+02 1.461e+02 1.759e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-24 17:57:06,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=556598.0, ans=0.5 2024-09-24 17:57:12,807 INFO [train.py:1198] (0/4) Epoch 31, batch 2400, loss[loss=0.1819, ctc_loss=0.1181, cr_loss=0.3191, over 16973.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1284, cr_loss=0.3449, over 3368343.59 frames. ], batch size: 42, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:57:29,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2024-09-24 17:57:32,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556691.3333333334, ans=0.1 2024-09-24 17:58:01,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=556784.6666666666, ans=0.125 2024-09-24 17:58:32,915 INFO [train.py:1198] (0/4) Epoch 31, batch 2450, loss[loss=0.1801, ctc_loss=0.1138, cr_loss=0.3313, over 17265.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1285, cr_loss=0.3455, over 3370633.48 frames. 
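The scaling.py WithLoss lines above (loss-sum=0.000e+00 on various self_attn_weights) report auxiliary penalties attached directly to intermediate tensors: the tensor passes through unchanged in the forward pass, while the attached scalar behaves as if it had been added to the training objective. A reconstruction of that autograd trick; the class name is illustrative, and a zero loss-sum, as logged here, means the penalty is currently inactive:

```python
import torch

class WithAuxLoss(torch.autograd.Function):
    """Return x unchanged, but give aux_loss a gradient of 1.0 in backward,
    i.e. train as if aux_loss had been added to the total objective."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
        ctx.shape, ctx.dtype = aux_loss.shape, aux_loss.dtype
        return x

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        return grad_out, torch.ones(ctx.shape, dtype=ctx.dtype,
                                    device=grad_out.device)

# usage: attn_weights = WithAuxLoss.apply(attn_weights, penalty(attn_weights))
```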
], batch size: 42, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 17:58:36,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=556878.0, ans=0.0 2024-09-24 17:59:19,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=556971.3333333334, ans=0.2 2024-09-24 17:59:28,402 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.235e+02 1.328e+02 1.443e+02 2.383e+02, threshold=2.655e+02, percent-clipped=0.0 2024-09-24 17:59:42,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=557064.6666666666, ans=0.2 2024-09-24 17:59:46,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2024-09-24 17:59:55,317 INFO [train.py:1198] (0/4) Epoch 31, batch 2500, loss[loss=0.1816, ctc_loss=0.1149, cr_loss=0.3339, over 17208.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.128, cr_loss=0.3444, over 3377860.04 frames. ], batch size: 41, lr: 3.84e-03, grad_scale: 32.0 2024-09-24 18:00:16,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=557158.0, ans=0.125 2024-09-24 18:00:16,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=557158.0, ans=0.0 2024-09-24 18:00:18,294 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=15.0 2024-09-24 18:00:43,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=557204.6666666666, ans=0.2 2024-09-24 18:00:43,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2024-09-24 18:00:45,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=557251.3333333334, ans=0.125 2024-09-24 18:01:16,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=557298.0, ans=15.0 2024-09-24 18:01:18,464 INFO [train.py:1198] (0/4) Epoch 31, batch 2550, loss[loss=0.2413, ctc_loss=0.1604, cr_loss=0.4044, over 16506.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1278, cr_loss=0.3444, over 3378275.69 frames. 
], batch size: 66, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:01:32,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=557344.6666666666, ans=0.0 2024-09-24 18:02:02,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=557438.0, ans=0.0 2024-09-24 18:02:15,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=557484.6666666666, ans=0.125 2024-09-24 18:02:16,589 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.233e+02 1.314e+02 1.407e+02 1.652e+02, threshold=2.628e+02, percent-clipped=0.0 2024-09-24 18:02:23,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2024-09-24 18:02:43,700 INFO [train.py:1198] (0/4) Epoch 31, batch 2600, loss[loss=0.2324, ctc_loss=0.1519, cr_loss=0.4022, over 16880.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1277, cr_loss=0.3436, over 3370678.17 frames. ], batch size: 58, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:02:44,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.44 vs. limit=10.0 2024-09-24 18:03:06,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=557624.6666666666, ans=0.125 2024-09-24 18:03:21,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=557671.3333333334, ans=0.05 2024-09-24 18:03:42,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-09-24 18:03:43,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=557718.0, ans=0.125 2024-09-24 18:03:45,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=557718.0, ans=0.1 2024-09-24 18:04:06,717 INFO [train.py:1198] (0/4) Epoch 31, batch 2650, loss[loss=0.1934, ctc_loss=0.1256, cr_loss=0.3391, over 17068.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1283, cr_loss=0.3438, over 3364993.16 frames. 
], batch size: 46, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:04:24,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=557858.0, ans=0.07 2024-09-24 18:04:43,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=557904.6666666666, ans=0.1 2024-09-24 18:04:46,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=557904.6666666666, ans=0.04949747468305833 2024-09-24 18:04:50,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=557904.6666666666, ans=0.1 2024-09-24 18:04:53,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=557951.3333333334, ans=0.04949747468305833 2024-09-24 18:04:54,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=22.5 2024-09-24 18:04:59,304 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.276e+02 1.391e+02 1.491e+02 3.639e+02, threshold=2.781e+02, percent-clipped=1.0 2024-09-24 18:05:26,500 INFO [train.py:1198] (0/4) Epoch 31, batch 2700, loss[loss=0.184, ctc_loss=0.116, cr_loss=0.3403, over 17095.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1287, cr_loss=0.3447, over 3368892.31 frames. ], batch size: 43, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:05:26,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=558044.6666666666, ans=0.125 2024-09-24 18:05:28,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=558044.6666666666, ans=0.0 2024-09-24 18:05:51,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=558091.3333333334, ans=0.125 2024-09-24 18:05:55,283 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2024-09-24 18:06:19,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=558184.6666666666, ans=0.1 2024-09-24 18:06:27,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=558184.6666666666, ans=0.125 2024-09-24 18:06:37,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=558231.3333333334, ans=0.125 2024-09-24 18:06:37,898 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:06:54,370 INFO [train.py:1198] (0/4) Epoch 31, batch 2750, loss[loss=0.203, ctc_loss=0.1306, cr_loss=0.3617, over 17299.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1282, cr_loss=0.3442, over 3376939.47 frames. ], batch size: 51, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:07:03,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. 
limit=6.0 2024-09-24 18:07:17,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=558324.6666666666, ans=0.2 2024-09-24 18:07:18,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=558324.6666666666, ans=0.2 2024-09-24 18:07:44,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558418.0, ans=0.1 2024-09-24 18:07:48,591 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.267e+02 1.363e+02 1.488e+02 1.840e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-24 18:07:52,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=558418.0, ans=0.0 2024-09-24 18:08:08,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=558464.6666666666, ans=0.025 2024-09-24 18:08:14,312 INFO [train.py:1198] (0/4) Epoch 31, batch 2800, loss[loss=0.2351, ctc_loss=0.153, cr_loss=0.4101, over 17315.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1284, cr_loss=0.3444, over 3380117.20 frames. ], batch size: 49, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:08:24,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=558511.3333333334, ans=0.2 2024-09-24 18:08:35,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558558.0, ans=0.1 2024-09-24 18:09:01,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=558604.6666666666, ans=0.0 2024-09-24 18:09:11,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=558651.3333333334, ans=0.125 2024-09-24 18:09:27,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=558698.0, ans=0.0 2024-09-24 18:09:36,684 INFO [train.py:1198] (0/4) Epoch 31, batch 2850, loss[loss=0.1626, ctc_loss=0.1059, cr_loss=0.2833, over 17035.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1283, cr_loss=0.3446, over 3380351.85 frames. ], batch size: 39, lr: 3.83e-03, grad_scale: 32.0 2024-09-24 18:09:42,628 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0 2024-09-24 18:09:53,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=558791.3333333334, ans=0.2 2024-09-24 18:09:56,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=558791.3333333334, ans=0.025 2024-09-24 18:10:17,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.78 vs. 
limit=15.0 2024-09-24 18:10:35,375 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.299e+02 1.361e+02 1.459e+02 2.038e+02, threshold=2.722e+02, percent-clipped=0.0 2024-09-24 18:10:46,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=558931.3333333334, ans=0.125 2024-09-24 18:10:53,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558931.3333333334, ans=0.1 2024-09-24 18:10:59,642 INFO [train.py:1198] (0/4) Epoch 31, batch 2900, loss[loss=0.1907, ctc_loss=0.1227, cr_loss=0.3396, over 17289.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1285, cr_loss=0.3442, over 3357165.40 frames. ], batch size: 49, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:11:58,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=559118.0, ans=0.0 2024-09-24 18:11:59,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=559118.0, ans=0.2 2024-09-24 18:12:07,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.70 vs. limit=5.0 2024-09-24 18:12:25,455 INFO [train.py:1198] (0/4) Epoch 31, batch 2950, loss[loss=0.1743, ctc_loss=0.1103, cr_loss=0.3201, over 16716.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1281, cr_loss=0.344, over 3362359.18 frames. ], batch size: 37, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:13:04,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=559304.6666666666, ans=0.125 2024-09-24 18:13:17,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=559351.3333333334, ans=0.025 2024-09-24 18:13:21,739 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.279e+02 1.362e+02 1.485e+02 2.605e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-24 18:13:24,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559351.3333333334, ans=0.1 2024-09-24 18:13:37,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=559398.0, ans=0.025 2024-09-24 18:13:37,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=559398.0, ans=0.125 2024-09-24 18:13:45,365 INFO [train.py:1198] (0/4) Epoch 31, batch 3000, loss[loss=0.234, ctc_loss=0.1584, cr_loss=0.3781, over 15058.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1297, cr_loss=0.3467, over 3360513.73 frames. ], batch size: 89, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:13:45,366 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 18:14:00,848 INFO [train.py:1230] (0/4) Epoch 31, validation: loss=0.03667, ctc_loss=0.03667, cr_loss=9.013e-15, over 944034.00 frames. 
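The per-batch records above report three loss fields (loss, ctc_loss, cr_loss), and the totals are mutually consistent: adding one fifth of cr_loss to ctc_loss reproduces the logged loss to display precision (e.g. for Epoch 31, batch 3000: 0.1584 + 0.2 x 0.3781 = 0.234). Below is a minimal sketch of that recombination, assuming a fixed cr_loss_scale of 0.2 as implied by these numbers; the function name is illustrative, not taken from train.py.

def combined_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
    # Recombine the per-frame-averaged loss terms reported in each record.
    # cr_loss_scale = 0.2 is inferred from the logged totals above; it is an
    # assumption about this run, not something the records state directly.
    return ctc_loss + cr_loss_scale * cr_loss

# Epoch 31, batch 3000: loss=0.234, ctc_loss=0.1584, cr_loss=0.3781
assert abs(combined_loss(0.1584, 0.3781) - 0.234) < 5e-4

# The validation records show cr_loss on the order of 1e-14, so there
# loss == ctc_loss: the consistency-regularization term effectively
# vanishes when no second, differently-masked pass is scored.
assert abs(combined_loss(0.03667, 9.013e-15) - 0.03667) < 1e-6

The tot_loss fields obey the same relation; tot_loss appears to be a running aggregate over recently seen frames (note the frame counts in the millions), which is why it drifts slowly compared with the per-batch loss.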
2024-09-24 18:14:00,848 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 18:14:07,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=559444.6666666666, ans=0.0 2024-09-24 18:14:25,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0 2024-09-24 18:14:32,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=559538.0, ans=0.0 2024-09-24 18:14:35,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=559538.0, ans=0.0 2024-09-24 18:15:19,083 INFO [train.py:1198] (0/4) Epoch 31, batch 3050, loss[loss=0.2407, ctc_loss=0.1578, cr_loss=0.4144, over 16548.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.1299, cr_loss=0.3464, over 3343304.23 frames. ], batch size: 66, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:15:51,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2024-09-24 18:15:52,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=559771.3333333334, ans=0.125 2024-09-24 18:16:13,651 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.275e+02 1.357e+02 1.505e+02 2.128e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-24 18:16:23,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=559864.6666666666, ans=0.025 2024-09-24 18:16:26,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=559864.6666666666, ans=0.2 2024-09-24 18:16:29,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=559864.6666666666, ans=0.025 2024-09-24 18:16:31,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=559864.6666666666, ans=0.125 2024-09-24 18:16:32,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=559864.6666666666, ans=0.125 2024-09-24 18:16:34,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=559864.6666666666, ans=0.125 2024-09-24 18:16:37,087 INFO [train.py:1198] (0/4) Epoch 31, batch 3100, loss[loss=0.1738, ctc_loss=0.1098, cr_loss=0.3197, over 16836.00 frames. ], tot_loss[loss=0.2002, ctc_loss=0.1305, cr_loss=0.3482, over 3345814.39 frames. ], batch size: 58, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:16:51,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. 
limit=15.0 2024-09-24 18:17:05,668 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-120000.pt 2024-09-24 18:17:18,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=560004.6666666666, ans=0.2 2024-09-24 18:17:29,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=560051.3333333334, ans=0.025 2024-09-24 18:18:00,010 INFO [train.py:1198] (0/4) Epoch 31, batch 3150, loss[loss=0.2114, ctc_loss=0.1403, cr_loss=0.3554, over 16802.00 frames. ], tot_loss[loss=0.1996, ctc_loss=0.1301, cr_loss=0.3472, over 3344576.53 frames. ], batch size: 61, lr: 3.83e-03, grad_scale: 16.0 2024-09-24 18:18:17,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560191.3333333334, ans=0.1 2024-09-24 18:18:22,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=560191.3333333334, ans=0.0 2024-09-24 18:18:24,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=15.0 2024-09-24 18:18:33,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=560238.0, ans=0.125 2024-09-24 18:18:41,349 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:18:44,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=560238.0, ans=0.0 2024-09-24 18:18:55,087 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.279e+02 1.355e+02 1.423e+02 2.365e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 18:19:00,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=12.0 2024-09-24 18:19:18,858 INFO [train.py:1198] (0/4) Epoch 31, batch 3200, loss[loss=0.2059, ctc_loss=0.1359, cr_loss=0.3503, over 16889.00 frames. ], tot_loss[loss=0.1993, ctc_loss=0.1299, cr_loss=0.3469, over 3338991.13 frames. ], batch size: 58, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:19:19,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=560378.0, ans=0.125 2024-09-24 18:19:34,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=560424.6666666666, ans=0.0 2024-09-24 18:20:06,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560518.0, ans=0.1 2024-09-24 18:20:14,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560518.0, ans=0.1 2024-09-24 18:20:41,361 INFO [train.py:1198] (0/4) Epoch 31, batch 3250, loss[loss=0.2426, ctc_loss=0.1623, cr_loss=0.4016, over 15153.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.1288, cr_loss=0.3443, over 3345449.42 frames. 
], batch size: 89, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:20:46,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=560611.3333333334, ans=10.0 2024-09-24 18:21:37,470 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.237e+02 1.355e+02 1.495e+02 1.938e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-24 18:21:40,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=560751.3333333334, ans=0.125 2024-09-24 18:21:41,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2024-09-24 18:21:42,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=560798.0, ans=0.2 2024-09-24 18:21:59,317 INFO [train.py:1198] (0/4) Epoch 31, batch 3300, loss[loss=0.1865, ctc_loss=0.1204, cr_loss=0.3305, over 17001.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1282, cr_loss=0.3445, over 3357711.96 frames. ], batch size: 44, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:23:07,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=561031.3333333334, ans=0.125 2024-09-24 18:23:08,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=561031.3333333334, ans=10.0 2024-09-24 18:23:18,164 INFO [train.py:1198] (0/4) Epoch 31, batch 3350, loss[loss=0.1912, ctc_loss=0.1234, cr_loss=0.339, over 17080.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1285, cr_loss=0.3447, over 3354213.04 frames. ], batch size: 46, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:23:29,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2024-09-24 18:23:48,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=561171.3333333334, ans=10.0 2024-09-24 18:23:59,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=561171.3333333334, ans=0.0 2024-09-24 18:24:14,289 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.283e+02 1.367e+02 1.477e+02 4.748e+02, threshold=2.734e+02, percent-clipped=1.0 2024-09-24 18:24:15,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=561218.0, ans=0.125 2024-09-24 18:24:36,278 INFO [train.py:1198] (0/4) Epoch 31, batch 3400, loss[loss=0.2192, ctc_loss=0.146, cr_loss=0.3663, over 17311.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1282, cr_loss=0.3444, over 3364821.44 frames. 
], batch size: 51, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:24:36,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=561311.3333333334, ans=0.09899494936611666 2024-09-24 18:24:38,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561311.3333333334, ans=0.1 2024-09-24 18:24:39,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=561311.3333333334, ans=0.125 2024-09-24 18:24:41,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561311.3333333334, ans=0.1 2024-09-24 18:24:56,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2024-09-24 18:25:55,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2024-09-24 18:25:56,291 INFO [train.py:1198] (0/4) Epoch 31, batch 3450, loss[loss=0.1948, ctc_loss=0.1264, cr_loss=0.342, over 17033.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.3461, over 3354083.99 frames. ], batch size: 44, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:26:07,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=561544.6666666666, ans=0.0 2024-09-24 18:26:21,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=561591.3333333334, ans=0.025 2024-09-24 18:26:34,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=561638.0, ans=0.05 2024-09-24 18:26:37,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=561638.0, ans=0.125 2024-09-24 18:26:37,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=561638.0, ans=0.125 2024-09-24 18:26:49,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=561684.6666666666, ans=0.0 2024-09-24 18:26:51,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=561684.6666666666, ans=0.0 2024-09-24 18:26:52,252 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.286e+02 1.363e+02 1.449e+02 2.372e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-24 18:27:14,200 INFO [train.py:1198] (0/4) Epoch 31, batch 3500, loss[loss=0.2148, ctc_loss=0.1423, cr_loss=0.3622, over 15768.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1301, cr_loss=0.3478, over 3346534.66 frames. 
], batch size: 74, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:27:14,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=561778.0, ans=0.125 2024-09-24 18:27:27,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=561778.0, ans=0.125 2024-09-24 18:27:48,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. limit=8.0 2024-09-24 18:28:06,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=561918.0, ans=0.025 2024-09-24 18:28:34,275 INFO [train.py:1198] (0/4) Epoch 31, batch 3550, loss[loss=0.2009, ctc_loss=0.1317, cr_loss=0.3461, over 17293.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1289, cr_loss=0.3455, over 3357666.24 frames. ], batch size: 49, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:29:05,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562104.6666666666, ans=0.125 2024-09-24 18:29:23,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-09-24 18:29:25,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=562151.3333333334, ans=0.5 2024-09-24 18:29:34,935 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.302e+02 1.378e+02 1.478e+02 2.528e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-24 18:29:39,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=562198.0, ans=0.125 2024-09-24 18:29:56,785 INFO [train.py:1198] (0/4) Epoch 31, batch 3600, loss[loss=0.1758, ctc_loss=0.1175, cr_loss=0.2915, over 17096.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.1302, cr_loss=0.3484, over 3364584.84 frames. ], batch size: 49, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:29:58,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562244.6666666666, ans=0.1 2024-09-24 18:30:49,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=562384.6666666666, ans=0.125 2024-09-24 18:31:14,556 INFO [train.py:1198] (0/4) Epoch 31, batch 3650, loss[loss=0.2043, ctc_loss=0.134, cr_loss=0.3515, over 17034.00 frames. ], tot_loss[loss=0.1999, ctc_loss=0.1302, cr_loss=0.3482, over 3355386.57 frames. 
], batch size: 52, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:31:44,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562571.3333333334, ans=0.1 2024-09-24 18:32:11,875 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.260e+02 1.374e+02 1.513e+02 2.191e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-24 18:32:12,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=562618.0, ans=0.125 2024-09-24 18:32:15,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=562618.0, ans=0.125 2024-09-24 18:32:33,922 INFO [train.py:1198] (0/4) Epoch 31, batch 3700, loss[loss=0.1978, ctc_loss=0.1291, cr_loss=0.3439, over 17150.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.1309, cr_loss=0.3488, over 3346821.22 frames. ], batch size: 48, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:33:02,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=562758.0, ans=0.2 2024-09-24 18:33:16,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=562804.6666666666, ans=0.125 2024-09-24 18:33:24,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=562851.3333333334, ans=0.0 2024-09-24 18:33:52,156 INFO [train.py:1198] (0/4) Epoch 31, batch 3750, loss[loss=0.1932, ctc_loss=0.1244, cr_loss=0.3438, over 16301.00 frames. ], tot_loss[loss=0.2, ctc_loss=0.1304, cr_loss=0.348, over 3354051.46 frames. ], batch size: 36, lr: 3.82e-03, grad_scale: 32.0 2024-09-24 18:34:26,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-24 18:34:35,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5 2024-09-24 18:34:48,370 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.261e+02 1.326e+02 1.431e+02 2.182e+02, threshold=2.652e+02, percent-clipped=0.0 2024-09-24 18:34:50,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=563084.6666666666, ans=0.0 2024-09-24 18:35:09,881 INFO [train.py:1198] (0/4) Epoch 31, batch 3800, loss[loss=0.2163, ctc_loss=0.144, cr_loss=0.3611, over 17359.00 frames. ], tot_loss[loss=0.2007, ctc_loss=0.131, cr_loss=0.3485, over 3341953.59 frames. ], batch size: 48, lr: 3.82e-03, grad_scale: 16.0 2024-09-24 18:35:23,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=563224.6666666666, ans=0.125 2024-09-24 18:35:59,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=563318.0, ans=10.0 2024-09-24 18:35:59,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2024-09-24 18:36:10,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=563364.6666666666, ans=0.0 2024-09-24 18:36:14,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=563364.6666666666, ans=0.0 2024-09-24 18:36:25,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=563411.3333333334, ans=0.2 2024-09-24 18:36:27,195 INFO [train.py:1198] (0/4) Epoch 31, batch 3850, loss[loss=0.2085, ctc_loss=0.1349, cr_loss=0.3678, over 17211.00 frames. ], tot_loss[loss=0.2024, ctc_loss=0.1325, cr_loss=0.3499, over 3288861.05 frames. ], batch size: 47, lr: 3.81e-03, grad_scale: 16.0 2024-09-24 18:36:35,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=563411.3333333334, ans=0.0 2024-09-24 18:36:45,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2024-09-24 18:36:46,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=563458.0, ans=0.125 2024-09-24 18:36:57,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=563504.6666666666, ans=0.125 2024-09-24 18:37:11,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=563504.6666666666, ans=0.125 2024-09-24 18:37:14,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=563551.3333333334, ans=0.2 2024-09-24 18:37:24,225 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.313e+02 1.427e+02 1.596e+02 2.274e+02, threshold=2.854e+02, percent-clipped=0.0 2024-09-24 18:37:37,539 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-31.pt 2024-09-24 18:38:29,745 INFO [train.py:1198] (0/4) Epoch 32, batch 0, loss[loss=0.2369, ctc_loss=0.163, cr_loss=0.3692, over 11643.00 frames. ], tot_loss[loss=0.2369, ctc_loss=0.163, cr_loss=0.3692, over 11643.00 frames. ], batch size: 123, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:38:29,746 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 18:38:45,170 INFO [train.py:1230] (0/4) Epoch 32, validation: loss=0.03599, ctc_loss=0.03599, cr_loss=9.022e-15, over 944034.00 frames. 
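Each WARNING from optim.py above prints five grad-norm statistics (apparently min, 25%, median, 75%, max) together with a clipping threshold, and in every record the threshold equals Clipping_scale times the logged median to display precision (e.g. 2.0 x 1.344e+02 = 2.688e+02). The sketch below is a clipper consistent with those records; the class name, buffer length, and update policy are assumptions for illustration, not icefall's actual optim.py logic.

import torch
from collections import deque

class MedianGradClipper:
    # Illustrative gradient clipper: threshold = clipping_scale * median of
    # recently observed global gradient norms. The history length and the
    # absence of any warm-up handling are assumptions of this sketch.

    def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # Global norm over all parameter gradients, as in the logged records.
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])
        ).item()
        self.norms.append(total_norm)
        quartiles = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * quartiles[2].item()  # scale * median
        if total_norm > threshold:  # such batches feed the percent-clipped stat
            for p in params:
                p.grad.mul_(threshold / total_norm)
        return total_norm

Used after loss.backward() and before optimizer.step(), e.g. clipper.clip_(model.parameters()). Under this reading, percent-clipped=0.0 means no recent batch exceeded twice the running median, while the occasional percent-clipped=1.0 lines coincide with a max quartile far above the threshold (3.639e+02 against 2.781e+02, and 4.748e+02 against 2.734e+02, in the records above).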
2024-09-24 18:38:45,170 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 18:38:58,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=563626.0, ans=0.125 2024-09-24 18:38:59,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=563672.6666666666, ans=0.0 2024-09-24 18:39:17,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=563719.3333333334, ans=0.0 2024-09-24 18:39:18,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=563719.3333333334, ans=0.2 2024-09-24 18:40:14,197 INFO [train.py:1198] (0/4) Epoch 32, batch 50, loss[loss=0.1598, ctc_loss=0.1012, cr_loss=0.293, over 17022.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.13, cr_loss=0.3464, over 752265.95 frames. ], batch size: 39, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:40:30,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=563906.0, ans=0.125 2024-09-24 18:41:19,567 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.253e+02 1.338e+02 1.477e+02 2.326e+02, threshold=2.677e+02, percent-clipped=0.0 2024-09-24 18:41:33,987 INFO [train.py:1198] (0/4) Epoch 32, batch 100, loss[loss=0.2113, ctc_loss=0.1387, cr_loss=0.3628, over 16720.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1287, cr_loss=0.3446, over 1334767.58 frames. ], batch size: 61, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:41:34,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=564092.6666666666, ans=0.125 2024-09-24 18:41:55,138 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:41:57,182 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.83 vs. limit=10.0 2024-09-24 18:42:03,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564139.3333333334, ans=0.1 2024-09-24 18:42:09,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=564186.0, ans=0.025 2024-09-24 18:42:56,208 INFO [train.py:1198] (0/4) Epoch 32, batch 150, loss[loss=0.1896, ctc_loss=0.1211, cr_loss=0.3426, over 17181.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1296, cr_loss=0.347, over 1784053.49 frames. 
], batch size: 41, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:43:42,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=564466.0, ans=0.0 2024-09-24 18:43:42,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=564466.0, ans=0.0 2024-09-24 18:43:46,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=564466.0, ans=0.125 2024-09-24 18:43:47,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=564466.0, ans=0.0 2024-09-24 18:44:01,708 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.312e+02 1.410e+02 1.549e+02 1.890e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-24 18:44:11,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=564512.6666666666, ans=0.1 2024-09-24 18:44:16,132 INFO [train.py:1198] (0/4) Epoch 32, batch 200, loss[loss=0.1826, ctc_loss=0.1198, cr_loss=0.3138, over 16952.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1287, cr_loss=0.3464, over 2132954.82 frames. ], batch size: 42, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:45:19,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=15.0 2024-09-24 18:45:29,099 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:45:32,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.93 vs. limit=12.0 2024-09-24 18:45:46,134 INFO [train.py:1198] (0/4) Epoch 32, batch 250, loss[loss=0.203, ctc_loss=0.129, cr_loss=0.3701, over 17071.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1294, cr_loss=0.3483, over 2409956.99 frames. ], batch size: 46, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:46:35,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=564932.6666666666, ans=0.07 2024-09-24 18:46:35,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=564932.6666666666, ans=0.1 2024-09-24 18:46:45,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=564932.6666666666, ans=0.125 2024-09-24 18:46:52,819 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.269e+02 1.327e+02 1.431e+02 2.007e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-24 18:47:05,439 INFO [train.py:1198] (0/4) Epoch 32, batch 300, loss[loss=0.1972, ctc_loss=0.129, cr_loss=0.3406, over 17320.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1288, cr_loss=0.3463, over 2624028.39 frames. 
], batch size: 49, lr: 3.75e-03, grad_scale: 16.0 2024-09-24 18:47:30,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=565072.6666666666, ans=0.125 2024-09-24 18:48:13,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=565212.6666666666, ans=0.5 2024-09-24 18:48:29,165 INFO [train.py:1198] (0/4) Epoch 32, batch 350, loss[loss=0.2029, ctc_loss=0.1315, cr_loss=0.3572, over 17364.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1281, cr_loss=0.3444, over 2786525.44 frames. ], batch size: 48, lr: 3.75e-03, grad_scale: 16.0 2024-09-24 18:48:31,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=565259.3333333334, ans=0.0 2024-09-24 18:48:47,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=565306.0, ans=0.0 2024-09-24 18:48:48,113 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=12.0 2024-09-24 18:49:06,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=565352.6666666666, ans=0.0 2024-09-24 18:49:22,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=565399.3333333334, ans=0.025 2024-09-24 18:49:36,804 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.263e+02 1.319e+02 1.416e+02 2.178e+02, threshold=2.638e+02, percent-clipped=0.0 2024-09-24 18:49:52,385 INFO [train.py:1198] (0/4) Epoch 32, batch 400, loss[loss=0.2621, ctc_loss=0.1755, cr_loss=0.4329, over 17233.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1285, cr_loss=0.3457, over 2920379.57 frames. ], batch size: 55, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:50:33,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=565586.0, ans=0.0 2024-09-24 18:50:47,226 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:50:54,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=565632.6666666666, ans=15.0 2024-09-24 18:51:18,646 INFO [train.py:1198] (0/4) Epoch 32, batch 450, loss[loss=0.2002, ctc_loss=0.1301, cr_loss=0.3507, over 17353.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1295, cr_loss=0.347, over 3014307.88 frames. 
], batch size: 48, lr: 3.75e-03, grad_scale: 32.0 2024-09-24 18:51:38,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=565772.6666666666, ans=0.1 2024-09-24 18:51:46,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=565772.6666666666, ans=0.125 2024-09-24 18:52:00,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=565819.3333333334, ans=0.0 2024-09-24 18:52:08,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=565866.0, ans=0.125 2024-09-24 18:52:25,961 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.290e+02 1.389e+02 1.522e+02 2.571e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-24 18:52:38,791 INFO [train.py:1198] (0/4) Epoch 32, batch 500, loss[loss=0.2001, ctc_loss=0.1273, cr_loss=0.3639, over 16945.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.1289, cr_loss=0.3461, over 3091258.11 frames. ], batch size: 42, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 18:52:39,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565959.3333333334, ans=0.125 2024-09-24 18:52:39,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565959.3333333334, ans=0.1 2024-09-24 18:52:40,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=565959.3333333334, ans=0.125 2024-09-24 18:52:45,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=565959.3333333334, ans=0.05 2024-09-24 18:52:51,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=565959.3333333334, ans=0.0 2024-09-24 18:52:55,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=565959.3333333334, ans=0.0 2024-09-24 18:53:33,297 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:53:39,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=566099.3333333334, ans=0.125 2024-09-24 18:54:01,197 INFO [train.py:1198] (0/4) Epoch 32, batch 550, loss[loss=0.1924, ctc_loss=0.1279, cr_loss=0.3226, over 16627.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1298, cr_loss=0.3477, over 3130586.40 frames. 
], batch size: 61, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 18:54:24,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=566239.3333333334, ans=0.0 2024-09-24 18:54:25,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=566239.3333333334, ans=0.0 2024-09-24 18:54:40,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=566286.0, ans=0.1 2024-09-24 18:54:53,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=22.5 2024-09-24 18:55:16,236 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.263e+02 1.353e+02 1.425e+02 2.013e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-24 18:55:26,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=566379.3333333334, ans=0.125 2024-09-24 18:55:29,024 INFO [train.py:1198] (0/4) Epoch 32, batch 600, loss[loss=0.1929, ctc_loss=0.1271, cr_loss=0.329, over 17233.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.13, cr_loss=0.3473, over 3180301.29 frames. ], batch size: 47, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 18:55:29,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=566426.0, ans=0.125 2024-09-24 18:55:42,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=566426.0, ans=0.0 2024-09-24 18:55:53,462 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 18:56:09,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=566519.3333333334, ans=0.0 2024-09-24 18:56:12,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=566519.3333333334, ans=0.09899494936611666 2024-09-24 18:56:16,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=566566.0, ans=0.07 2024-09-24 18:56:49,244 INFO [train.py:1198] (0/4) Epoch 32, batch 650, loss[loss=0.1604, ctc_loss=0.1028, cr_loss=0.2878, over 17205.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1288, cr_loss=0.3454, over 3225348.73 frames. ], batch size: 41, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 18:56:55,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=566659.3333333334, ans=15.0 2024-09-24 18:57:00,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=566659.3333333334, ans=0.2 2024-09-24 18:57:10,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=566706.0, ans=0.125 2024-09-24 18:57:23,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2024-09-24 18:57:25,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.60 vs. 
limit=15.0 2024-09-24 18:57:28,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=566752.6666666666, ans=10.0 2024-09-24 18:58:01,566 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.265e+02 1.321e+02 1.437e+02 2.053e+02, threshold=2.643e+02, percent-clipped=0.0 2024-09-24 18:58:11,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=566892.6666666666, ans=0.0 2024-09-24 18:58:13,019 INFO [train.py:1198] (0/4) Epoch 32, batch 700, loss[loss=0.1917, ctc_loss=0.1243, cr_loss=0.3372, over 17256.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.129, cr_loss=0.346, over 3256041.14 frames. ], batch size: 42, lr: 3.74e-03, grad_scale: 16.0 2024-09-24 18:58:16,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=566892.6666666666, ans=0.125 2024-09-24 18:58:39,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-09-24 18:58:42,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2024-09-24 18:58:46,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=566986.0, ans=0.2 2024-09-24 18:59:33,163 INFO [train.py:1198] (0/4) Epoch 32, batch 750, loss[loss=0.1798, ctc_loss=0.1168, cr_loss=0.3153, over 17058.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1294, cr_loss=0.3462, over 3267905.86 frames. ], batch size: 46, lr: 3.74e-03, grad_scale: 16.0 2024-09-24 18:59:43,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567126.0, ans=0.1 2024-09-24 19:00:09,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=15.0 2024-09-24 19:00:16,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=567219.3333333334, ans=0.125 2024-09-24 19:00:25,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567219.3333333334, ans=0.1 2024-09-24 19:00:30,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=567266.0, ans=0.07 2024-09-24 19:00:48,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.48 vs. 
limit=15.0 2024-09-24 19:00:49,277 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.286e+02 1.352e+02 1.497e+02 2.163e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-24 19:00:52,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=567312.6666666666, ans=0.0 2024-09-24 19:00:52,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567312.6666666666, ans=0.1 2024-09-24 19:00:57,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=567312.6666666666, ans=0.125 2024-09-24 19:01:00,449 INFO [train.py:1198] (0/4) Epoch 32, batch 800, loss[loss=0.202, ctc_loss=0.1307, cr_loss=0.3568, over 17294.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1292, cr_loss=0.3455, over 3288149.04 frames. ], batch size: 51, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:01:10,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=567359.3333333334, ans=0.025 2024-09-24 19:01:21,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=567406.0, ans=0.125 2024-09-24 19:01:24,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=567406.0, ans=0.2 2024-09-24 19:01:25,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=567406.0, ans=0.1 2024-09-24 19:02:00,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=22.5 2024-09-24 19:02:09,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=567546.0, ans=0.0 2024-09-24 19:02:12,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=567546.0, ans=0.125 2024-09-24 19:02:20,178 INFO [train.py:1198] (0/4) Epoch 32, batch 850, loss[loss=0.168, ctc_loss=0.1091, cr_loss=0.2945, over 17232.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1296, cr_loss=0.3455, over 3303265.38 frames. ], batch size: 47, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:02:26,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567592.6666666666, ans=0.1 2024-09-24 19:03:01,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.30 vs. limit=10.0 2024-09-24 19:03:09,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.01 vs. 
limit=15.0 2024-09-24 19:03:31,999 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.300e+02 1.382e+02 1.535e+02 3.730e+02, threshold=2.764e+02, percent-clipped=1.0 2024-09-24 19:03:40,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=567779.3333333334, ans=0.0 2024-09-24 19:03:40,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=567779.3333333334, ans=0.125 2024-09-24 19:03:43,405 INFO [train.py:1198] (0/4) Epoch 32, batch 900, loss[loss=0.2186, ctc_loss=0.147, cr_loss=0.358, over 15341.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.1291, cr_loss=0.345, over 3316968.45 frames. ], batch size: 89, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:04:09,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567872.6666666666, ans=0.1 2024-09-24 19:04:11,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=567872.6666666666, ans=0.0 2024-09-24 19:04:17,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=567919.3333333334, ans=0.2 2024-09-24 19:04:22,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=567919.3333333334, ans=0.0 2024-09-24 19:05:09,166 INFO [train.py:1198] (0/4) Epoch 32, batch 950, loss[loss=0.2012, ctc_loss=0.1353, cr_loss=0.3296, over 17002.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1297, cr_loss=0.3458, over 3329323.44 frames. ], batch size: 53, lr: 3.74e-03, grad_scale: 32.0 2024-09-24 19:05:15,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568059.3333333334, ans=0.1 2024-09-24 19:05:15,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2024-09-24 19:05:45,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568152.6666666666, ans=0.1 2024-09-24 19:05:47,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=568152.6666666666, ans=0.0 2024-09-24 19:05:52,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=568152.6666666666, ans=0.1 2024-09-24 19:05:55,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=568152.6666666666, ans=0.2 2024-09-24 19:06:12,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=568199.3333333334, ans=0.0 2024-09-24 19:06:20,503 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.287e+02 1.382e+02 1.480e+02 1.903e+02, threshold=2.765e+02, percent-clipped=0.0 2024-09-24 19:06:31,839 INFO [train.py:1198] (0/4) Epoch 32, batch 1000, loss[loss=0.1869, ctc_loss=0.1183, cr_loss=0.3431, over 17281.00 frames. ], tot_loss[loss=0.1988, ctc_loss=0.1297, cr_loss=0.3457, over 3339436.39 frames. 
2024-09-24 19:06:49,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=568339.3333333334, ans=0.0
2024-09-24 19:06:51,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=568339.3333333334, ans=0.125
2024-09-24 19:07:20,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0
2024-09-24 19:07:30,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=568432.6666666666, ans=0.0
2024-09-24 19:07:46,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=568479.3333333334, ans=0.125
2024-09-24 19:07:54,841 INFO [train.py:1198] (0/4) Epoch 32, batch 1050, loss[loss=0.2221, ctc_loss=0.1446, cr_loss=0.3877, over 17004.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1292, cr_loss=0.3455, over 3351969.50 frames. ], batch size: 56, lr: 3.74e-03, grad_scale: 32.0
2024-09-24 19:07:56,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=568526.0, ans=0.125
2024-09-24 19:08:24,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.13 vs. limit=10.0
2024-09-24 19:08:26,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=568619.3333333334, ans=0.0
2024-09-24 19:08:44,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0
2024-09-24 19:08:57,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5
2024-09-24 19:09:03,547 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.269e+02 1.394e+02 1.503e+02 2.012e+02, threshold=2.787e+02, percent-clipped=0.0
2024-09-24 19:09:15,035 INFO [train.py:1198] (0/4) Epoch 32, batch 1100, loss[loss=0.187, ctc_loss=0.1198, cr_loss=0.3359, over 17011.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1286, cr_loss=0.3449, over 3358616.69 frames. ], batch size: 51, lr: 3.74e-03, grad_scale: 32.0
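Every [train.py:1198] line decomposes consistently: the reported loss equals ctc_loss + 0.2 * cr_loss (for the batch-1100 aggregate above, 0.1286 + 0.2 * 0.3449 = 0.1976), which matches a consistency-regularized CTC objective with a CR scale of 0.2. A sketch of the combination, with the 0.2 factor inferred from the logged numbers rather than taken from the code:

```python
def combined_loss(ctc_loss, cr_loss, cr_loss_scale=0.2):
    """Total objective as the log implies: CTC plus scaled consistency loss.

    cr_loss_scale=0.2 reproduces every loss[...] and tot_loss[...] line in
    this section, e.g. 0.1286 + 0.2 * 0.3449 = 0.1976.
    """
    return ctc_loss + cr_loss_scale * cr_loss
```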
2024-09-24 19:09:18,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568759.3333333334, ans=0.1
2024-09-24 19:09:19,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=568759.3333333334, ans=0.125
2024-09-24 19:09:19,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=568759.3333333334, ans=0.95
2024-09-24 19:09:41,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568806.0, ans=0.1
2024-09-24 19:09:47,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=568806.0, ans=0.0
2024-09-24 19:09:57,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5
2024-09-24 19:10:02,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0
2024-09-24 19:10:32,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568946.0, ans=0.1
2024-09-24 19:10:37,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=568946.0, ans=0.125
2024-09-24 19:10:41,724 INFO [train.py:1198] (0/4) Epoch 32, batch 1150, loss[loss=0.1762, ctc_loss=0.1121, cr_loss=0.3202, over 17047.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1279, cr_loss=0.3435, over 3364215.75 frames. ], batch size: 39, lr: 3.73e-03, grad_scale: 32.0
2024-09-24 19:10:43,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=568992.6666666666, ans=0.125
2024-09-24 19:11:12,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=569086.0, ans=15.0
2024-09-24 19:11:14,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=569086.0, ans=0.125
2024-09-24 19:11:26,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=569086.0, ans=10.0
2024-09-24 19:11:42,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=569132.6666666666, ans=0.0
2024-09-24 19:11:50,710 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.032e+02 1.271e+02 1.362e+02 1.461e+02 3.149e+02, threshold=2.725e+02, percent-clipped=1.0
2024-09-24 19:11:55,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=569179.3333333334, ans=0.125
2024-09-24 19:11:57,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=569179.3333333334, ans=0.2
2024-09-24 19:12:00,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=569226.0, ans=0.125
2024-09-24 19:12:01,963 INFO [train.py:1198] (0/4) Epoch 32, batch 1200, loss[loss=0.2266, ctc_loss=0.1459, cr_loss=0.4035, over 17100.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1284, cr_loss=0.3452, over 3362924.27 frames. ], batch size: 49, lr: 3.73e-03, grad_scale: 32.0
2024-09-24 19:12:14,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=569226.0, ans=0.025
2024-09-24 19:12:24,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=569272.6666666666, ans=0.0
2024-09-24 19:13:03,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=569366.0, ans=0.07
2024-09-24 19:13:07,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=569412.6666666666, ans=0.0
2024-09-24 19:13:16,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=569412.6666666666, ans=0.2
2024-09-24 19:13:24,567 INFO [train.py:1198] (0/4) Epoch 32, batch 1250, loss[loss=0.2142, ctc_loss=0.1421, cr_loss=0.3607, over 15851.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1284, cr_loss=0.3449, over 3366310.55 frames. ], batch size: 74, lr: 3.73e-03, grad_scale: 32.0
2024-09-24 19:13:27,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0
2024-09-24 19:13:31,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=569459.3333333334, ans=0.125
2024-09-24 19:14:07,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0
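The Whitening lines compare a per-module statistic against a fixed limit; in most entries here the metric stays under the limit, in which case the module presumably applies no correction. One standard way to measure how far activations are from "white" is the spread of the covariance eigenvalues, which equals 1.0 for decorrelated, equal-variance channels. The sketch below uses that definition as an assumption; the actual metric is computed in scaling.py.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """num_channels * tr(C^2) / tr(C)^2 for channel covariance C (assumed form).

    Equals 1.0 when channels are decorrelated with equal variance, and grows
    as the variance concentrates in a few directions.
    """
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]     # (num_channels, num_channels) covariance
    return cov.shape[0] * (cov * cov).sum() / cov.trace() ** 2

x = torch.randn(10000, 384)          # near-white features
print(whitening_metric(x))           # ~1.0, well below a limit like 15.0
```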
2024-09-24 19:14:14,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=569599.3333333334, ans=0.125
2024-09-24 19:14:29,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0
2024-09-24 19:14:34,070 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.275e+02 1.363e+02 1.497e+02 1.949e+02, threshold=2.726e+02, percent-clipped=0.0
2024-09-24 19:14:49,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=12.0
2024-09-24 19:14:50,539 INFO [train.py:1198] (0/4) Epoch 32, batch 1300, loss[loss=0.2146, ctc_loss=0.1404, cr_loss=0.371, over 16796.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.128, cr_loss=0.3443, over 3350729.79 frames. ], batch size: 61, lr: 3.73e-03, grad_scale: 32.0
2024-09-24 19:14:58,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569692.6666666666, ans=0.1
2024-09-24 19:15:02,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=22.5
2024-09-24 19:15:20,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=569739.3333333334, ans=0.0
2024-09-24 19:15:25,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=569786.0, ans=0.0
2024-09-24 19:15:49,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=569832.6666666666, ans=0.125
2024-09-24 19:15:59,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=569879.3333333334, ans=0.125
2024-09-24 19:16:00,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=569879.3333333334, ans=0.2
2024-09-24 19:16:13,219 INFO [train.py:1198] (0/4) Epoch 32, batch 1350, loss[loss=0.1624, ctc_loss=0.1034, cr_loss=0.2952, over 17264.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1283, cr_loss=0.3445, over 3358027.13 frames. ], batch size: 42, lr: 3.73e-03, grad_scale: 16.0
2024-09-24 19:16:43,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=570019.3333333334, ans=0.0
2024-09-24 19:16:50,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=570019.3333333334, ans=0.05
2024-09-24 19:16:50,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=570019.3333333334, ans=0.0
2024-09-24 19:16:51,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=570019.3333333334, ans=0.0
2024-09-24 19:16:56,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=570019.3333333334, ans=0.125
2024-09-24 19:17:23,672 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.251e+02 1.359e+02 1.461e+02 2.063e+02, threshold=2.718e+02, percent-clipped=0.0
2024-09-24 19:17:33,283 INFO [train.py:1198] (0/4) Epoch 32, batch 1400, loss[loss=0.2156, ctc_loss=0.143, cr_loss=0.3633, over 17319.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1278, cr_loss=0.344, over 3362633.76 frames. ], batch size: 51, lr: 3.73e-03, grad_scale: 16.0
2024-09-24 19:17:56,086 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0
2024-09-24 19:18:06,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=570252.6666666666, ans=0.2
2024-09-24 19:18:23,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=570299.3333333334, ans=0.0
2024-09-24 19:18:37,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=570299.3333333334, ans=0.025
2024-09-24 19:18:56,762 INFO [train.py:1198] (0/4) Epoch 32, batch 1450, loss[loss=0.1646, ctc_loss=0.1057, cr_loss=0.2943, over 16950.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1276, cr_loss=0.344, over 3362947.34 frames. ], batch size: 42, lr: 3.73e-03, grad_scale: 16.0
2024-09-24 19:19:01,969 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-24 19:19:24,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=570439.3333333334, ans=0.125
2024-09-24 19:20:02,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0
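The grad_scale field in the batch lines drops from 32.0 to 16.0 around batch 1350 and climbs back to 32.0 by batch 1600, the signature of dynamic loss scaling in fp16 mixed-precision training: the scale is halved when an overflow is detected and grown again after a stretch of stable steps. The standard PyTorch pattern is shown below as an illustrative loop, assuming `model`, `optimizer`, and `train_loader` are defined; it is not the actual train.py code.

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

for batch in train_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # run the forward pass in fp16
        loss = model(batch)
    scaler.scale(loss).backward()        # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)               # step is skipped on inf/nan gradients
    scaler.update()                      # halve scale on overflow, grow it later
```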
2024-09-24 19:20:03,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=570532.6666666666, ans=0.125
2024-09-24 19:20:12,693 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.251e+02 1.341e+02 1.423e+02 1.681e+02, threshold=2.682e+02, percent-clipped=0.0
2024-09-24 19:20:13,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=570579.3333333334, ans=0.0
2024-09-24 19:20:13,750 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0
2024-09-24 19:20:24,789 INFO [train.py:1198] (0/4) Epoch 32, batch 1500, loss[loss=0.2117, ctc_loss=0.1416, cr_loss=0.3504, over 17046.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1285, cr_loss=0.3456, over 3356772.39 frames. ], batch size: 56, lr: 3.73e-03, grad_scale: 16.0
2024-09-24 19:20:53,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=570672.6666666666, ans=0.2
2024-09-24 19:20:55,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=570719.3333333334, ans=0.125
2024-09-24 19:21:44,259 INFO [train.py:1198] (0/4) Epoch 32, batch 1550, loss[loss=0.202, ctc_loss=0.1324, cr_loss=0.3481, over 17361.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1278, cr_loss=0.3436, over 3358309.34 frames. ], batch size: 48, lr: 3.73e-03, grad_scale: 16.0
2024-09-24 19:21:56,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=570859.3333333334, ans=0.125
2024-09-24 19:22:11,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=22.5
2024-09-24 19:22:28,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=570952.6666666666, ans=0.125
2024-09-24 19:22:57,885 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.269e+02 1.362e+02 1.460e+02 2.798e+02, threshold=2.724e+02, percent-clipped=1.0
2024-09-24 19:22:59,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=571046.0, ans=0.0
2024-09-24 19:23:07,638 INFO [train.py:1198] (0/4) Epoch 32, batch 1600, loss[loss=0.1775, ctc_loss=0.1151, cr_loss=0.3121, over 16740.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1284, cr_loss=0.3444, over 3344728.41 frames. ], batch size: 61, lr: 3.73e-03, grad_scale: 32.0
2024-09-24 19:23:16,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=8.0
2024-09-24 19:23:25,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=571139.3333333334, ans=0.0
2024-09-24 19:24:27,904 INFO [train.py:1198] (0/4) Epoch 32, batch 1650, loss[loss=0.1944, ctc_loss=0.125, cr_loss=0.3468, over 17303.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1276, cr_loss=0.3434, over 3344736.65 frames. ], batch size: 49, lr: 3.73e-03, grad_scale: 32.0
2024-09-24 19:25:00,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=571372.6666666666, ans=0.0
2024-09-24 19:25:04,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=22.5
2024-09-24 19:25:40,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=571512.6666666666, ans=0.125
2024-09-24 19:25:45,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0
2024-09-24 19:25:46,145 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.279e+02 1.364e+02 1.457e+02 2.649e+02, threshold=2.728e+02, percent-clipped=0.0
2024-09-24 19:25:46,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=571512.6666666666, ans=0.125
2024-09-24 19:25:52,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=571512.6666666666, ans=0.125
2024-09-24 19:25:55,640 INFO [train.py:1198] (0/4) Epoch 32, batch 1700, loss[loss=0.2297, ctc_loss=0.1596, cr_loss=0.3508, over 11591.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1274, cr_loss=0.3425, over 3340182.26 frames. ], batch size: 123, lr: 3.73e-03, grad_scale: 32.0
2024-09-24 19:26:18,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=571606.0, ans=0.125
2024-09-24 19:26:29,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=571652.6666666666, ans=0.025
2024-09-24 19:26:40,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=571652.6666666666, ans=0.125
2024-09-24 19:26:56,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=571699.3333333334, ans=0.0
2024-09-24 19:26:59,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=571746.0, ans=0.125
2024-09-24 19:27:07,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=15.0
2024-09-24 19:27:15,588 INFO [train.py:1198] (0/4) Epoch 32, batch 1750, loss[loss=0.1921, ctc_loss=0.125, cr_loss=0.3354, over 17051.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1282, cr_loss=0.344, over 3352816.07 frames. ], batch size: 46, lr: 3.73e-03, grad_scale: 32.0
2024-09-24 19:27:18,255 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=12.0
2024-09-24 19:28:09,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=571932.6666666666, ans=0.125
2024-09-24 19:28:28,286 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.257e+02 1.347e+02 1.444e+02 1.830e+02, threshold=2.693e+02, percent-clipped=0.0
2024-09-24 19:28:37,947 INFO [train.py:1198] (0/4) Epoch 32, batch 1800, loss[loss=0.2284, ctc_loss=0.1511, cr_loss=0.3864, over 17017.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.129, cr_loss=0.3457, over 3353567.41 frames. ], batch size: 56, lr: 3.72e-03, grad_scale: 32.0
2024-09-24 19:28:38,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=572026.0, ans=0.1
2024-09-24 19:29:03,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=572072.6666666666, ans=0.0
2024-09-24 19:29:21,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=572119.3333333334, ans=0.1
2024-09-24 19:29:44,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=572166.0, ans=0.0
2024-09-24 19:29:58,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=572212.6666666666, ans=0.125
2024-09-24 19:30:02,992 INFO [train.py:1198] (0/4) Epoch 32, batch 1850, loss[loss=0.2418, ctc_loss=0.1672, cr_loss=0.3729, over 11530.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.1291, cr_loss=0.3455, over 3351224.44 frames. ], batch size: 123, lr: 3.72e-03, grad_scale: 32.0
2024-09-24 19:30:18,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=572259.3333333334, ans=0.09899494936611666
2024-09-24 19:30:22,968 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2024-09-24 19:30:41,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=572352.6666666666, ans=0.1
2024-09-24 19:30:45,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=572352.6666666666, ans=0.125
2024-09-24 19:30:53,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=572399.3333333334, ans=0.125
2024-09-24 19:31:17,415 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.271e+02 1.345e+02 1.449e+02 2.207e+02, threshold=2.691e+02, percent-clipped=0.0
2024-09-24 19:31:25,382 INFO [train.py:1198] (0/4) Epoch 32, batch 1900, loss[loss=0.1699, ctc_loss=0.1131, cr_loss=0.2839, over 17093.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.1287, cr_loss=0.3448, over 3355657.55 frames. ], batch size: 49, lr: 3.72e-03, grad_scale: 16.0
2024-09-24 19:31:28,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572492.6666666666, ans=0.1
2024-09-24 19:31:36,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=572492.6666666666, ans=0.125
2024-09-24 19:31:40,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=572539.3333333334, ans=0.2
2024-09-24 19:31:43,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=572539.3333333334, ans=0.125
2024-09-24 19:31:45,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=572539.3333333334, ans=0.2
2024-09-24 19:32:23,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=572632.6666666666, ans=0.0
2024-09-24 19:32:36,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=572679.3333333334, ans=0.125
2024-09-24 19:32:47,863 INFO [train.py:1198] (0/4) Epoch 32, batch 1950, loss[loss=0.2081, ctc_loss=0.1391, cr_loss=0.3452, over 16440.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1283, cr_loss=0.3447, over 3349531.62 frames. ], batch size: 66, lr: 3.72e-03, grad_scale: 16.0
2024-09-24 19:32:51,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=572726.0, ans=0.0
2024-09-24 19:33:19,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0
2024-09-24 19:33:32,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=572819.3333333334, ans=0.0
2024-09-24 19:33:36,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=572866.0, ans=0.0
2024-09-24 19:33:37,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=572866.0, ans=0.0
2024-09-24 19:33:54,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=22.5
2024-09-24 19:34:00,277 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.299e+02 1.379e+02 1.509e+02 1.987e+02, threshold=2.758e+02, percent-clipped=0.0
2024-09-24 19:34:02,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=572912.6666666666, ans=0.0
2024-09-24 19:34:04,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0
2024-09-24 19:34:08,190 INFO [train.py:1198] (0/4) Epoch 32, batch 2000, loss[loss=0.2217, ctc_loss=0.1442, cr_loss=0.3874, over 16542.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.128, cr_loss=0.3444, over 3356249.52 frames. ], batch size: 66, lr: 3.72e-03, grad_scale: 32.0
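Each batch line reports two views of the objective: loss[...] is the current batch and tot_loss[...] is an aggregate over recent batches, weighted by frame count. The "over N frames" total hovers around 3.35M rather than growing without bound, so the aggregate is evidently decayed rather than cumulative. A sketch of such bookkeeping follows; the decay constant is an assumption, chosen so that batches of roughly 17k frames give a steady-state window near 3.4M frames, consistent with the logged totals.

```python
class RunningLoss:
    """Frame-weighted, exponentially decayed loss aggregate (sketch)."""
    def __init__(self, decay=0.995):
        self.decay = decay        # assumed; controls the effective window
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss, batch_frames):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frame_sum = self.decay * self.frame_sum + batch_frames

    @property
    def value(self):
        return self.loss_sum / max(self.frame_sum, 1.0)
```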
2024-09-24 19:34:16,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=572959.3333333334, ans=0.0
2024-09-24 19:34:24,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=573006.0, ans=0.0
2024-09-24 19:34:35,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-09-24 19:34:41,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=573052.6666666666, ans=0.2
2024-09-24 19:35:06,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.68 vs. limit=12.0
2024-09-24 19:35:11,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0
2024-09-24 19:35:33,443 INFO [train.py:1198] (0/4) Epoch 32, batch 2050, loss[loss=0.1907, ctc_loss=0.1227, cr_loss=0.3401, over 17019.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3441, over 3367162.98 frames. ], batch size: 44, lr: 3.72e-03, grad_scale: 32.0
2024-09-24 19:35:52,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0
2024-09-24 19:35:56,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=573239.3333333334, ans=0.0
2024-09-24 19:36:25,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0
2024-09-24 19:36:45,681 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.270e+02 1.344e+02 1.458e+02 2.417e+02, threshold=2.689e+02, percent-clipped=0.0
2024-09-24 19:36:53,624 INFO [train.py:1198] (0/4) Epoch 32, batch 2100, loss[loss=0.1741, ctc_loss=0.1131, cr_loss=0.3047, over 17258.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1282, cr_loss=0.3452, over 3370660.39 frames. ], batch size: 44, lr: 3.72e-03, grad_scale: 32.0
2024-09-24 19:37:06,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=573426.0, ans=0.0
2024-09-24 19:37:08,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=573472.6666666666, ans=0.125
2024-09-24 19:37:20,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=573472.6666666666, ans=0.125
2024-09-24 19:37:32,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=573519.3333333334, ans=0.025
2024-09-24 19:37:58,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=573612.6666666666, ans=0.125
2024-09-24 19:38:15,961 INFO [train.py:1198] (0/4) Epoch 32, batch 2150, loss[loss=0.2211, ctc_loss=0.1444, cr_loss=0.3833, over 17014.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1288, cr_loss=0.3459, over 3350890.04 frames. ], batch size: 53, lr: 3.72e-03, grad_scale: 32.0
2024-09-24 19:39:10,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=573799.3333333334, ans=0.0
2024-09-24 19:39:33,825 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.018e+02 1.275e+02 1.342e+02 1.446e+02 2.934e+02, threshold=2.683e+02, percent-clipped=1.0
2024-09-24 19:39:34,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=573846.0, ans=0.125
2024-09-24 19:39:40,380 INFO [train.py:1198] (0/4) Epoch 32, batch 2200, loss[loss=0.1988, ctc_loss=0.1313, cr_loss=0.3371, over 17351.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.129, cr_loss=0.3457, over 3352651.12 frames. ], batch size: 48, lr: 3.72e-03, grad_scale: 16.0
2024-09-24 19:39:53,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=573892.6666666666, ans=0.125
2024-09-24 19:40:06,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=573939.3333333334, ans=0.0
2024-09-24 19:40:12,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0
2024-09-24 19:40:42,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=574032.6666666666, ans=0.0
2024-09-24 19:40:47,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=574079.3333333334, ans=0.125
2024-09-24 19:41:03,502 INFO [train.py:1198] (0/4) Epoch 32, batch 2250, loss[loss=0.1986, ctc_loss=0.128, cr_loss=0.3533, over 17272.00 frames. ], tot_loss[loss=0.1987, ctc_loss=0.1295, cr_loss=0.3462, over 3350250.14 frames. ], batch size: 44, lr: 3.72e-03, grad_scale: 16.0
2024-09-24 19:41:18,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0
2024-09-24 19:41:23,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=574172.6666666666, ans=0.2
2024-09-24 19:41:37,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=574219.3333333334, ans=0.125
2024-09-24 19:41:49,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=574266.0, ans=0.0
2024-09-24 19:42:05,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=574312.6666666666, ans=0.125
2024-09-24 19:42:16,564 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.263e+02 1.326e+02 1.428e+02 1.960e+02, threshold=2.652e+02, percent-clipped=0.0
2024-09-24 19:42:22,903 INFO [train.py:1198] (0/4) Epoch 32, batch 2300, loss[loss=0.1917, ctc_loss=0.1238, cr_loss=0.3397, over 17082.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.129, cr_loss=0.3457, over 3357006.40 frames. ], batch size: 43, lr: 3.72e-03, grad_scale: 16.0
2024-09-24 19:42:35,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=574359.3333333334, ans=0.125
2024-09-24 19:42:40,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=574406.0, ans=0.125
2024-09-24 19:42:54,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=574406.0, ans=0.0
2024-09-24 19:42:55,239 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.57 vs. limit=6.0
2024-09-24 19:42:57,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=574452.6666666666, ans=0.0
2024-09-24 19:43:44,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0
2024-09-24 19:43:45,788 INFO [train.py:1198] (0/4) Epoch 32, batch 2350, loss[loss=0.186, ctc_loss=0.1207, cr_loss=0.3263, over 17153.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.129, cr_loss=0.3457, over 3352377.13 frames. ], batch size: 41, lr: 3.72e-03, grad_scale: 16.0
2024-09-24 19:44:05,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=574639.3333333334, ans=0.125
2024-09-24 19:44:07,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=574639.3333333334, ans=0.125
2024-09-24 19:44:10,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=574639.3333333334, ans=15.0
2024-09-24 19:44:11,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=574639.3333333334, ans=0.125
2024-09-24 19:44:20,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=574686.0, ans=22.5
2024-09-24 19:45:01,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=574779.3333333334, ans=0.125
2024-09-24 19:45:04,498 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.259e+02 1.339e+02 1.484e+02 2.121e+02, threshold=2.678e+02, percent-clipped=0.0
2024-09-24 19:45:13,563 INFO [train.py:1198] (0/4) Epoch 32, batch 2400, loss[loss=0.1854, ctc_loss=0.1194, cr_loss=0.3297, over 17155.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1286, cr_loss=0.345, over 3357532.65 frames. ], batch size: 45, lr: 3.72e-03, grad_scale: 32.0
2024-09-24 19:45:28,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=574872.6666666666, ans=0.2
2024-09-24 19:45:36,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=574872.6666666666, ans=0.0
2024-09-24 19:45:36,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=574872.6666666666, ans=0.0
2024-09-24 19:45:39,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=574872.6666666666, ans=0.0
2024-09-24 19:45:47,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0
2024-09-24 19:45:59,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0
2024-09-24 19:46:27,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=575012.6666666666, ans=0.125
2024-09-24 19:46:33,600 INFO [train.py:1198] (0/4) Epoch 32, batch 2450, loss[loss=0.2214, ctc_loss=0.1462, cr_loss=0.376, over 16781.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1282, cr_loss=0.3451, over 3360581.03 frames. ], batch size: 61, lr: 3.72e-03, grad_scale: 32.0
2024-09-24 19:46:43,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=575059.3333333334, ans=0.0
2024-09-24 19:46:45,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=575059.3333333334, ans=0.125
2024-09-24 19:46:51,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=575106.0, ans=0.5
2024-09-24 19:47:35,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=575199.3333333334, ans=0.0
2024-09-24 19:47:49,733 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.250e+02 1.318e+02 1.420e+02 2.123e+02, threshold=2.637e+02, percent-clipped=0.0
2024-09-24 19:47:53,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=575246.0, ans=0.0
2024-09-24 19:47:56,356 INFO [train.py:1198] (0/4) Epoch 32, batch 2500, loss[loss=0.1476, ctc_loss=0.09444, cr_loss=0.2657, over 17092.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.3449, over 3368707.82 frames. ], batch size: 43, lr: 3.71e-03, grad_scale: 32.0
2024-09-24 19:48:20,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=575339.3333333334, ans=0.125
2024-09-24 19:48:22,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=575339.3333333334, ans=0.125
2024-09-24 19:48:32,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0
2024-09-24 19:48:33,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=575386.0, ans=10.0
2024-09-24 19:48:33,798 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0
2024-09-24 19:48:36,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=575386.0, ans=0.2
2024-09-24 19:48:36,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=575386.0, ans=0.125
2024-09-24 19:48:58,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=575479.3333333334, ans=0.0
2024-09-24 19:49:00,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=575479.3333333334, ans=0.07
2024-09-24 19:49:06,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=575479.3333333334, ans=0.0
2024-09-24 19:49:18,664 INFO [train.py:1198] (0/4) Epoch 32, batch 2550, loss[loss=0.2361, ctc_loss=0.1598, cr_loss=0.3815, over 15209.00 frames. ], tot_loss[loss=0.1978, ctc_loss=0.1285, cr_loss=0.3463, over 3368607.86 frames. ], batch size: 89, lr: 3.71e-03, grad_scale: 32.0
2024-09-24 19:50:06,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=575619.3333333334, ans=0.125
2024-09-24 19:50:17,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=575666.0, ans=0.0
2024-09-24 19:50:17,615 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=22.5
2024-09-24 19:50:34,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=575712.6666666666, ans=0.125
2024-09-24 19:50:39,008 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.044e+02 1.280e+02 1.336e+02 1.446e+02 2.286e+02, threshold=2.672e+02, percent-clipped=0.0
2024-09-24 19:50:43,840 INFO [train.py:1198] (0/4) Epoch 32, batch 2600, loss[loss=0.2037, ctc_loss=0.1323, cr_loss=0.3569, over 17350.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1287, cr_loss=0.3466, over 3366999.06 frames. ], batch size: 48, lr: 3.71e-03, grad_scale: 16.0
2024-09-24 19:50:48,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=575759.3333333334, ans=0.125
2024-09-24 19:50:56,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=575759.3333333334, ans=0.125
2024-09-24 19:51:04,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=575806.0, ans=10.0
2024-09-24 19:51:10,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0
2024-09-24 19:51:11,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=575806.0, ans=0.125
2024-09-24 19:51:36,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=575899.3333333334, ans=0.0
2024-09-24 19:51:47,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=575946.0, ans=0.0
2024-09-24 19:51:49,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=575946.0, ans=0.125
2024-09-24 19:52:03,680 INFO [train.py:1198] (0/4) Epoch 32, batch 2650, loss[loss=0.2183, ctc_loss=0.1459, cr_loss=0.3623, over 15095.00 frames. ], tot_loss[loss=0.1973, ctc_loss=0.1281, cr_loss=0.3457, over 3371217.38 frames. ], batch size: 89, lr: 3.71e-03, grad_scale: 16.0
2024-09-24 19:52:35,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=576039.3333333334, ans=0.2
2024-09-24 19:52:54,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.29 vs. limit=22.5
2024-09-24 19:52:55,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=576132.6666666666, ans=0.0
2024-09-24 19:53:03,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=576132.6666666666, ans=0.125
2024-09-24 19:53:15,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0
2024-09-24 19:53:22,461 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.282e+02 1.356e+02 1.452e+02 2.905e+02, threshold=2.711e+02, percent-clipped=3.0
2024-09-24 19:53:27,415 INFO [train.py:1198] (0/4) Epoch 32, batch 2700, loss[loss=0.1624, ctc_loss=0.1038, cr_loss=0.2931, over 16956.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1284, cr_loss=0.3461, over 3369247.92 frames. ], batch size: 42, lr: 3.71e-03, grad_scale: 16.0
2024-09-24 19:53:40,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=576226.0, ans=0.07
2024-09-24 19:54:04,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=576319.3333333334, ans=10.0
2024-09-24 19:54:04,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=576319.3333333334, ans=0.125
2024-09-24 19:54:22,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=576366.0, ans=0.125
2024-09-24 19:54:30,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=576366.0, ans=0.1
2024-09-24 19:54:50,201 INFO [train.py:1198] (0/4) Epoch 32, batch 2750, loss[loss=0.2056, ctc_loss=0.1336, cr_loss=0.36, over 17072.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1286, cr_loss=0.3464, over 3375106.31 frames. ], batch size: 46, lr: 3.71e-03, grad_scale: 16.0
2024-09-24 19:54:50,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=576459.3333333334, ans=0.07
2024-09-24 19:54:52,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=576459.3333333334, ans=0.1
2024-09-24 19:55:00,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0
2024-09-24 19:55:08,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=576506.0, ans=0.2
2024-09-24 19:55:39,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=576599.3333333334, ans=0.2
2024-09-24 19:56:03,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=576646.0, ans=0.125
2024-09-24 19:56:08,203 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.294e+02 1.403e+02 1.500e+02 1.947e+02, threshold=2.805e+02, percent-clipped=0.0
2024-09-24 19:56:13,040 INFO [train.py:1198] (0/4) Epoch 32, batch 2800, loss[loss=0.2087, ctc_loss=0.1315, cr_loss=0.3864, over 17212.00 frames. ], tot_loss[loss=0.1983, ctc_loss=0.1288, cr_loss=0.3472, over 3369766.05 frames. ], batch size: 47, lr: 3.71e-03, grad_scale: 32.0
2024-09-24 19:56:22,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=576692.6666666666, ans=0.0
2024-09-24 19:56:23,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0
2024-09-24 19:57:13,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=576832.6666666666, ans=0.125
2024-09-24 19:57:14,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=576832.6666666666, ans=0.125
2024-09-24 19:57:36,058 INFO [train.py:1198] (0/4) Epoch 32, batch 2850, loss[loss=0.2245, ctc_loss=0.1493, cr_loss=0.3761, over 15267.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1293, cr_loss=0.348, over 3370845.98 frames. ], batch size: 89, lr: 3.71e-03, grad_scale: 32.0
2024-09-24 19:58:01,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=576972.6666666666, ans=0.125
2024-09-24 19:58:45,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=577112.6666666666, ans=0.0
2024-09-24 19:58:51,150 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.067e+02 1.250e+02 1.340e+02 1.459e+02 3.976e+02, threshold=2.680e+02, percent-clipped=1.0
2024-09-24 19:58:55,899 INFO [train.py:1198] (0/4) Epoch 32, batch 2900, loss[loss=0.2174, ctc_loss=0.1424, cr_loss=0.3751, over 16910.00 frames. ], tot_loss[loss=0.1998, ctc_loss=0.13, cr_loss=0.349, over 3363644.65 frames. ], batch size: 58, lr: 3.71e-03, grad_scale: 32.0
2024-09-24 19:58:56,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577159.3333333334, ans=0.1
2024-09-24 19:58:56,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=577159.3333333334, ans=0.125
2024-09-24 19:59:20,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=577206.0, ans=0.125
2024-09-24 19:59:20,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=577206.0, ans=10.0
2024-09-24 19:59:28,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=577206.0, ans=0.125
2024-09-24 19:59:28,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=577206.0, ans=0.0
2024-09-24 19:59:36,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577252.6666666666, ans=0.1
2024-09-24 19:59:55,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=577299.3333333334, ans=0.0
2024-09-24 20:00:05,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=577346.0, ans=0.0
2024-09-24 20:00:23,210 INFO [train.py:1198] (0/4) Epoch 32, batch 2950, loss[loss=0.1959, ctc_loss=0.1278, cr_loss=0.3404, over 17230.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1293, cr_loss=0.3479, over 3363928.74 frames. ], batch size: 50, lr: 3.71e-03, grad_scale: 32.0
2024-09-24 20:00:31,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=577392.6666666666, ans=10.0
2024-09-24 20:00:37,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=577439.3333333334, ans=0.0
2024-09-24 20:00:37,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.38 vs. limit=22.5
2024-09-24 20:01:39,276 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.060e+02 1.244e+02 1.340e+02 1.434e+02 1.849e+02, threshold=2.679e+02, percent-clipped=0.0
2024-09-24 20:01:42,521 INFO [train.py:1198] (0/4) Epoch 32, batch 3000, loss[loss=0.209, ctc_loss=0.1371, cr_loss=0.3594, over 17091.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.1292, cr_loss=0.3469, over 3362884.63 frames. ], batch size: 49, lr: 3.71e-03, grad_scale: 16.0
2024-09-24 20:01:42,522 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-24 20:01:57,949 INFO [train.py:1230] (0/4) Epoch 32, validation: loss=0.03608, ctc_loss=0.03608, cr_loss=9.027e-15, over 944034.00 frames.
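At batch 3000 the loop pauses to compute a validation loss over a fixed dev set (the same 944034.00 frames each time validation runs) before resuming training, so validation appears to be keyed to a fixed batch interval. A minimal version of that periodic switch is sketched below; `compute_loss`, `dev_loader`, and the 3000-batch interval are assumptions based on the log, not the actual train.py.

```python
import torch

VALID_INTERVAL = 3000   # assumed from the batch index where validation runs

def maybe_validate(model, dev_loader, batch_idx):
    """Return the frame-weighted validation loss, or None if not due yet."""
    if batch_idx == 0 or batch_idx % VALID_INTERVAL != 0:
        return None
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)   # assumed helper
            loss_sum += loss * num_frames
            frame_sum += num_frames
    model.train()
    return loss_sum / frame_sum   # logged as "validation: loss=... over ... frames"
```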
2024-09-24 20:01:57,950 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-24 20:02:15,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=577672.6666666666, ans=0.125
2024-09-24 20:02:29,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=577719.3333333334, ans=0.0
2024-09-24 20:02:52,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=577766.0, ans=0.04949747468305833
2024-09-24 20:03:01,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=577812.6666666666, ans=0.1
2024-09-24 20:03:16,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=577812.6666666666, ans=0.0
2024-09-24 20:03:19,565 INFO [train.py:1198] (0/4) Epoch 32, batch 3050, loss[loss=0.2141, ctc_loss=0.1446, cr_loss=0.3474, over 11492.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1297, cr_loss=0.3481, over 3362730.90 frames. ], batch size: 124, lr: 3.71e-03, grad_scale: 16.0
2024-09-24 20:03:29,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=22.5
2024-09-24 20:03:49,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=577952.6666666666, ans=0.1
2024-09-24 20:04:34,696 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.228e+02 1.314e+02 1.465e+02 2.452e+02, threshold=2.627e+02, percent-clipped=0.0
2024-09-24 20:04:37,880 INFO [train.py:1198] (0/4) Epoch 32, batch 3100, loss[loss=0.2049, ctc_loss=0.133, cr_loss=0.3598, over 17288.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1299, cr_loss=0.349, over 3365435.35 frames. ], batch size: 46, lr: 3.71e-03, grad_scale: 16.0
2024-09-24 20:04:41,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=578092.6666666666, ans=0.0
2024-09-24 20:04:43,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0
2024-09-24 20:04:43,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0
2024-09-24 20:04:47,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=578092.6666666666, ans=0.125
2024-09-24 20:05:19,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=578186.0, ans=0.125
2024-09-24 20:05:24,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0
2024-09-24 20:05:38,067 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0
2024-09-24 20:05:47,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. limit=10.0
2024-09-24 20:05:50,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=578279.3333333334, ans=0.125
2024-09-24 20:05:56,196 INFO [train.py:1198] (0/4) Epoch 32, batch 3150, loss[loss=0.1599, ctc_loss=0.1037, cr_loss=0.2811, over 17027.00 frames. ], tot_loss[loss=0.1992, ctc_loss=0.1296, cr_loss=0.3478, over 3361673.47 frames. ], batch size: 39, lr: 3.70e-03, grad_scale: 16.0
2024-09-24 20:06:21,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=578372.6666666666, ans=0.125
2024-09-24 20:06:21,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=578372.6666666666, ans=0.0
2024-09-24 20:06:30,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578419.3333333334, ans=0.1
2024-09-24 20:06:46,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=578466.0, ans=0.025
2024-09-24 20:06:48,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=578466.0, ans=0.025
2024-09-24 20:07:08,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.85 vs. limit=10.0
2024-09-24 20:07:15,662 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.274e+02 1.414e+02 1.514e+02 2.013e+02, threshold=2.828e+02, percent-clipped=0.0
2024-09-24 20:07:18,927 INFO [train.py:1198] (0/4) Epoch 32, batch 3200, loss[loss=0.2079, ctc_loss=0.1355, cr_loss=0.3621, over 17017.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.1286, cr_loss=0.3454, over 3355984.97 frames. ], batch size: 52, lr: 3.70e-03, grad_scale: 32.0
2024-09-24 20:07:19,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0
limit=15.0 2024-09-24 20:07:25,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=578559.3333333334, ans=0.05 2024-09-24 20:07:34,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=578606.0, ans=0.125 2024-09-24 20:07:50,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=578652.6666666666, ans=0.0 2024-09-24 20:07:50,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=578652.6666666666, ans=0.2 2024-09-24 20:07:53,480 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-124000.pt 2024-09-24 20:08:09,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578699.3333333334, ans=0.1 2024-09-24 20:08:13,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=578699.3333333334, ans=0.125 2024-09-24 20:08:16,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=578699.3333333334, ans=0.1 2024-09-24 20:08:17,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=578699.3333333334, ans=0.125 2024-09-24 20:08:39,228 INFO [train.py:1198] (0/4) Epoch 32, batch 3250, loss[loss=0.1795, ctc_loss=0.1175, cr_loss=0.3099, over 17097.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1295, cr_loss=0.3476, over 3363819.71 frames. ], batch size: 43, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:08:55,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=578839.3333333334, ans=0.0 2024-09-24 20:09:19,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=578886.0, ans=0.125 2024-09-24 20:09:31,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=578932.6666666666, ans=0.125 2024-09-24 20:09:56,538 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.279e+02 1.356e+02 1.447e+02 1.834e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 20:09:59,775 INFO [train.py:1198] (0/4) Epoch 32, batch 3300, loss[loss=0.1622, ctc_loss=0.1054, cr_loss=0.2839, over 17273.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1286, cr_loss=0.3461, over 3366309.83 frames. ], batch size: 42, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:10:25,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2024-09-24 20:10:26,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=579072.6666666666, ans=0.125 2024-09-24 20:11:00,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=579212.6666666666, ans=0.125 2024-09-24 20:11:01,270 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.03 vs. 
limit=15.0 2024-09-24 20:11:07,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2024-09-24 20:11:17,460 INFO [train.py:1198] (0/4) Epoch 32, batch 3350, loss[loss=0.2166, ctc_loss=0.1398, cr_loss=0.3841, over 16976.00 frames. ], tot_loss[loss=0.1981, ctc_loss=0.1288, cr_loss=0.3466, over 3369899.69 frames. ], batch size: 56, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:11:27,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579259.3333333334, ans=0.1 2024-09-24 20:11:33,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2024-09-24 20:11:44,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579306.0, ans=0.1 2024-09-24 20:12:16,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0 2024-09-24 20:12:22,332 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2024-09-24 20:12:32,493 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.280e+02 1.354e+02 1.467e+02 2.390e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 20:12:35,642 INFO [train.py:1198] (0/4) Epoch 32, batch 3400, loss[loss=0.2267, ctc_loss=0.1464, cr_loss=0.4015, over 17014.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.3452, over 3372172.06 frames. ], batch size: 51, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:12:35,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=579492.6666666666, ans=0.125 2024-09-24 20:13:55,924 INFO [train.py:1198] (0/4) Epoch 32, batch 3450, loss[loss=0.2136, ctc_loss=0.1367, cr_loss=0.3848, over 17354.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1275, cr_loss=0.3449, over 3378553.18 frames. ], batch size: 48, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:14:05,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579726.0, ans=0.1 2024-09-24 20:14:19,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=579772.6666666666, ans=0.0 2024-09-24 20:14:24,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.27 vs. 
limit=12.0 2024-09-24 20:14:28,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=579819.3333333334, ans=0.0 2024-09-24 20:14:32,001 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:14:47,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=579866.0, ans=0.125 2024-09-24 20:14:59,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=579912.6666666666, ans=0.125 2024-09-24 20:15:01,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-09-24 20:15:06,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=579912.6666666666, ans=0.0 2024-09-24 20:15:10,636 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.299e+02 1.417e+02 1.544e+02 2.266e+02, threshold=2.834e+02, percent-clipped=0.0 2024-09-24 20:15:10,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579912.6666666666, ans=0.1 2024-09-24 20:15:13,760 INFO [train.py:1198] (0/4) Epoch 32, batch 3500, loss[loss=0.2286, ctc_loss=0.1486, cr_loss=0.4004, over 17222.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1279, cr_loss=0.3451, over 3359815.60 frames. ], batch size: 55, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:15:16,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2024-09-24 20:15:56,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=580052.6666666666, ans=0.125 2024-09-24 20:16:32,168 INFO [train.py:1198] (0/4) Epoch 32, batch 3550, loss[loss=0.2234, ctc_loss=0.1461, cr_loss=0.3865, over 17314.00 frames. ], tot_loss[loss=0.1978, ctc_loss=0.1286, cr_loss=0.346, over 3352745.99 frames. ], batch size: 49, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:16:32,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=580192.6666666666, ans=0.025 2024-09-24 20:16:56,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=580239.3333333334, ans=0.09899494936611666 2024-09-24 20:17:14,344 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.92 vs. limit=10.0 2024-09-24 20:17:19,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.21 vs. 
limit=8.0 2024-09-24 20:17:34,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=580332.6666666666, ans=0.125 2024-09-24 20:17:42,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=580379.3333333334, ans=0.0 2024-09-24 20:17:51,751 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.264e+02 1.345e+02 1.464e+02 2.321e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-24 20:17:54,937 INFO [train.py:1198] (0/4) Epoch 32, batch 3600, loss[loss=0.2254, ctc_loss=0.1491, cr_loss=0.3815, over 15215.00 frames. ], tot_loss[loss=0.1976, ctc_loss=0.1283, cr_loss=0.3463, over 3360326.55 frames. ], batch size: 89, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:18:31,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=580519.3333333334, ans=0.0 2024-09-24 20:18:55,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=580566.0, ans=0.125 2024-09-24 20:18:56,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=580566.0, ans=0.0 2024-09-24 20:18:56,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=580566.0, ans=0.09899494936611666 2024-09-24 20:19:15,282 INFO [train.py:1198] (0/4) Epoch 32, batch 3650, loss[loss=0.1637, ctc_loss=0.1024, cr_loss=0.3064, over 17097.00 frames. ], tot_loss[loss=0.1986, ctc_loss=0.129, cr_loss=0.3477, over 3363052.54 frames. ], batch size: 43, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:19:42,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.23 vs. limit=12.0 2024-09-24 20:19:45,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=580752.6666666666, ans=0.125 2024-09-24 20:20:22,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=580846.0, ans=0.125 2024-09-24 20:20:30,691 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.254e+02 1.367e+02 1.508e+02 1.700e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-24 20:20:33,725 INFO [train.py:1198] (0/4) Epoch 32, batch 3700, loss[loss=0.2058, ctc_loss=0.131, cr_loss=0.374, over 17272.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1285, cr_loss=0.3471, over 3367150.14 frames. 
], batch size: 44, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:20:37,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=580892.6666666666, ans=0.5 2024-09-24 20:20:46,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=580892.6666666666, ans=0.025 2024-09-24 20:20:54,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580939.3333333334, ans=0.1 2024-09-24 20:21:10,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=580986.0, ans=0.125 2024-09-24 20:21:19,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=581032.6666666666, ans=0.025 2024-09-24 20:21:30,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=581032.6666666666, ans=0.125 2024-09-24 20:21:38,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=581079.3333333334, ans=0.0 2024-09-24 20:21:40,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2024-09-24 20:21:51,885 INFO [train.py:1198] (0/4) Epoch 32, batch 3750, loss[loss=0.1537, ctc_loss=0.09894, cr_loss=0.274, over 16261.00 frames. ], tot_loss[loss=0.1982, ctc_loss=0.1287, cr_loss=0.3471, over 3356866.99 frames. ], batch size: 36, lr: 3.70e-03, grad_scale: 32.0 2024-09-24 20:21:56,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=581126.0, ans=0.125 2024-09-24 20:22:05,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.61 vs. limit=10.0 2024-09-24 20:22:09,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=581172.6666666666, ans=0.5 2024-09-24 20:22:31,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-09-24 20:22:42,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=581266.0, ans=0.0 2024-09-24 20:23:06,879 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.283e+02 1.375e+02 1.482e+02 1.854e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-24 20:23:10,821 INFO [train.py:1198] (0/4) Epoch 32, batch 3800, loss[loss=0.1796, ctc_loss=0.1143, cr_loss=0.3263, over 17030.00 frames. ], tot_loss[loss=0.1978, ctc_loss=0.1285, cr_loss=0.3466, over 3338158.88 frames. 
], batch size: 39, lr: 3.69e-03, grad_scale: 32.0 2024-09-24 20:23:18,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=581359.3333333334, ans=0.0 2024-09-24 20:23:52,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=581452.6666666666, ans=0.125 2024-09-24 20:24:06,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=581499.3333333334, ans=0.2 2024-09-24 20:24:08,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581499.3333333334, ans=0.1 2024-09-24 20:24:25,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-09-24 20:24:28,215 INFO [train.py:1198] (0/4) Epoch 32, batch 3850, loss[loss=0.2755, ctc_loss=0.1945, cr_loss=0.4051, over 11316.00 frames. ], tot_loss[loss=0.201, ctc_loss=0.1312, cr_loss=0.3492, over 3256036.92 frames. ], batch size: 123, lr: 3.69e-03, grad_scale: 32.0 2024-09-24 20:24:40,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=581592.6666666666, ans=0.025 2024-09-24 20:24:57,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=581686.0, ans=0.125 2024-09-24 20:25:09,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=581686.0, ans=0.125 2024-09-24 20:25:17,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=581732.6666666666, ans=0.2 2024-09-24 20:25:31,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581779.3333333334, ans=0.1 2024-09-24 20:25:38,737 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-32.pt 2024-09-24 20:26:29,989 INFO [train.py:1198] (0/4) Epoch 33, batch 0, loss[loss=0.2033, ctc_loss=0.1324, cr_loss=0.3542, over 15854.00 frames. ], tot_loss[loss=0.2033, ctc_loss=0.1324, cr_loss=0.3542, over 15854.00 frames. ], batch size: 74, lr: 3.64e-03, grad_scale: 32.0 2024-09-24 20:26:29,990 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 20:26:46,729 INFO [train.py:1230] (0/4) Epoch 33, validation: loss=0.03608, ctc_loss=0.03608, cr_loss=9.001e-15, over 944034.00 frames. 2024-09-24 20:26:46,730 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 20:26:52,646 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.455e+02 1.559e+02 1.655e+02 2.375e+02, threshold=3.119e+02, percent-clipped=0.0 2024-09-24 20:27:17,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.96 vs. limit=22.5 2024-09-24 20:27:22,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.99 vs. 
limit=12.0 2024-09-24 20:27:28,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581900.6666666666, ans=0.1 2024-09-24 20:27:41,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=581947.3333333334, ans=0.0 2024-09-24 20:27:42,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=581947.3333333334, ans=0.0 2024-09-24 20:27:52,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=581994.0, ans=0.2 2024-09-24 20:28:09,442 INFO [train.py:1198] (0/4) Epoch 33, batch 50, loss[loss=0.1701, ctc_loss=0.1071, cr_loss=0.315, over 17044.00 frames. ], tot_loss[loss=0.199, ctc_loss=0.1293, cr_loss=0.3485, over 750773.36 frames. ], batch size: 39, lr: 3.64e-03, grad_scale: 32.0 2024-09-24 20:29:16,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=582227.3333333334, ans=0.0 2024-09-24 20:29:17,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=582227.3333333334, ans=0.125 2024-09-24 20:29:22,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=582227.3333333334, ans=0.125 2024-09-24 20:29:31,543 INFO [train.py:1198] (0/4) Epoch 33, batch 100, loss[loss=0.192, ctc_loss=0.1231, cr_loss=0.3445, over 17065.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1289, cr_loss=0.3456, over 1330650.12 frames. ], batch size: 46, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:29:34,671 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.274e+02 1.348e+02 1.462e+02 2.671e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-24 20:29:36,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=582274.0, ans=0.0 2024-09-24 20:29:42,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=582274.0, ans=0.125 2024-09-24 20:29:52,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=582320.6666666666, ans=0.2 2024-09-24 20:30:20,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=582414.0, ans=0.125 2024-09-24 20:30:23,806 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2024-09-24 20:30:26,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=582414.0, ans=0.025 2024-09-24 20:30:35,697 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=582414.0, ans=0.125 2024-09-24 20:30:45,277 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:30:54,480 INFO [train.py:1198] (0/4) Epoch 33, batch 150, loss[loss=0.2172, ctc_loss=0.1439, cr_loss=0.3661, over 17050.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1271, cr_loss=0.3421, over 1788390.65 frames. 
], batch size: 56, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:31:04,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=582507.3333333334, ans=0.0 2024-09-24 20:31:06,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=582507.3333333334, ans=0.5 2024-09-24 20:31:09,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=582554.0, ans=0.125 2024-09-24 20:32:11,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=582694.0, ans=0.0 2024-09-24 20:32:15,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-09-24 20:32:15,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=582694.0, ans=0.025 2024-09-24 20:32:20,279 INFO [train.py:1198] (0/4) Epoch 33, batch 200, loss[loss=0.2057, ctc_loss=0.1342, cr_loss=0.3574, over 17055.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1265, cr_loss=0.3412, over 2143354.46 frames. ], batch size: 46, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:32:23,430 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.261e+02 1.350e+02 1.462e+02 5.443e+02, threshold=2.700e+02, percent-clipped=2.0 2024-09-24 20:32:40,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-09-24 20:32:55,756 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:33:02,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=582834.0, ans=10.0 2024-09-24 20:33:18,022 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:33:22,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=582927.3333333334, ans=0.2 2024-09-24 20:33:37,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=15.0 2024-09-24 20:33:42,549 INFO [train.py:1198] (0/4) Epoch 33, batch 250, loss[loss=0.2242, ctc_loss=0.1471, cr_loss=0.385, over 17201.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.128, cr_loss=0.3449, over 2407454.88 frames. 
], batch size: 55, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:33:42,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=582974.0, ans=0.2 2024-09-24 20:33:53,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=582974.0, ans=0.125 2024-09-24 20:34:07,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583020.6666666666, ans=0.1 2024-09-24 20:34:09,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=583020.6666666666, ans=0.2 2024-09-24 20:34:31,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583114.0, ans=0.1 2024-09-24 20:34:52,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=583160.6666666666, ans=0.0 2024-09-24 20:35:01,850 INFO [train.py:1198] (0/4) Epoch 33, batch 300, loss[loss=0.2247, ctc_loss=0.1478, cr_loss=0.3844, over 16765.00 frames. ], tot_loss[loss=0.1994, ctc_loss=0.1297, cr_loss=0.3484, over 2620300.13 frames. ], batch size: 61, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:35:03,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=583207.3333333334, ans=0.025 2024-09-24 20:35:05,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.274e+02 1.352e+02 1.473e+02 1.783e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-24 20:35:05,755 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=12.0 2024-09-24 20:35:07,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.57 vs. limit=6.0 2024-09-24 20:35:37,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583300.6666666666, ans=0.1 2024-09-24 20:35:48,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=583300.6666666666, ans=0.0 2024-09-24 20:36:08,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=22.5 2024-09-24 20:36:15,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=583394.0, ans=0.125 2024-09-24 20:36:20,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=583394.0, ans=0.0 2024-09-24 20:36:22,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-09-24 20:36:25,029 INFO [train.py:1198] (0/4) Epoch 33, batch 350, loss[loss=0.2191, ctc_loss=0.1424, cr_loss=0.3836, over 17142.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.1281, cr_loss=0.3453, over 2788691.35 frames. 
], batch size: 48, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:36:28,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=583440.6666666666, ans=0.125 2024-09-24 20:36:28,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=583440.6666666666, ans=0.05 2024-09-24 20:36:32,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=583440.6666666666, ans=0.125 2024-09-24 20:36:37,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=12.0 2024-09-24 20:36:54,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=583487.3333333334, ans=0.0 2024-09-24 20:37:22,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=583580.6666666666, ans=0.125 2024-09-24 20:37:50,368 INFO [train.py:1198] (0/4) Epoch 33, batch 400, loss[loss=0.1972, ctc_loss=0.1275, cr_loss=0.3486, over 17302.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1287, cr_loss=0.3465, over 2911393.70 frames. ], batch size: 49, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:37:53,545 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.274e+02 1.352e+02 1.470e+02 2.470e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-24 20:38:00,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=583674.0, ans=0.0 2024-09-24 20:38:00,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=583674.0, ans=0.2 2024-09-24 20:38:16,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=583720.6666666666, ans=0.2 2024-09-24 20:38:19,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=583720.6666666666, ans=0.125 2024-09-24 20:38:30,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583767.3333333334, ans=0.1 2024-09-24 20:39:12,691 INFO [train.py:1198] (0/4) Epoch 33, batch 450, loss[loss=0.1774, ctc_loss=0.1133, cr_loss=0.3204, over 17157.00 frames. ], tot_loss[loss=0.1989, ctc_loss=0.1295, cr_loss=0.3472, over 3003990.86 frames. ], batch size: 45, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:39:27,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=583954.0, ans=0.125 2024-09-24 20:40:15,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=584094.0, ans=0.035 2024-09-24 20:40:35,212 INFO [train.py:1198] (0/4) Epoch 33, batch 500, loss[loss=0.2408, ctc_loss=0.1585, cr_loss=0.4117, over 15118.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1288, cr_loss=0.346, over 3080189.85 frames. 
], batch size: 89, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:40:38,396 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.260e+02 1.369e+02 1.441e+02 1.904e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-24 20:41:00,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=584187.3333333334, ans=0.1 2024-09-24 20:41:10,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=22.5 2024-09-24 20:41:15,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=584234.0, ans=0.0 2024-09-24 20:41:19,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2024-09-24 20:41:59,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=584374.0, ans=0.125 2024-09-24 20:42:00,693 INFO [train.py:1198] (0/4) Epoch 33, batch 550, loss[loss=0.2508, ctc_loss=0.1763, cr_loss=0.3724, over 11786.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1288, cr_loss=0.3454, over 3137188.88 frames. ], batch size: 123, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:42:09,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=584374.0, ans=0.125 2024-09-24 20:42:12,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=584374.0, ans=15.0 2024-09-24 20:42:25,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584420.6666666666, ans=0.1 2024-09-24 20:42:53,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584514.0, ans=0.1 2024-09-24 20:42:57,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584514.0, ans=0.1 2024-09-24 20:43:00,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=584514.0, ans=0.125 2024-09-24 20:43:00,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=584514.0, ans=0.2 2024-09-24 20:43:13,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=584560.6666666666, ans=0.05 2024-09-24 20:43:20,876 INFO [train.py:1198] (0/4) Epoch 33, batch 600, loss[loss=0.2096, ctc_loss=0.1372, cr_loss=0.3624, over 17050.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1282, cr_loss=0.3447, over 3190567.19 frames. 
], batch size: 46, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:43:26,770 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.269e+02 1.358e+02 1.468e+02 2.109e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-24 20:43:28,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=584607.3333333334, ans=0.025 2024-09-24 20:43:46,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=584654.0, ans=0.0 2024-09-24 20:43:46,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=584654.0, ans=0.2 2024-09-24 20:44:13,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=584747.3333333334, ans=0.125 2024-09-24 20:44:30,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-24 20:44:42,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=584840.6666666666, ans=0.0 2024-09-24 20:44:43,687 INFO [train.py:1198] (0/4) Epoch 33, batch 650, loss[loss=0.2298, ctc_loss=0.151, cr_loss=0.3941, over 17241.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1275, cr_loss=0.3442, over 3236006.44 frames. ], batch size: 55, lr: 3.63e-03, grad_scale: 32.0 2024-09-24 20:45:25,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.99 vs. limit=10.0 2024-09-24 20:45:34,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=584980.6666666666, ans=0.125 2024-09-24 20:45:55,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=585027.3333333334, ans=0.0 2024-09-24 20:46:06,220 INFO [train.py:1198] (0/4) Epoch 33, batch 700, loss[loss=0.2122, ctc_loss=0.1415, cr_loss=0.3536, over 17237.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1271, cr_loss=0.3436, over 3267444.79 frames. 
], batch size: 55, lr: 3.63e-03, grad_scale: 16.0 2024-09-24 20:46:08,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=585074.0, ans=0.125 2024-09-24 20:46:10,912 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.247e+02 1.323e+02 1.439e+02 2.225e+02, threshold=2.645e+02, percent-clipped=0.0 2024-09-24 20:46:14,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=585074.0, ans=0.025 2024-09-24 20:46:28,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585120.6666666666, ans=0.1 2024-09-24 20:47:04,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=585214.0, ans=0.0 2024-09-24 20:47:09,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=585214.0, ans=0.04949747468305833 2024-09-24 20:47:12,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=585214.0, ans=0.0 2024-09-24 20:47:15,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585260.6666666666, ans=0.1 2024-09-24 20:47:17,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=585260.6666666666, ans=0.125 2024-09-24 20:47:20,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=585260.6666666666, ans=0.125 2024-09-24 20:47:22,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=585260.6666666666, ans=0.0 2024-09-24 20:47:31,645 INFO [train.py:1198] (0/4) Epoch 33, batch 750, loss[loss=0.1742, ctc_loss=0.1111, cr_loss=0.3155, over 17248.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.127, cr_loss=0.3443, over 3295258.72 frames. ], batch size: 44, lr: 3.63e-03, grad_scale: 16.0 2024-09-24 20:47:41,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=585307.3333333334, ans=0.0 2024-09-24 20:47:43,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0 2024-09-24 20:47:53,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.72 vs. 
limit=10.0 2024-09-24 20:47:54,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=585354.0, ans=0.125 2024-09-24 20:48:01,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=585400.6666666666, ans=0.0 2024-09-24 20:48:03,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=585400.6666666666, ans=0.125 2024-09-24 20:48:10,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=585400.6666666666, ans=0.0 2024-09-24 20:48:40,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=585494.0, ans=0.025 2024-09-24 20:48:53,865 INFO [train.py:1198] (0/4) Epoch 33, batch 800, loss[loss=0.171, ctc_loss=0.1066, cr_loss=0.3218, over 16207.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.127, cr_loss=0.3443, over 3297220.53 frames. ], batch size: 36, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:48:58,603 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.263e+02 1.335e+02 1.404e+02 2.398e+02, threshold=2.669e+02, percent-clipped=0.0 2024-09-24 20:49:40,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=585680.6666666666, ans=0.0 2024-09-24 20:50:01,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=585727.3333333334, ans=0.025 2024-09-24 20:50:14,055 INFO [train.py:1198] (0/4) Epoch 33, batch 850, loss[loss=0.1808, ctc_loss=0.1172, cr_loss=0.3179, over 16936.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1276, cr_loss=0.3458, over 3312359.02 frames. ], batch size: 42, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:50:16,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=585774.0, ans=0.125 2024-09-24 20:50:25,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=585774.0, ans=0.0 2024-09-24 20:50:39,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=585820.6666666666, ans=0.0 2024-09-24 20:50:53,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585867.3333333334, ans=0.1 2024-09-24 20:50:56,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=585867.3333333334, ans=0.125 2024-09-24 20:51:14,799 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=12.0 2024-09-24 20:51:39,162 INFO [train.py:1198] (0/4) Epoch 33, batch 900, loss[loss=0.1757, ctc_loss=0.1117, cr_loss=0.3201, over 17218.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1278, cr_loss=0.3459, over 3324029.16 frames. 
], batch size: 47, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:51:44,053 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.262e+02 1.327e+02 1.436e+02 1.975e+02, threshold=2.653e+02, percent-clipped=0.0 2024-09-24 20:52:19,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=586100.6666666666, ans=10.0 2024-09-24 20:52:45,545 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2024-09-24 20:52:59,226 INFO [train.py:1198] (0/4) Epoch 33, batch 950, loss[loss=0.2291, ctc_loss=0.1496, cr_loss=0.3972, over 17206.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1274, cr_loss=0.3448, over 3340255.50 frames. ], batch size: 55, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:53:10,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=586240.6666666666, ans=0.125 2024-09-24 20:53:33,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=586334.0, ans=0.0 2024-09-24 20:53:38,773 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 20:54:21,210 INFO [train.py:1198] (0/4) Epoch 33, batch 1000, loss[loss=0.2329, ctc_loss=0.1565, cr_loss=0.3819, over 11955.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1274, cr_loss=0.345, over 3344384.75 frames. ], batch size: 124, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:54:26,142 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.300e+02 1.389e+02 1.469e+02 2.721e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-24 20:54:28,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586474.0, ans=0.1 2024-09-24 20:54:32,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=586474.0, ans=0.125 2024-09-24 20:54:52,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=586567.3333333334, ans=0.0 2024-09-24 20:55:27,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=586660.6666666666, ans=0.125 2024-09-24 20:55:29,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=586660.6666666666, ans=0.125 2024-09-24 20:55:30,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=586660.6666666666, ans=0.0 2024-09-24 20:55:44,468 INFO [train.py:1198] (0/4) Epoch 33, batch 1050, loss[loss=0.2087, ctc_loss=0.1371, cr_loss=0.3579, over 17314.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1268, cr_loss=0.3439, over 3350744.55 frames. 
], batch size: 49, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:56:59,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586894.0, ans=0.1 2024-09-24 20:57:01,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=586894.0, ans=0.125 2024-09-24 20:57:09,128 INFO [train.py:1198] (0/4) Epoch 33, batch 1100, loss[loss=0.2265, ctc_loss=0.1512, cr_loss=0.3761, over 14996.00 frames. ], tot_loss[loss=0.1961, ctc_loss=0.1271, cr_loss=0.3447, over 3353917.28 frames. ], batch size: 89, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:57:13,959 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.249e+02 1.328e+02 1.446e+02 1.774e+02, threshold=2.656e+02, percent-clipped=0.0 2024-09-24 20:57:19,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=586940.6666666666, ans=0.125 2024-09-24 20:57:56,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=587080.6666666666, ans=0.125 2024-09-24 20:57:56,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2024-09-24 20:58:05,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=587080.6666666666, ans=0.125 2024-09-24 20:58:32,052 INFO [train.py:1198] (0/4) Epoch 33, batch 1150, loss[loss=0.213, ctc_loss=0.1398, cr_loss=0.3662, over 15905.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1279, cr_loss=0.3457, over 3354660.84 frames. ], batch size: 74, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:58:56,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=587220.6666666666, ans=0.025 2024-09-24 20:58:57,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=587220.6666666666, ans=0.2 2024-09-24 20:58:58,880 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.06 vs. limit=15.0 2024-09-24 20:59:08,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=587267.3333333334, ans=0.125 2024-09-24 20:59:16,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587267.3333333334, ans=0.1 2024-09-24 20:59:21,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=587314.0, ans=0.125 2024-09-24 20:59:51,716 INFO [train.py:1198] (0/4) Epoch 33, batch 1200, loss[loss=0.2006, ctc_loss=0.1328, cr_loss=0.3392, over 17285.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1276, cr_loss=0.3454, over 3356323.93 frames. ], batch size: 51, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 20:59:56,480 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.281e+02 1.363e+02 1.447e+02 2.223e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-24 21:00:10,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. 
limit=6.0 2024-09-24 21:00:26,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=587500.6666666666, ans=0.1 2024-09-24 21:00:54,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=587547.3333333334, ans=0.125 2024-09-24 21:01:02,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=22.5 2024-09-24 21:01:10,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.54 vs. limit=10.0 2024-09-24 21:01:13,624 INFO [train.py:1198] (0/4) Epoch 33, batch 1250, loss[loss=0.1547, ctc_loss=0.09645, cr_loss=0.2913, over 17127.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1271, cr_loss=0.3446, over 3355296.28 frames. ], batch size: 40, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:02:11,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=587780.6666666666, ans=0.125 2024-09-24 21:02:37,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=587874.0, ans=0.2 2024-09-24 21:02:38,675 INFO [train.py:1198] (0/4) Epoch 33, batch 1300, loss[loss=0.173, ctc_loss=0.1104, cr_loss=0.313, over 17099.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1276, cr_loss=0.3453, over 3359702.60 frames. ], batch size: 49, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:02:38,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=587874.0, ans=0.125 2024-09-24 21:02:43,488 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.064e+02 1.271e+02 1.359e+02 1.460e+02 2.870e+02, threshold=2.719e+02, percent-clipped=1.0 2024-09-24 21:02:50,206 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:02:50,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2024-09-24 21:02:54,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=587920.6666666666, ans=0.125 2024-09-24 21:03:11,774 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=12.0 2024-09-24 21:03:38,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=588014.0, ans=0.0 2024-09-24 21:03:39,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=588014.0, ans=0.0 2024-09-24 21:03:42,220 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:03:56,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588060.6666666666, ans=0.1 2024-09-24 21:04:00,915 INFO [train.py:1198] (0/4) Epoch 33, batch 1350, loss[loss=0.1786, ctc_loss=0.1161, cr_loss=0.3123, over 16972.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1268, cr_loss=0.3441, over 3366242.26 frames. 
], batch size: 42, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:04:24,457 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2024-09-24 21:05:10,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=588294.0, ans=0.0 2024-09-24 21:05:20,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=588340.6666666666, ans=0.1 2024-09-24 21:05:21,445 INFO [train.py:1198] (0/4) Epoch 33, batch 1400, loss[loss=0.162, ctc_loss=0.1045, cr_loss=0.2872, over 16967.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1267, cr_loss=0.3433, over 3361684.64 frames. ], batch size: 42, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:05:26,392 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.268e+02 1.325e+02 1.452e+02 1.895e+02, threshold=2.651e+02, percent-clipped=0.0 2024-09-24 21:05:42,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.10 vs. limit=12.0 2024-09-24 21:05:54,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588434.0, ans=0.1 2024-09-24 21:05:56,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2024-09-24 21:06:01,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=588434.0, ans=0.2 2024-09-24 21:06:01,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=588434.0, ans=0.0 2024-09-24 21:06:49,518 INFO [train.py:1198] (0/4) Epoch 33, batch 1450, loss[loss=0.1656, ctc_loss=0.105, cr_loss=0.3028, over 17043.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1263, cr_loss=0.3427, over 3365196.40 frames. ], batch size: 44, lr: 3.62e-03, grad_scale: 32.0 2024-09-24 21:07:12,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=588620.6666666666, ans=0.125 2024-09-24 21:08:12,482 INFO [train.py:1198] (0/4) Epoch 33, batch 1500, loss[loss=0.1633, ctc_loss=0.1041, cr_loss=0.2956, over 16973.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1269, cr_loss=0.3437, over 3363029.59 frames. 
], batch size: 42, lr: 3.61e-03, grad_scale: 32.0 2024-09-24 21:08:14,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=588807.3333333334, ans=0.0 2024-09-24 21:08:17,327 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.288e+02 1.351e+02 1.447e+02 2.720e+02, threshold=2.702e+02, percent-clipped=1.0 2024-09-24 21:08:22,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=588807.3333333334, ans=0.025 2024-09-24 21:08:33,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=588854.0, ans=0.025 2024-09-24 21:08:40,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=588854.0, ans=0.0 2024-09-24 21:08:44,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=588900.6666666666, ans=0.125 2024-09-24 21:09:04,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588947.3333333334, ans=0.1 2024-09-24 21:09:16,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=588994.0, ans=0.125 2024-09-24 21:09:32,675 INFO [train.py:1198] (0/4) Epoch 33, batch 1550, loss[loss=0.2005, ctc_loss=0.1292, cr_loss=0.3562, over 17313.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1267, cr_loss=0.3427, over 3350176.34 frames. ], batch size: 49, lr: 3.61e-03, grad_scale: 32.0 2024-09-24 21:09:44,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=589040.6666666666, ans=0.125 2024-09-24 21:09:52,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=589087.3333333334, ans=0.125 2024-09-24 21:09:53,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=589087.3333333334, ans=0.125 2024-09-24 21:10:08,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-09-24 21:10:13,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=12.0 2024-09-24 21:10:16,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=589134.0, ans=0.0 2024-09-24 21:10:26,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589180.6666666666, ans=0.1 2024-09-24 21:10:32,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=589180.6666666666, ans=0.125 2024-09-24 21:10:37,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=589227.3333333334, ans=0.125 2024-09-24 21:10:55,335 INFO [train.py:1198] (0/4) Epoch 33, batch 1600, loss[loss=0.1971, ctc_loss=0.131, cr_loss=0.3309, over 17226.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1267, cr_loss=0.3426, over 3353052.15 frames. 
], batch size: 50, lr: 3.61e-03, grad_scale: 32.0 2024-09-24 21:11:00,335 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.243e+02 1.338e+02 1.455e+02 2.020e+02, threshold=2.676e+02, percent-clipped=0.0 2024-09-24 21:11:11,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=589320.6666666666, ans=0.125 2024-09-24 21:11:11,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2024-09-24 21:11:13,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=589320.6666666666, ans=0.2 2024-09-24 21:11:42,595 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-09-24 21:11:51,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=589414.0, ans=0.125 2024-09-24 21:12:00,192 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=22.5 2024-09-24 21:12:15,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=589460.6666666666, ans=0.025 2024-09-24 21:12:17,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=15.0 2024-09-24 21:12:20,303 INFO [train.py:1198] (0/4) Epoch 33, batch 1650, loss[loss=0.1837, ctc_loss=0.1166, cr_loss=0.3357, over 17223.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.127, cr_loss=0.3428, over 3357061.90 frames. ], batch size: 41, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:12:30,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2024-09-24 21:12:57,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=589600.6666666666, ans=0.125 2024-09-24 21:13:12,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=589647.3333333334, ans=0.125 2024-09-24 21:13:35,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589694.0, ans=0.1 2024-09-24 21:13:43,023 INFO [train.py:1198] (0/4) Epoch 33, batch 1700, loss[loss=0.1962, ctc_loss=0.1285, cr_loss=0.3384, over 17032.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1271, cr_loss=0.3425, over 3346970.89 frames. 
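Most of the scaling.py INFO lines print a ScheduledFloat: a hyperparameter (dropout_p, skip rates, scale_min, and so on) whose current value is a function of batch_count. A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative, not taken from this run:

```python
# Hedged sketch of a batch-count-scheduled float, assuming piecewise-linear
# interpolation between breakpoints. Breakpoint values are illustrative.
class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, kept sorted by batch_count
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a skip rate that anneals from 0.5 to 0.0 over the first 4000 batches;
# deep into training (batch_count ~ 588k) it sits at its final value.
skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.0))
print(skip_rate.value_at(588294.0))  # -> 0.0
```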
], batch size: 52, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:13:49,392 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.055e+02 1.272e+02 1.348e+02 1.489e+02 2.581e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-24 21:13:51,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=589740.6666666666, ans=0.125 2024-09-24 21:14:01,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=589787.3333333334, ans=0.2 2024-09-24 21:14:13,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=589834.0, ans=0.125 2024-09-24 21:14:30,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=589880.6666666666, ans=0.125 2024-09-24 21:14:52,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=589927.3333333334, ans=0.2 2024-09-24 21:15:03,863 INFO [train.py:1198] (0/4) Epoch 33, batch 1750, loss[loss=0.202, ctc_loss=0.1305, cr_loss=0.3573, over 17091.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1276, cr_loss=0.3434, over 3355366.71 frames. ], batch size: 43, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:15:10,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=589974.0, ans=0.2 2024-09-24 21:15:12,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=589974.0, ans=0.125 2024-09-24 21:15:18,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-09-24 21:15:32,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=590020.6666666666, ans=0.07 2024-09-24 21:15:46,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=590067.3333333334, ans=0.2 2024-09-24 21:15:52,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=590114.0, ans=0.0 2024-09-24 21:16:19,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=590160.6666666666, ans=0.2 2024-09-24 21:16:30,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=590207.3333333334, ans=0.2 2024-09-24 21:16:31,142 INFO [train.py:1198] (0/4) Epoch 33, batch 1800, loss[loss=0.2006, ctc_loss=0.1382, cr_loss=0.3122, over 11944.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1281, cr_loss=0.3439, over 3354058.99 frames. ], batch size: 124, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:16:37,338 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.268e+02 1.341e+02 1.421e+02 2.295e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-24 21:17:09,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590300.6666666666, ans=0.1 2024-09-24 21:17:50,741 INFO [train.py:1198] (0/4) Epoch 33, batch 1850, loss[loss=0.2109, ctc_loss=0.1372, cr_loss=0.3685, over 17365.00 frames. 
], tot_loss[loss=0.196, ctc_loss=0.1275, cr_loss=0.3428, over 3359836.00 frames. ], batch size: 48, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:18:12,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2024-09-24 21:18:24,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2024-09-24 21:18:38,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=590534.0, ans=0.125 2024-09-24 21:18:55,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=590627.3333333334, ans=0.0 2024-09-24 21:19:13,089 INFO [train.py:1198] (0/4) Epoch 33, batch 1900, loss[loss=0.1806, ctc_loss=0.116, cr_loss=0.3235, over 17100.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1268, cr_loss=0.3423, over 3370267.13 frames. ], batch size: 40, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:19:13,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0 2024-09-24 21:19:19,405 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.258e+02 1.359e+02 1.471e+02 2.658e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-24 21:19:19,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=590674.0, ans=0.125 2024-09-24 21:19:43,883 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=590767.3333333334, ans=0.0 2024-09-24 21:19:50,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=590767.3333333334, ans=0.2 2024-09-24 21:20:30,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=590860.6666666666, ans=0.2 2024-09-24 21:20:33,376 INFO [train.py:1198] (0/4) Epoch 33, batch 1950, loss[loss=0.2101, ctc_loss=0.1363, cr_loss=0.3692, over 17207.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1265, cr_loss=0.3423, over 3376820.09 frames. ], batch size: 55, lr: 3.61e-03, grad_scale: 8.0 2024-09-24 21:20:36,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=590907.3333333334, ans=0.125 2024-09-24 21:21:25,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=591000.6666666666, ans=0.125 2024-09-24 21:21:32,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=591047.3333333334, ans=0.07 2024-09-24 21:21:59,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=591140.6666666666, ans=0.125 2024-09-24 21:22:00,619 INFO [train.py:1198] (0/4) Epoch 33, batch 2000, loss[loss=0.2068, ctc_loss=0.1365, cr_loss=0.3514, over 16777.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1274, cr_loss=0.3439, over 3373560.38 frames. 
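The Whitening lines compare a measured metric against a limit for a named activation tensor. A hedged sketch of one way such a metric can be computed, assuming it is an anisotropy ratio of the feature covariance (exactly 1.0 for perfectly white features, growing as a few directions dominate); the real computation is in icefall's scaling.py and may differ in detail:

```python
import torch

# Hedged sketch: covariance anisotropy as a whitening metric, assuming
# metric = (tr(C^2)/d) / (tr(C)/d)^2 over the channel dimension.
# Equals 1.0 for white (isotropic) features, larger when anisotropic.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitened group
    x = x - x.mean(dim=0, keepdim=True)
    d = x.shape[1]
    cov = (x.T @ x) / x.shape[0]
    tr_c = torch.diagonal(cov).sum()
    tr_c2 = (cov * cov).sum()  # tr(C @ C) for symmetric C
    return ((tr_c2 / d) / (tr_c / d) ** 2).item()

feats = torch.randn(1000, 384) * torch.linspace(0.1, 3.0, 384)  # anisotropic
print(f"metric={whitening_metric(feats):.2f} vs. limit=15.0")
```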
], batch size: 61, lr: 3.61e-03, grad_scale: 16.0 2024-09-24 21:22:08,625 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.300e+02 1.359e+02 1.463e+02 1.672e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-24 21:22:46,295 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0 2024-09-24 21:22:48,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=591280.6666666666, ans=0.125 2024-09-24 21:22:49,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2024-09-24 21:23:15,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=15.0 2024-09-24 21:23:22,763 INFO [train.py:1198] (0/4) Epoch 33, batch 2050, loss[loss=0.2058, ctc_loss=0.135, cr_loss=0.3544, over 17293.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.127, cr_loss=0.3431, over 3372633.16 frames. ], batch size: 51, lr: 3.61e-03, grad_scale: 8.0 2024-09-24 21:23:26,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=591374.0, ans=0.0 2024-09-24 21:23:34,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=591374.0, ans=0.125 2024-09-24 21:23:41,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2024-09-24 21:23:49,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=22.5 2024-09-24 21:24:14,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=591514.0, ans=0.025 2024-09-24 21:24:22,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591514.0, ans=0.1 2024-09-24 21:24:39,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=591560.6666666666, ans=0.025 2024-09-24 21:24:42,771 INFO [train.py:1198] (0/4) Epoch 33, batch 2100, loss[loss=0.183, ctc_loss=0.1172, cr_loss=0.329, over 17239.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1264, cr_loss=0.3419, over 3373569.45 frames. ], batch size: 50, lr: 3.61e-03, grad_scale: 8.0 2024-09-24 21:24:52,446 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.286e+02 1.370e+02 1.480e+02 2.165e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-24 21:25:20,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=591700.6666666666, ans=0.0 2024-09-24 21:26:01,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.79 vs. limit=10.0 2024-09-24 21:26:05,936 INFO [train.py:1198] (0/4) Epoch 33, batch 2150, loss[loss=0.2347, ctc_loss=0.1537, cr_loss=0.4047, over 14866.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1266, cr_loss=0.3427, over 3367018.14 frames. 
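The grad_scale field in the batch lines moves between 8.0, 16.0 and 32.0 through this stretch: the mixed-precision loss scale is halved when overflowing gradients are detected and grown back after a run of stable steps. A hedged sketch with PyTorch's stock GradScaler, whose backoff/growth behaviour matches that pattern (the interval and factor values are assumptions):

```python
import torch

# Hedged sketch of AMP dynamic loss scaling: the scale halves on overflow
# (backoff_factor=0.5) and doubles after growth_interval clean steps,
# consistent with the 32 -> 16 -> 8 -> 16 -> 32 movement in the log.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if grads overflowed
    scaler.update()          # adjusts the scale up or down
    return loss.detach(), scaler.get_scale()
```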
], batch size: 89, lr: 3.61e-03, grad_scale: 8.0 2024-09-24 21:26:13,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=591840.6666666666, ans=0.125 2024-09-24 21:26:37,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=591887.3333333334, ans=0.0 2024-09-24 21:27:04,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591980.6666666666, ans=0.1 2024-09-24 21:27:09,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=591980.6666666666, ans=0.125 2024-09-24 21:27:09,672 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:27:32,260 INFO [train.py:1198] (0/4) Epoch 33, batch 2200, loss[loss=0.1956, ctc_loss=0.1284, cr_loss=0.336, over 17230.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1269, cr_loss=0.3424, over 3354150.64 frames. ], batch size: 50, lr: 3.60e-03, grad_scale: 8.0 2024-09-24 21:27:41,951 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.250e+02 1.335e+02 1.414e+02 1.894e+02, threshold=2.669e+02, percent-clipped=0.0 2024-09-24 21:27:42,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=592074.0, ans=0.0 2024-09-24 21:27:47,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=592120.6666666666, ans=0.125 2024-09-24 21:27:53,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=592120.6666666666, ans=0.035 2024-09-24 21:27:55,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=592120.6666666666, ans=0.0 2024-09-24 21:28:10,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=592167.3333333334, ans=0.2 2024-09-24 21:28:31,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=592214.0, ans=0.0 2024-09-24 21:28:42,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=592260.6666666666, ans=0.5 2024-09-24 21:28:54,691 INFO [train.py:1198] (0/4) Epoch 33, batch 2250, loss[loss=0.2019, ctc_loss=0.1304, cr_loss=0.3578, over 16480.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.127, cr_loss=0.3425, over 3347573.28 frames. ], batch size: 66, lr: 3.60e-03, grad_scale: 8.0 2024-09-24 21:29:27,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=592400.6666666666, ans=0.0 2024-09-24 21:29:34,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=592400.6666666666, ans=6.0 2024-09-24 21:30:14,785 INFO [train.py:1198] (0/4) Epoch 33, batch 2300, loss[loss=0.205, ctc_loss=0.137, cr_loss=0.3396, over 17366.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1281, cr_loss=0.3438, over 3333173.35 frames. 
], batch size: 52, lr: 3.60e-03, grad_scale: 8.0 2024-09-24 21:30:24,326 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.313e+02 1.398e+02 1.523e+02 4.380e+02, threshold=2.797e+02, percent-clipped=1.0 2024-09-24 21:30:56,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=592634.0, ans=0.125 2024-09-24 21:31:35,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592727.3333333334, ans=0.1 2024-09-24 21:31:43,091 INFO [train.py:1198] (0/4) Epoch 33, batch 2350, loss[loss=0.1745, ctc_loss=0.1123, cr_loss=0.3109, over 17164.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1278, cr_loss=0.3438, over 3339058.67 frames. ], batch size: 41, lr: 3.60e-03, grad_scale: 8.0 2024-09-24 21:31:48,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=12.0 2024-09-24 21:32:02,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=592820.6666666666, ans=0.125 2024-09-24 21:32:17,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=592867.3333333334, ans=0.1 2024-09-24 21:32:32,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=592914.0, ans=0.125 2024-09-24 21:32:57,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-09-24 21:33:00,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2024-09-24 21:33:05,030 INFO [train.py:1198] (0/4) Epoch 33, batch 2400, loss[loss=0.2028, ctc_loss=0.1337, cr_loss=0.3455, over 16004.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1278, cr_loss=0.3444, over 3338424.74 frames. ], batch size: 74, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:33:14,699 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.266e+02 1.359e+02 1.474e+02 2.316e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-24 21:33:24,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=593054.0, ans=0.95 2024-09-24 21:33:26,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=593054.0, ans=0.125 2024-09-24 21:33:35,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=593100.6666666666, ans=0.025 2024-09-24 21:33:45,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.43 vs. limit=15.0 2024-09-24 21:33:52,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. 
limit=15.0 2024-09-24 21:33:58,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=593147.3333333334, ans=0.2 2024-09-24 21:34:17,973 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0 2024-09-24 21:34:24,927 INFO [train.py:1198] (0/4) Epoch 33, batch 2450, loss[loss=0.2293, ctc_loss=0.1539, cr_loss=0.3769, over 14676.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3445, over 3346585.42 frames. ], batch size: 89, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:34:34,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593240.6666666666, ans=0.1 2024-09-24 21:34:42,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=593287.3333333334, ans=0.07 2024-09-24 21:35:08,734 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.09 vs. limit=15.0 2024-09-24 21:35:41,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=593427.3333333334, ans=0.1 2024-09-24 21:35:48,034 INFO [train.py:1198] (0/4) Epoch 33, batch 2500, loss[loss=0.1718, ctc_loss=0.1079, cr_loss=0.3192, over 17063.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1272, cr_loss=0.3434, over 3348416.19 frames. ], batch size: 39, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:36:00,300 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.280e+02 1.356e+02 1.432e+02 1.776e+02, threshold=2.712e+02, percent-clipped=0.0 2024-09-24 21:36:14,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=593520.6666666666, ans=0.1 2024-09-24 21:36:20,856 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:36:49,976 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2024-09-24 21:37:01,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=593660.6666666666, ans=0.0 2024-09-24 21:37:13,487 INFO [train.py:1198] (0/4) Epoch 33, batch 2550, loss[loss=0.2063, ctc_loss=0.1354, cr_loss=0.3545, over 17310.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.127, cr_loss=0.3431, over 3346129.33 frames. ], batch size: 49, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:37:28,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=593754.0, ans=0.0 2024-09-24 21:37:29,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=593754.0, ans=0.125 2024-09-24 21:38:35,207 INFO [train.py:1198] (0/4) Epoch 33, batch 2600, loss[loss=0.1837, ctc_loss=0.1189, cr_loss=0.3241, over 17019.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1267, cr_loss=0.3429, over 3343788.30 frames. ], batch size: 39, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:38:36,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.93 vs. 
limit=15.0 2024-09-24 21:38:40,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=593940.6666666666, ans=0.2 2024-09-24 21:38:43,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=593940.6666666666, ans=0.0 2024-09-24 21:38:44,724 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.263e+02 1.356e+02 1.473e+02 2.021e+02, threshold=2.712e+02, percent-clipped=0.0 2024-09-24 21:38:46,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=593940.6666666666, ans=0.0 2024-09-24 21:39:08,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=594034.0, ans=0.0 2024-09-24 21:39:15,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=594034.0, ans=0.2 2024-09-24 21:39:15,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=12.0 2024-09-24 21:39:45,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=594127.3333333334, ans=0.2 2024-09-24 21:39:54,549 INFO [train.py:1198] (0/4) Epoch 33, batch 2650, loss[loss=0.1673, ctc_loss=0.1054, cr_loss=0.3094, over 17090.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1264, cr_loss=0.3423, over 3346239.05 frames. ], batch size: 43, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:39:54,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=594174.0, ans=0.2 2024-09-24 21:41:01,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=594314.0, ans=0.125 2024-09-24 21:41:22,177 INFO [train.py:1198] (0/4) Epoch 33, batch 2700, loss[loss=0.1661, ctc_loss=0.1079, cr_loss=0.2912, over 17074.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1266, cr_loss=0.3431, over 3349294.18 frames. 
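In each batch line, loss[...] describes the current batch while tot_loss[...] is a frame-weighted aggregate over recent batches, which is why its "over N frames" count hovers in the millions. A hedged sketch of frame-weighted aggregation, assuming a simple decayed accumulator rather than icefall's exact tracker:

```python
# Hedged sketch: frame-weighted running loss. Each batch contributes its
# loss weighted by its frame count; exponential decay keeps the window
# recent. icefall's own metrics tracking differs in detail.
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.1661, 17074.0)  # e.g. the batch-2700 sample above
print(f"tot_loss={tracker.tot_loss:.4f} over {tracker.frames:.2f} frames")
```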
], batch size: 43, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:41:25,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594407.3333333334, ans=0.1 2024-09-24 21:41:30,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=594407.3333333334, ans=0.125 2024-09-24 21:41:31,687 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.245e+02 1.329e+02 1.418e+02 2.018e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-24 21:41:35,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=594407.3333333334, ans=0.125 2024-09-24 21:41:41,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=594454.0, ans=0.125 2024-09-24 21:41:51,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=594454.0, ans=0.0 2024-09-24 21:41:54,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594500.6666666666, ans=0.1 2024-09-24 21:41:59,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=594500.6666666666, ans=0.125 2024-09-24 21:42:31,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2024-09-24 21:42:32,702 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:42:44,869 INFO [train.py:1198] (0/4) Epoch 33, batch 2750, loss[loss=0.2094, ctc_loss=0.1376, cr_loss=0.3591, over 17244.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1275, cr_loss=0.3452, over 3356007.64 frames. ], batch size: 55, lr: 3.60e-03, grad_scale: 16.0 2024-09-24 21:42:53,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594640.6666666666, ans=0.1 2024-09-24 21:43:00,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2024-09-24 21:43:01,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=594687.3333333334, ans=0.1 2024-09-24 21:43:10,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=594687.3333333334, ans=0.125 2024-09-24 21:43:27,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-24 21:44:04,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.49 vs. limit=10.0 2024-09-24 21:44:04,860 INFO [train.py:1198] (0/4) Epoch 33, batch 2800, loss[loss=0.2051, ctc_loss=0.1325, cr_loss=0.3628, over 17021.00 frames. ], tot_loss[loss=0.198, ctc_loss=0.1286, cr_loss=0.3468, over 3346733.44 frames. 
], batch size: 56, lr: 3.60e-03, grad_scale: 32.0 2024-09-24 21:44:13,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594874.0, ans=0.1 2024-09-24 21:44:14,278 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.285e+02 1.356e+02 1.483e+02 1.883e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-24 21:44:22,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=594920.6666666666, ans=0.05 2024-09-24 21:44:43,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=594967.3333333334, ans=0.0 2024-09-24 21:44:57,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=595014.0, ans=0.125 2024-09-24 21:45:10,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=595060.6666666666, ans=0.125 2024-09-24 21:45:15,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=595060.6666666666, ans=0.125 2024-09-24 21:45:20,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595060.6666666666, ans=0.1 2024-09-24 21:45:24,680 INFO [train.py:1198] (0/4) Epoch 33, batch 2850, loss[loss=0.2122, ctc_loss=0.1365, cr_loss=0.3788, over 15983.00 frames. ], tot_loss[loss=0.1978, ctc_loss=0.1286, cr_loss=0.3462, over 3331482.65 frames. ], batch size: 74, lr: 3.60e-03, grad_scale: 32.0 2024-09-24 21:46:04,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=595200.6666666666, ans=0.125 2024-09-24 21:46:23,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=595247.3333333334, ans=0.0 2024-09-24 21:46:34,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=595294.0, ans=0.025 2024-09-24 21:46:52,042 INFO [train.py:1198] (0/4) Epoch 33, batch 2900, loss[loss=0.2049, ctc_loss=0.1336, cr_loss=0.3568, over 17288.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.345, over 3338198.73 frames. ], batch size: 46, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:47:01,746 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.278e+02 1.378e+02 1.507e+02 2.269e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-24 21:47:14,920 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:47:43,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=595480.6666666666, ans=0.025 2024-09-24 21:47:53,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=595480.6666666666, ans=0.025 2024-09-24 21:48:01,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=595527.3333333334, ans=0.2 2024-09-24 21:48:15,323 INFO [train.py:1198] (0/4) Epoch 33, batch 2950, loss[loss=0.1905, ctc_loss=0.1224, cr_loss=0.3408, over 17305.00 frames. 
], tot_loss[loss=0.1968, ctc_loss=0.1278, cr_loss=0.3453, over 3339859.87 frames. ], batch size: 46, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:48:32,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=22.5 2024-09-24 21:48:41,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=595620.6666666666, ans=0.025 2024-09-24 21:48:46,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=595667.3333333334, ans=0.125 2024-09-24 21:48:52,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=595667.3333333334, ans=0.125 2024-09-24 21:49:06,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=595714.0, ans=0.0 2024-09-24 21:49:06,888 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 21:49:13,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2024-09-24 21:49:27,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2024-09-24 21:49:33,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=595807.3333333334, ans=0.125 2024-09-24 21:49:34,551 INFO [train.py:1198] (0/4) Epoch 33, batch 3000, loss[loss=0.1809, ctc_loss=0.1163, cr_loss=0.3226, over 17295.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1281, cr_loss=0.3466, over 3344302.12 frames. ], batch size: 46, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:49:34,552 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 21:49:50,277 INFO [train.py:1230] (0/4) Epoch 33, validation: loss=0.03597, ctc_loss=0.03597, cr_loss=9.382e-15, over 944034.00 frames. 2024-09-24 21:49:50,278 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 21:49:59,653 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.292e+02 1.353e+02 1.495e+02 2.152e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-24 21:50:04,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=595854.0, ans=0.2 2024-09-24 21:50:25,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=595900.6666666666, ans=0.025 2024-09-24 21:50:31,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2024-09-24 21:50:53,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.16 vs. limit=10.0 2024-09-24 21:50:58,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2024-09-24 21:51:08,636 INFO [train.py:1198] (0/4) Epoch 33, batch 3050, loss[loss=0.205, ctc_loss=0.1327, cr_loss=0.3615, over 17306.00 frames. 
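Batch 3000 above pauses training to compute a validation loss over a fixed 944034-frame dev set, then reports peak GPU memory. A hedged sketch of that pattern; compute_loss and valid_dl are placeholder names, not the recipe's actual API:

```python
import torch

# Hedged sketch: frame-weighted validation pass plus the peak-memory
# report seen right after it. `compute_loss` and `valid_dl` are placeholders.
def validate(model, valid_dl, compute_loss, device):
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch)
            loss_sum += loss.item() * num_frames
            frames += num_frames
    model.train()
    print(f"validation: loss={loss_sum / frames:.5f}, over {frames:.2f} frames.")
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mem_mb}MB")
```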
], tot_loss[loss=0.1969, ctc_loss=0.1276, cr_loss=0.3463, over 3340935.51 frames. ], batch size: 46, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:51:15,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=596040.6666666666, ans=0.125 2024-09-24 21:51:53,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=596134.0, ans=0.0 2024-09-24 21:51:55,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=596134.0, ans=0.0 2024-09-24 21:51:56,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=596180.6666666666, ans=0.125 2024-09-24 21:51:58,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=596180.6666666666, ans=0.0 2024-09-24 21:52:07,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=596180.6666666666, ans=0.125 2024-09-24 21:52:34,217 INFO [train.py:1198] (0/4) Epoch 33, batch 3100, loss[loss=0.1748, ctc_loss=0.109, cr_loss=0.3294, over 17152.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.127, cr_loss=0.3449, over 3340071.93 frames. ], batch size: 40, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:52:39,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=596274.0, ans=0.0 2024-09-24 21:52:43,659 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.278e+02 1.346e+02 1.474e+02 1.974e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-24 21:52:56,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=596320.6666666666, ans=0.0 2024-09-24 21:53:17,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=596367.3333333334, ans=0.95 2024-09-24 21:53:28,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=596414.0, ans=0.025 2024-09-24 21:53:29,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-09-24 21:53:49,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0 2024-09-24 21:53:50,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=596460.6666666666, ans=0.125 2024-09-24 21:53:53,437 INFO [train.py:1198] (0/4) Epoch 33, batch 3150, loss[loss=0.2096, ctc_loss=0.1329, cr_loss=0.3834, over 17143.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1265, cr_loss=0.3439, over 3345383.47 frames. 
], batch size: 48, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:53:58,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=596507.3333333334, ans=0.125 2024-09-24 21:54:34,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=596600.6666666666, ans=0.1 2024-09-24 21:54:45,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=596647.3333333334, ans=0.0 2024-09-24 21:55:11,609 INFO [train.py:1198] (0/4) Epoch 33, batch 3200, loss[loss=0.2053, ctc_loss=0.1355, cr_loss=0.3489, over 16460.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1266, cr_loss=0.3445, over 3357270.93 frames. ], batch size: 66, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:55:22,508 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.266e+02 1.354e+02 1.465e+02 1.945e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 21:55:46,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=596834.0, ans=0.1 2024-09-24 21:56:17,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=596927.3333333334, ans=0.2 2024-09-24 21:56:30,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=596974.0, ans=0.0 2024-09-24 21:56:31,870 INFO [train.py:1198] (0/4) Epoch 33, batch 3250, loss[loss=0.2031, ctc_loss=0.1324, cr_loss=0.3531, over 17067.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.127, cr_loss=0.3461, over 3363499.52 frames. ], batch size: 46, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:56:38,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596974.0, ans=0.1 2024-09-24 21:57:07,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=22.5 2024-09-24 21:57:18,593 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2024-09-24 21:57:26,236 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=15.0 2024-09-24 21:57:28,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=597114.0, ans=0.0 2024-09-24 21:57:41,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=597160.6666666666, ans=0.125 2024-09-24 21:57:41,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=597160.6666666666, ans=0.1 2024-09-24 21:57:45,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=597160.6666666666, ans=0.0 2024-09-24 21:57:49,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=597207.3333333334, ans=0.125 2024-09-24 21:57:50,337 INFO [train.py:1198] (0/4) Epoch 33, batch 3300, loss[loss=0.1846, ctc_loss=0.1207, cr_loss=0.3194, over 17303.00 frames. 
], tot_loss[loss=0.1959, ctc_loss=0.1269, cr_loss=0.3452, over 3356244.77 frames. ], batch size: 49, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:58:00,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=597207.3333333334, ans=0.0 2024-09-24 21:58:01,413 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.317e+02 1.415e+02 1.516e+02 3.233e+02, threshold=2.830e+02, percent-clipped=1.0 2024-09-24 21:58:19,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=597254.0, ans=0.125 2024-09-24 21:58:31,683 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-128000.pt 2024-09-24 21:58:37,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=597300.6666666666, ans=0.0 2024-09-24 21:59:11,413 INFO [train.py:1198] (0/4) Epoch 33, batch 3350, loss[loss=0.2527, ctc_loss=0.172, cr_loss=0.4036, over 14875.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1276, cr_loss=0.3458, over 3351629.53 frames. ], batch size: 89, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 21:59:13,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.66 vs. limit=10.0 2024-09-24 21:59:21,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=597440.6666666666, ans=0.09899494936611666 2024-09-24 21:59:23,583 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0 2024-09-24 21:59:28,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=597487.3333333334, ans=0.0 2024-09-24 22:00:00,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=597580.6666666666, ans=0.125 2024-09-24 22:00:08,658 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2024-09-24 22:00:26,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.65 vs. limit=15.0 2024-09-24 22:00:29,915 INFO [train.py:1198] (0/4) Epoch 33, batch 3400, loss[loss=0.2173, ctc_loss=0.1431, cr_loss=0.3709, over 17302.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.128, cr_loss=0.347, over 3358885.51 frames. ], batch size: 49, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 22:00:32,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.56 vs. limit=6.0 2024-09-24 22:00:34,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=597674.0, ans=0.2 2024-09-24 22:00:40,771 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.249e+02 1.331e+02 1.438e+02 2.060e+02, threshold=2.663e+02, percent-clipped=0.0 2024-09-24 22:00:46,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.18 vs. 
limit=22.5 2024-09-24 22:01:06,490 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=22.5 2024-09-24 22:01:42,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=15.0 2024-09-24 22:01:48,348 INFO [train.py:1198] (0/4) Epoch 33, batch 3450, loss[loss=0.204, ctc_loss=0.1337, cr_loss=0.3514, over 16819.00 frames. ], tot_loss[loss=0.1972, ctc_loss=0.128, cr_loss=0.3463, over 3354470.94 frames. ], batch size: 58, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 22:02:15,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0 2024-09-24 22:02:16,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=597954.0, ans=0.025 2024-09-24 22:03:04,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=598094.0, ans=0.2 2024-09-24 22:03:12,224 INFO [train.py:1198] (0/4) Epoch 33, batch 3500, loss[loss=0.1681, ctc_loss=0.1079, cr_loss=0.3009, over 17261.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1273, cr_loss=0.3443, over 3350045.11 frames. ], batch size: 42, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 22:03:18,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=598140.6666666666, ans=0.0 2024-09-24 22:03:20,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=598140.6666666666, ans=0.2 2024-09-24 22:03:23,015 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.259e+02 1.352e+02 1.466e+02 4.097e+02, threshold=2.703e+02, percent-clipped=1.0 2024-09-24 22:03:37,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=598187.3333333334, ans=0.015 2024-09-24 22:03:39,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=598187.3333333334, ans=10.0 2024-09-24 22:03:42,186 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 22:04:08,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=598280.6666666666, ans=0.0 2024-09-24 22:04:24,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=598327.3333333334, ans=0.0 2024-09-24 22:04:30,422 INFO [train.py:1198] (0/4) Epoch 33, batch 3550, loss[loss=0.1682, ctc_loss=0.1092, cr_loss=0.2948, over 17035.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1271, cr_loss=0.3432, over 3350485.35 frames. ], batch size: 39, lr: 3.59e-03, grad_scale: 32.0 2024-09-24 22:04:34,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.62 vs. 
limit=22.5 2024-09-24 22:04:43,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598374.0, ans=0.1 2024-09-24 22:04:44,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=598420.6666666666, ans=0.025 2024-09-24 22:05:10,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2024-09-24 22:05:22,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=598514.0, ans=0.125 2024-09-24 22:05:25,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=598514.0, ans=0.0 2024-09-24 22:05:27,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=598514.0, ans=0.0 2024-09-24 22:05:48,759 INFO [train.py:1198] (0/4) Epoch 33, batch 3600, loss[loss=0.2183, ctc_loss=0.1463, cr_loss=0.36, over 12006.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1283, cr_loss=0.3452, over 3338274.51 frames. ], batch size: 123, lr: 3.58e-03, grad_scale: 32.0 2024-09-24 22:05:59,709 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.266e+02 1.339e+02 1.451e+02 1.959e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-24 22:06:00,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=598607.3333333334, ans=0.0 2024-09-24 22:06:38,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=598747.3333333334, ans=0.2 2024-09-24 22:06:42,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=598747.3333333334, ans=0.125 2024-09-24 22:07:05,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-24 22:07:08,976 INFO [train.py:1198] (0/4) Epoch 33, batch 3650, loss[loss=0.1845, ctc_loss=0.1222, cr_loss=0.3113, over 17215.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1284, cr_loss=0.3456, over 3339146.90 frames. ], batch size: 47, lr: 3.58e-03, grad_scale: 32.0 2024-09-24 22:07:23,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=598887.3333333334, ans=0.025 2024-09-24 22:08:04,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=598980.6666666666, ans=0.0 2024-09-24 22:08:10,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.09 vs. limit=22.5 2024-09-24 22:08:11,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0 2024-09-24 22:08:27,972 INFO [train.py:1198] (0/4) Epoch 33, batch 3700, loss[loss=0.2133, ctc_loss=0.1398, cr_loss=0.3674, over 16898.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1278, cr_loss=0.3451, over 3349760.14 frames. 
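The printed lr decays gently within epoch 33 (3.62e-03 down to 3.58e-03) and steps down again when epoch 34 begins below. A hedged sketch of an Eden-style schedule of the kind Zipformer recipes use, where lr shrinks smoothly with both batch index and epoch; the exponents and reference constants here are assumptions, not calibrated to reproduce this run's exact values:

```python
# Hedged sketch of an Eden-style learning-rate schedule. The -0.25
# exponents and the lr_batches / lr_epochs constants are assumptions,
# chosen only to illustrate the slow within-epoch drift seen in the log.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Deep into training both factors change very slowly, so consecutive
# batches print nearly identical lr values, as in the lines above.
print(eden_lr(base_lr=0.045, batch=590000, epoch=33.0))
print(eden_lr(base_lr=0.045, batch=599000, epoch=34.0))
```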
], batch size: 58, lr: 3.58e-03, grad_scale: 32.0 2024-09-24 22:08:31,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=599074.0, ans=0.125 2024-09-24 22:08:39,002 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.282e+02 1.323e+02 1.479e+02 2.496e+02, threshold=2.646e+02, percent-clipped=0.0 2024-09-24 22:08:54,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=599120.6666666666, ans=0.125 2024-09-24 22:09:08,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=599167.3333333334, ans=0.0 2024-09-24 22:09:29,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=599260.6666666666, ans=22.5 2024-09-24 22:09:45,899 INFO [train.py:1198] (0/4) Epoch 33, batch 3750, loss[loss=0.161, ctc_loss=0.1041, cr_loss=0.2844, over 16956.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1276, cr_loss=0.3443, over 3352905.23 frames. ], batch size: 42, lr: 3.58e-03, grad_scale: 32.0 2024-09-24 22:09:49,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=599307.3333333334, ans=0.125 2024-09-24 22:10:33,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=599447.3333333334, ans=0.2 2024-09-24 22:10:50,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=599494.0, ans=0.125 2024-09-24 22:11:02,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=599540.6666666666, ans=0.125 2024-09-24 22:11:03,990 INFO [train.py:1198] (0/4) Epoch 33, batch 3800, loss[loss=0.1679, ctc_loss=0.1088, cr_loss=0.2956, over 16296.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1284, cr_loss=0.345, over 3341941.73 frames. ], batch size: 36, lr: 3.58e-03, grad_scale: 32.0 2024-09-24 22:11:14,753 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.301e+02 1.426e+02 1.535e+02 2.880e+02, threshold=2.852e+02, percent-clipped=1.0 2024-09-24 22:12:04,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=599680.6666666666, ans=0.125 2024-09-24 22:12:18,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=599727.3333333334, ans=0.2 2024-09-24 22:12:22,914 INFO [train.py:1198] (0/4) Epoch 33, batch 3850, loss[loss=0.2049, ctc_loss=0.1333, cr_loss=0.3579, over 16942.00 frames. ], tot_loss[loss=0.2004, ctc_loss=0.1307, cr_loss=0.3487, over 3306708.77 frames. ], batch size: 58, lr: 3.58e-03, grad_scale: 16.0 2024-09-24 22:12:25,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.82 vs. 
limit=12.0 2024-09-24 22:12:29,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=599774.0, ans=0.5 2024-09-24 22:12:42,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=599820.6666666666, ans=0.125 2024-09-24 22:13:02,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=599867.3333333334, ans=0.125 2024-09-24 22:13:32,916 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-33.pt 2024-09-24 22:14:24,102 INFO [train.py:1198] (0/4) Epoch 34, batch 0, loss[loss=0.1953, ctc_loss=0.1275, cr_loss=0.339, over 17069.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1275, cr_loss=0.339, over 17069.00 frames. ], batch size: 46, lr: 3.53e-03, grad_scale: 32.0 2024-09-24 22:14:24,103 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 22:14:39,386 INFO [train.py:1230] (0/4) Epoch 34, validation: loss=0.03567, ctc_loss=0.03567, cr_loss=1.032e-14, over 944034.00 frames. 2024-09-24 22:14:39,386 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 22:14:58,491 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.377e+02 1.527e+02 1.748e+02 2.707e+02, threshold=3.055e+02, percent-clipped=0.0 2024-09-24 22:15:00,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=600035.3333333334, ans=0.125 2024-09-24 22:15:03,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600035.3333333334, ans=0.1 2024-09-24 22:15:05,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600035.3333333334, ans=0.1 2024-09-24 22:15:59,189 INFO [train.py:1198] (0/4) Epoch 34, batch 50, loss[loss=0.1827, ctc_loss=0.1174, cr_loss=0.3263, over 17111.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1241, cr_loss=0.3395, over 756159.91 frames. 
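The two checkpoint.py lines in this stretch show both flavours of saving: checkpoint-128000.pt at a fixed batch-count interval and epoch-33.pt when the epoch closes, after which epoch 34 starts with a fresh validation pass. A hedged sketch of that pattern; exp_dir and the save_every_n interval are assumptions, and the saved fields are illustrative:

```python
import torch
from pathlib import Path

# Hedged sketch: periodic and end-of-epoch checkpointing matching the
# filenames printed in the log. exp_dir and save_every_n are assumptions.
def maybe_save(model, optimizer, exp_dir: Path, batch_idx_train: int,
               save_every_n: int = 4000):
    if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
        path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "batch_idx_train": batch_idx_train}, path)
        print(f"Saving checkpoint to {path}")

def save_epoch(model, optimizer, exp_dir: Path, epoch: int):
    path = exp_dir / f"epoch-{epoch}.pt"
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "epoch": epoch}, path)
    print(f"Saving checkpoint to {path}")
```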
], batch size: 40, lr: 3.53e-03, grad_scale: 32.0 2024-09-24 22:16:12,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=600222.0, ans=0.2 2024-09-24 22:16:18,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=600268.6666666666, ans=0.0 2024-09-24 22:16:30,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=600268.6666666666, ans=0.0 2024-09-24 22:16:33,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=600315.3333333334, ans=0.125 2024-09-24 22:17:02,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=600362.0, ans=0.125 2024-09-24 22:17:10,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=600408.6666666666, ans=0.125 2024-09-24 22:17:28,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=600455.3333333334, ans=0.1 2024-09-24 22:17:29,797 INFO [train.py:1198] (0/4) Epoch 34, batch 100, loss[loss=0.1706, ctc_loss=0.1093, cr_loss=0.3065, over 17293.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1248, cr_loss=0.3415, over 1331869.29 frames. ], batch size: 49, lr: 3.53e-03, grad_scale: 32.0 2024-09-24 22:17:49,214 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.286e+02 1.376e+02 1.498e+02 2.035e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-24 22:17:59,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=600502.0, ans=0.2 2024-09-24 22:18:49,899 INFO [train.py:1198] (0/4) Epoch 34, batch 150, loss[loss=0.2088, ctc_loss=0.1382, cr_loss=0.3528, over 16862.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1275, cr_loss=0.3468, over 1780825.70 frames. ], batch size: 58, lr: 3.52e-03, grad_scale: 32.0 2024-09-24 22:18:56,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=600688.6666666666, ans=0.125 2024-09-24 22:19:06,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=600735.3333333334, ans=0.0 2024-09-24 22:19:15,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=600735.3333333334, ans=0.1 2024-09-24 22:19:18,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=600735.3333333334, ans=0.0 2024-09-24 22:19:35,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=600828.6666666666, ans=0.0 2024-09-24 22:19:59,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=600875.3333333334, ans=0.0 2024-09-24 22:20:03,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2024-09-24 22:20:09,194 INFO [train.py:1198] (0/4) Epoch 34, batch 200, loss[loss=0.1731, ctc_loss=0.1104, cr_loss=0.3134, over 16949.00 frames. 
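At the epoch boundary above, the loop saves a full checkpoint (epoch-33.pt under the experiment directory) and then, at batch 0 of epoch 34, recomputes the validation loss. Note the validation cr_loss is ~1e-14, i.e. numerically zero, consistent with the consistency-regularization term vanishing when no augmentation is applied at eval time and the compared branches see identical input. A minimal sketch of the per-epoch checkpointing, assuming the usual torch.save of state dicts (the exact keys icefall stores are not reproduced here):

from pathlib import Path
import torch

def save_epoch_checkpoint(exp_dir: Path, epoch: int, model, optimizer, scheduler):
    # Everything needed to resume: weights plus optimizer/scheduler state.
    torch.save(
        {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
        },
        exp_dir / f"epoch-{epoch}.pt",
    )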
], tot_loss[loss=0.1963, ctc_loss=0.1272, cr_loss=0.3453, over 2125298.02 frames. ], batch size: 42, lr: 3.52e-03, grad_scale: 32.0 2024-09-24 22:20:20,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600922.0, ans=0.1 2024-09-24 22:20:24,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=600968.6666666666, ans=0.125 2024-09-24 22:20:30,036 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.259e+02 1.333e+02 1.425e+02 2.058e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-24 22:20:44,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=601015.3333333334, ans=0.125 2024-09-24 22:21:02,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=601062.0, ans=0.125 2024-09-24 22:21:04,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=601062.0, ans=0.2 2024-09-24 22:21:10,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=601062.0, ans=0.2 2024-09-24 22:21:32,286 INFO [train.py:1198] (0/4) Epoch 34, batch 250, loss[loss=0.195, ctc_loss=0.1238, cr_loss=0.3559, over 17059.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3435, over 2398774.86 frames. ], batch size: 46, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:22:33,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=601295.3333333334, ans=0.125 2024-09-24 22:22:50,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2024-09-24 22:23:00,622 INFO [train.py:1198] (0/4) Epoch 34, batch 300, loss[loss=0.2103, ctc_loss=0.1346, cr_loss=0.3784, over 17205.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1263, cr_loss=0.3441, over 2609396.91 frames. ], batch size: 55, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:23:04,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-09-24 22:23:14,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.12 vs. limit=10.0 2024-09-24 22:23:16,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=601435.3333333334, ans=0.0 2024-09-24 22:23:21,392 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.266e+02 1.387e+02 1.525e+02 2.483e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-24 22:23:43,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=601482.0, ans=0.0 2024-09-24 22:24:00,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=601528.6666666666, ans=0.125 2024-09-24 22:24:20,384 INFO [train.py:1198] (0/4) Epoch 34, batch 350, loss[loss=0.1963, ctc_loss=0.1256, cr_loss=0.3539, over 17158.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1264, cr_loss=0.3442, over 2780170.97 frames. 
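The per-batch records decompose the objective. With cr_loss_scale=0.2 from this run's configuration, every logged total satisfies loss = ctc_loss + 0.2 * cr_loss (e.g. 0.1264 + 0.2 * 0.3442 = 0.1952 in the batch 350 summary above), and tot_loss[...] is the same quantity as a frame-weighted running average. A minimal sketch of that bookkeeping; the accumulator class below is illustrative, not icefall's own tracker:

def combine_losses(ctc_loss, cr_loss, cr_loss_scale=0.2):
    # The decomposition the log lines print: loss = ctc + scale * cr.
    return ctc_loss + cr_loss_scale * cr_loss

class RunningLoss:
    """Frame-weighted running average, like the tot_loss[...] field."""
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss, num_frames):
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def value(self):
        return self.loss_sum / max(self.frames, 1.0)

assert abs(combine_losses(0.1264, 0.3442) - 0.1952) < 5e-4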
], batch size: 45, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:24:27,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=601622.0, ans=0.125 2024-09-24 22:24:44,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=601668.6666666666, ans=0.0 2024-09-24 22:24:46,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=601668.6666666666, ans=0.05 2024-09-24 22:25:17,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=22.5 2024-09-24 22:25:21,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=601762.0, ans=0.0 2024-09-24 22:25:37,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=601808.6666666666, ans=0.2 2024-09-24 22:25:38,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=601855.3333333334, ans=0.125 2024-09-24 22:25:39,905 INFO [train.py:1198] (0/4) Epoch 34, batch 400, loss[loss=0.2139, ctc_loss=0.139, cr_loss=0.3748, over 17216.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1267, cr_loss=0.3444, over 2912864.63 frames. ], batch size: 47, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:25:53,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=601855.3333333334, ans=0.125 2024-09-24 22:26:02,214 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.275e+02 1.410e+02 1.501e+02 2.362e+02, threshold=2.821e+02, percent-clipped=0.0 2024-09-24 22:26:04,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=601902.0, ans=0.09899494936611666 2024-09-24 22:26:17,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2024-09-24 22:26:26,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=12.0 2024-09-24 22:26:37,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601995.3333333334, ans=0.1 2024-09-24 22:26:48,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=602042.0, ans=0.0 2024-09-24 22:27:05,427 INFO [train.py:1198] (0/4) Epoch 34, batch 450, loss[loss=0.1526, ctc_loss=0.09293, cr_loss=0.2983, over 17077.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1273, cr_loss=0.3457, over 3010031.48 frames. 
], batch size: 40, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:27:43,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=602182.0, ans=0.2 2024-09-24 22:28:03,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=602228.6666666666, ans=0.95 2024-09-24 22:28:08,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602228.6666666666, ans=0.1 2024-09-24 22:28:09,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=602228.6666666666, ans=0.0 2024-09-24 22:28:13,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=602275.3333333334, ans=0.2 2024-09-24 22:28:18,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2024-09-24 22:28:22,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602275.3333333334, ans=0.1 2024-09-24 22:28:30,685 INFO [train.py:1198] (0/4) Epoch 34, batch 500, loss[loss=0.1715, ctc_loss=0.1086, cr_loss=0.3144, over 17296.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1264, cr_loss=0.3436, over 3081752.74 frames. ], batch size: 42, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:28:42,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=602322.0, ans=0.125 2024-09-24 22:28:52,904 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.302e+02 1.372e+02 1.474e+02 2.066e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-24 22:29:32,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=602508.6666666666, ans=0.125 2024-09-24 22:29:45,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=602508.6666666666, ans=10.0 2024-09-24 22:29:49,896 INFO [train.py:1198] (0/4) Epoch 34, batch 550, loss[loss=0.2172, ctc_loss=0.1446, cr_loss=0.3632, over 17082.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1265, cr_loss=0.3432, over 3141001.30 frames. ], batch size: 49, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:30:55,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=602742.0, ans=0.0 2024-09-24 22:31:03,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=602742.0, ans=0.125 2024-09-24 22:31:09,784 INFO [train.py:1198] (0/4) Epoch 34, batch 600, loss[loss=0.2167, ctc_loss=0.1402, cr_loss=0.3825, over 17003.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1267, cr_loss=0.3434, over 3189759.95 frames. 
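Each scaling.py:214 line prints a ScheduledFloat: a regularization hyperparameter (a dropout probability, a skip rate, a balancer bound, ...) whose value is a function of the global batch_count rather than a constant, which is why the same name can log different ans= values at different points in training. A minimal sketch of such a schedule as piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are invented for illustration, only the interpolation mechanism is the point:

class ScheduledFloat:
    """Piecewise-linear schedule over the global batch count."""
    def __init__(self, *points):
        self.points = sorted(points)   # (batch_count, value) pairs

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# A skip rate that decays early in training and then stays at zero would
# print ans=0.0 at batch_count ~6e5, as the skip-rate entries above do.
skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(600_000.0))  # -> 0.0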
], batch size: 53, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:31:13,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=602788.6666666666, ans=0.0 2024-09-24 22:31:34,416 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.259e+02 1.339e+02 1.429e+02 3.195e+02, threshold=2.679e+02, percent-clipped=1.0 2024-09-24 22:32:01,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=602928.6666666666, ans=0.125 2024-09-24 22:32:35,539 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 22:32:39,993 INFO [train.py:1198] (0/4) Epoch 34, batch 650, loss[loss=0.1786, ctc_loss=0.1136, cr_loss=0.3252, over 17247.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1268, cr_loss=0.3428, over 3219962.32 frames. ], batch size: 44, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:32:50,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.21 vs. limit=15.0 2024-09-24 22:33:06,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-09-24 22:33:23,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=603115.3333333334, ans=0.0 2024-09-24 22:33:24,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=22.5 2024-09-24 22:33:25,620 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-24 22:33:31,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=603162.0, ans=0.2 2024-09-24 22:34:00,564 INFO [train.py:1198] (0/4) Epoch 34, batch 700, loss[loss=0.1635, ctc_loss=0.1027, cr_loss=0.3038, over 17287.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.126, cr_loss=0.3412, over 3253135.97 frames. ], batch size: 42, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:34:23,177 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.279e+02 1.394e+02 1.540e+02 2.036e+02, threshold=2.788e+02, percent-clipped=0.0 2024-09-24 22:35:07,195 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 22:35:20,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2024-09-24 22:35:21,220 INFO [train.py:1198] (0/4) Epoch 34, batch 750, loss[loss=0.1682, ctc_loss=0.1071, cr_loss=0.3055, over 17006.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.125, cr_loss=0.3393, over 3285065.09 frames. ], batch size: 44, lr: 3.52e-03, grad_scale: 16.0 2024-09-24 22:35:21,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.98 vs. 
limit=22.5 2024-09-24 22:35:32,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=603488.6666666666, ans=0.025 2024-09-24 22:35:47,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=603535.3333333334, ans=0.125 2024-09-24 22:36:05,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=603582.0, ans=0.125 2024-09-24 22:36:24,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=603628.6666666666, ans=0.025 2024-09-24 22:36:34,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603675.3333333334, ans=0.1 2024-09-24 22:36:40,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=603675.3333333334, ans=0.125 2024-09-24 22:36:43,573 INFO [train.py:1198] (0/4) Epoch 34, batch 800, loss[loss=0.1987, ctc_loss=0.1369, cr_loss=0.3088, over 11787.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1253, cr_loss=0.3401, over 3294155.30 frames. ], batch size: 123, lr: 3.52e-03, grad_scale: 32.0 2024-09-24 22:36:49,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=603722.0, ans=0.0 2024-09-24 22:37:08,640 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.269e+02 1.369e+02 1.482e+02 1.915e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-24 22:37:20,933 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.00 vs. limit=10.0 2024-09-24 22:37:26,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=603815.3333333334, ans=0.1 2024-09-24 22:37:30,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=603815.3333333334, ans=0.0 2024-09-24 22:37:54,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603908.6666666666, ans=0.1 2024-09-24 22:38:11,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=603955.3333333334, ans=22.5 2024-09-24 22:38:12,079 INFO [train.py:1198] (0/4) Epoch 34, batch 850, loss[loss=0.1972, ctc_loss=0.1273, cr_loss=0.3491, over 17236.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1261, cr_loss=0.3411, over 3298417.19 frames. ], batch size: 55, lr: 3.52e-03, grad_scale: 32.0 2024-09-24 22:38:29,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=604002.0, ans=0.0 2024-09-24 22:38:44,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=604048.6666666666, ans=0.125 2024-09-24 22:39:00,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604095.3333333334, ans=0.1 2024-09-24 22:39:31,804 INFO [train.py:1198] (0/4) Epoch 34, batch 900, loss[loss=0.2092, ctc_loss=0.135, cr_loss=0.3706, over 17101.00 frames. 
], tot_loss[loss=0.1941, ctc_loss=0.1259, cr_loss=0.3409, over 3309842.22 frames. ], batch size: 49, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:39:36,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-24 22:39:54,076 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.293e+02 1.407e+02 1.509e+02 3.747e+02, threshold=2.814e+02, percent-clipped=1.0 2024-09-24 22:39:56,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=604235.3333333334, ans=10.0 2024-09-24 22:40:12,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=604282.0, ans=0.125 2024-09-24 22:40:28,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=604328.6666666666, ans=0.1 2024-09-24 22:40:32,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=604328.6666666666, ans=0.0 2024-09-24 22:40:43,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=604375.3333333334, ans=0.125 2024-09-24 22:40:52,458 INFO [train.py:1198] (0/4) Epoch 34, batch 950, loss[loss=0.2073, ctc_loss=0.1351, cr_loss=0.3606, over 17362.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1261, cr_loss=0.3407, over 3309679.07 frames. ], batch size: 48, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:40:56,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=604422.0, ans=0.125 2024-09-24 22:41:01,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2024-09-24 22:41:32,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=604515.3333333334, ans=0.125 2024-09-24 22:41:36,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-09-24 22:42:16,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=604608.6666666666, ans=0.125 2024-09-24 22:42:23,023 INFO [train.py:1198] (0/4) Epoch 34, batch 1000, loss[loss=0.2372, ctc_loss=0.1579, cr_loss=0.3965, over 16747.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1271, cr_loss=0.3426, over 3321897.78 frames. 
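The scaling.py:1024 lines are a whitening diagnostic: for each instrumented activation they print a metric that equals 1.0 when the feature covariance is a multiple of the identity and grows as the covariance becomes anisotropic, alongside that module's limit (e.g. metric=4.98 vs. limit=15.0 above); exceeding the limit is what triggers a corrective whitening gradient. The formula below is a paraphrase of that idea for illustration, not the icefall source:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (frames, num_channels). Returns 1.0 iff the within-group
    covariance is proportional to the identity; larger = less white."""
    frames, num_channels = x.shape
    cpg = num_channels // num_groups                          # channels per group
    xg = x.reshape(frames, num_groups, cpg).transpose(0, 1)   # (G, N, C)
    covar = torch.matmul(xg.transpose(1, 2), xg) / frames     # (G, C, C)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    # Sum of squared covariance entries, normalized so identity -> 1.0.
    return ((covar ** 2).sum() / (mean_diag ** 2 * num_groups * cpg)).item()

x = torch.randn(1000, 384)          # nearly white features
print(whitening_metric(x))          # close to 1.0, far under a limit like 15.0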
], batch size: 61, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:42:37,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=604702.0, ans=0.125 2024-09-24 22:42:45,412 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.270e+02 1.349e+02 1.472e+02 1.904e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-24 22:42:52,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=604702.0, ans=0.0 2024-09-24 22:43:06,681 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 22:43:21,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=604795.3333333334, ans=0.125 2024-09-24 22:43:43,610 INFO [train.py:1198] (0/4) Epoch 34, batch 1050, loss[loss=0.2136, ctc_loss=0.1414, cr_loss=0.3608, over 16070.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.127, cr_loss=0.3433, over 3336308.27 frames. ], batch size: 74, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:43:48,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=604888.6666666666, ans=0.125 2024-09-24 22:43:51,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=604888.6666666666, ans=0.125 2024-09-24 22:44:47,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=605075.3333333334, ans=0.125 2024-09-24 22:45:02,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=605122.0, ans=0.0 2024-09-24 22:45:03,652 INFO [train.py:1198] (0/4) Epoch 34, batch 1100, loss[loss=0.2084, ctc_loss=0.1352, cr_loss=0.366, over 17360.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1269, cr_loss=0.3432, over 3347635.06 frames. ], batch size: 48, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:45:10,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=605122.0, ans=0.2 2024-09-24 22:45:26,104 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.266e+02 1.354e+02 1.438e+02 1.919e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-24 22:45:47,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=605215.3333333334, ans=0.07 2024-09-24 22:46:04,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=605262.0, ans=0.125 2024-09-24 22:46:26,898 INFO [train.py:1198] (0/4) Epoch 34, batch 1150, loss[loss=0.2467, ctc_loss=0.1632, cr_loss=0.4178, over 15077.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1269, cr_loss=0.343, over 3347834.60 frames. ], batch size: 89, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:46:46,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.40 vs. 
limit=15.0 2024-09-24 22:46:55,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=605402.0, ans=0.125 2024-09-24 22:46:58,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=605402.0, ans=0.125 2024-09-24 22:47:16,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=605448.6666666666, ans=0.04949747468305833 2024-09-24 22:47:54,035 INFO [train.py:1198] (0/4) Epoch 34, batch 1200, loss[loss=0.2051, ctc_loss=0.133, cr_loss=0.3607, over 17313.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1265, cr_loss=0.3419, over 3354051.49 frames. ], batch size: 49, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:47:56,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5 2024-09-24 22:48:16,700 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.285e+02 1.360e+02 1.428e+02 2.074e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-24 22:48:25,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=605682.0, ans=0.1 2024-09-24 22:48:33,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.58 vs. limit=10.0 2024-09-24 22:48:59,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605775.3333333334, ans=0.1 2024-09-24 22:49:14,482 INFO [train.py:1198] (0/4) Epoch 34, batch 1250, loss[loss=0.2258, ctc_loss=0.1441, cr_loss=0.4085, over 17137.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1264, cr_loss=0.3419, over 3354277.05 frames. ], batch size: 48, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:49:14,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=605822.0, ans=0.125 2024-09-24 22:49:16,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=605822.0, ans=0.2 2024-09-24 22:49:27,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2024-09-24 22:49:37,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=605868.6666666666, ans=10.0 2024-09-24 22:49:53,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=605915.3333333334, ans=0.125 2024-09-24 22:50:02,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=605962.0, ans=0.125 2024-09-24 22:50:05,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=605962.0, ans=0.025 2024-09-24 22:50:32,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2024-09-24 22:50:34,624 INFO [train.py:1198] (0/4) Epoch 34, batch 1300, loss[loss=0.218, ctc_loss=0.1419, cr_loss=0.3807, over 17039.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1269, cr_loss=0.343, over 3354025.61 frames. 
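The grad_scale field in the batch summaries toggles between 32.0 and 16.0 over this stretch, the signature of dynamic loss scaling under fp16 AMP (this run logs Use AMP=True): the scale is halved when a step produces inf/nan gradients and doubled again after a sustained run of clean steps. A minimal sketch using PyTorch's own GradScaler; the model, optimizer, and loss are placeholders:

import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=3.5e-3)
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,     # matches the first grad_scale seen in the log
    backoff_factor=0.5,  # 32.0 -> 16.0 on an overflowing step
    growth_factor=2.0,   # 16.0 -> 32.0 after growth_interval clean steps
)

for _ in range(10):
    x = torch.randn(4, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    print(scaler.get_scale())   # the value the log prints as grad_scale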
], batch size: 52, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:50:52,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=606102.0, ans=0.125 2024-09-24 22:50:57,099 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.261e+02 1.315e+02 1.424e+02 2.011e+02, threshold=2.630e+02, percent-clipped=0.0 2024-09-24 22:51:04,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2024-09-24 22:51:16,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=606148.6666666666, ans=0.0 2024-09-24 22:51:59,709 INFO [train.py:1198] (0/4) Epoch 34, batch 1350, loss[loss=0.1995, ctc_loss=0.1304, cr_loss=0.3458, over 17009.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1276, cr_loss=0.3443, over 3341732.24 frames. ], batch size: 51, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:52:06,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606288.6666666666, ans=0.1 2024-09-24 22:52:21,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=606335.3333333334, ans=0.025 2024-09-24 22:52:41,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=606382.0, ans=0.05 2024-09-24 22:52:48,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=606382.0, ans=0.0 2024-09-24 22:53:24,834 INFO [train.py:1198] (0/4) Epoch 34, batch 1400, loss[loss=0.2197, ctc_loss=0.1408, cr_loss=0.3947, over 17218.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1278, cr_loss=0.3449, over 3347303.25 frames. 
], batch size: 50, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:53:46,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=606568.6666666666, ans=0.2 2024-09-24 22:53:47,247 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.283e+02 1.396e+02 1.529e+02 1.918e+02, threshold=2.793e+02, percent-clipped=0.0 2024-09-24 22:53:49,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=606568.6666666666, ans=0.125 2024-09-24 22:53:49,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=606568.6666666666, ans=0.2 2024-09-24 22:53:55,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=606615.3333333334, ans=0.025 2024-09-24 22:54:11,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=606662.0, ans=0.125 2024-09-24 22:54:19,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606662.0, ans=0.1 2024-09-24 22:54:25,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=606662.0, ans=0.125 2024-09-24 22:54:32,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=606708.6666666666, ans=0.125 2024-09-24 22:54:40,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=606708.6666666666, ans=0.125 2024-09-24 22:54:44,664 INFO [train.py:1198] (0/4) Epoch 34, batch 1450, loss[loss=0.206, ctc_loss=0.1365, cr_loss=0.3475, over 17301.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1278, cr_loss=0.345, over 3357584.29 frames. ], batch size: 51, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:55:10,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=606802.0, ans=0.0 2024-09-24 22:55:53,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=606942.0, ans=0.0 2024-09-24 22:55:53,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=606942.0, ans=0.025 2024-09-24 22:56:03,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=606988.6666666666, ans=0.04949747468305833 2024-09-24 22:56:04,599 INFO [train.py:1198] (0/4) Epoch 34, batch 1500, loss[loss=0.1761, ctc_loss=0.112, cr_loss=0.3203, over 17047.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1263, cr_loss=0.3429, over 3364190.20 frames. 
], batch size: 39, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:56:29,447 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.258e+02 1.350e+02 1.442e+02 1.821e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 22:56:51,752 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 22:56:53,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=607082.0, ans=0.0 2024-09-24 22:57:02,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=607128.6666666666, ans=0.025 2024-09-24 22:57:13,954 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=22.5 2024-09-24 22:57:20,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=607175.3333333334, ans=0.2 2024-09-24 22:57:31,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=607175.3333333334, ans=0.025 2024-09-24 22:57:34,687 INFO [train.py:1198] (0/4) Epoch 34, batch 1550, loss[loss=0.2156, ctc_loss=0.1397, cr_loss=0.3798, over 17092.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1275, cr_loss=0.3456, over 3364960.50 frames. ], batch size: 49, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:57:39,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=607222.0, ans=0.125 2024-09-24 22:57:44,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=607222.0, ans=0.0 2024-09-24 22:57:46,668 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-09-24 22:58:16,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-24 22:58:54,425 INFO [train.py:1198] (0/4) Epoch 34, batch 1600, loss[loss=0.2081, ctc_loss=0.1361, cr_loss=0.3598, over 17088.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1278, cr_loss=0.3453, over 3367877.10 frames. ], batch size: 49, lr: 3.51e-03, grad_scale: 32.0 2024-09-24 22:58:56,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=22.5 2024-09-24 22:58:59,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=607455.3333333334, ans=0.0 2024-09-24 22:59:16,752 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.263e+02 1.329e+02 1.418e+02 2.097e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-24 22:59:41,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=607595.3333333334, ans=0.125 2024-09-24 22:59:44,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=607595.3333333334, ans=0.125 2024-09-24 23:00:14,911 INFO [train.py:1198] (0/4) Epoch 34, batch 1650, loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3402, over 16240.00 frames. 
], tot_loss[loss=0.1959, ctc_loss=0.127, cr_loss=0.3444, over 3372555.29 frames. ], batch size: 36, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:00:15,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=607688.6666666666, ans=0.125 2024-09-24 23:00:24,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=607688.6666666666, ans=0.07 2024-09-24 23:00:40,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=607735.3333333334, ans=0.2 2024-09-24 23:01:04,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=607828.6666666666, ans=0.0 2024-09-24 23:01:19,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=607875.3333333334, ans=0.0 2024-09-24 23:01:19,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=607875.3333333334, ans=0.125 2024-09-24 23:01:24,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=607875.3333333334, ans=0.125 2024-09-24 23:01:35,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=607922.0, ans=0.125 2024-09-24 23:01:36,803 INFO [train.py:1198] (0/4) Epoch 34, batch 1700, loss[loss=0.1934, ctc_loss=0.1272, cr_loss=0.3311, over 17368.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1277, cr_loss=0.3456, over 3360773.64 frames. ], batch size: 48, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:02:05,582 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.278e+02 1.343e+02 1.428e+02 2.095e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-24 23:02:16,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=608015.3333333334, ans=0.0 2024-09-24 23:02:19,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=608015.3333333334, ans=0.125 2024-09-24 23:02:26,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=608015.3333333334, ans=0.125 2024-09-24 23:02:30,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=608062.0, ans=0.125 2024-09-24 23:02:47,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=608108.6666666666, ans=0.0 2024-09-24 23:02:53,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=608108.6666666666, ans=0.2 2024-09-24 23:03:04,328 INFO [train.py:1198] (0/4) Epoch 34, batch 1750, loss[loss=0.1676, ctc_loss=0.1079, cr_loss=0.2988, over 17101.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1269, cr_loss=0.3443, over 3365241.10 frames. ], batch size: 40, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:03:18,448 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. 
limit=15.0 2024-09-24 23:03:46,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=12.0 2024-09-24 23:03:53,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0 2024-09-24 23:04:24,157 INFO [train.py:1198] (0/4) Epoch 34, batch 1800, loss[loss=0.2261, ctc_loss=0.1517, cr_loss=0.3724, over 16531.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1273, cr_loss=0.3451, over 3370980.40 frames. ], batch size: 66, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:04:48,240 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.244e+02 1.332e+02 1.441e+02 2.602e+02, threshold=2.663e+02, percent-clipped=0.0 2024-09-24 23:04:54,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=608482.0, ans=0.0 2024-09-24 23:04:59,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608482.0, ans=0.1 2024-09-24 23:05:12,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608528.6666666666, ans=0.1 2024-09-24 23:05:27,665 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.17 vs. limit=10.0 2024-09-24 23:05:44,351 INFO [train.py:1198] (0/4) Epoch 34, batch 1850, loss[loss=0.1677, ctc_loss=0.1064, cr_loss=0.3063, over 17087.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.127, cr_loss=0.3439, over 3365472.51 frames. ], batch size: 43, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:05:46,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=608622.0, ans=0.125 2024-09-24 23:06:45,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608762.0, ans=0.125 2024-09-24 23:07:13,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=608855.3333333334, ans=0.125 2024-09-24 23:07:15,203 INFO [train.py:1198] (0/4) Epoch 34, batch 1900, loss[loss=0.1813, ctc_loss=0.1149, cr_loss=0.3321, over 17313.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1266, cr_loss=0.343, over 3369818.59 frames. ], batch size: 51, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:07:26,264 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. 
limit=6.0 2024-09-24 23:07:31,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=608902.0, ans=0.5 2024-09-24 23:07:39,345 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.279e+02 1.348e+02 1.451e+02 1.794e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-24 23:08:01,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=608995.3333333334, ans=0.125 2024-09-24 23:08:14,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=608995.3333333334, ans=0.125 2024-09-24 23:08:35,259 INFO [train.py:1198] (0/4) Epoch 34, batch 1950, loss[loss=0.1489, ctc_loss=0.09232, cr_loss=0.2827, over 17124.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1266, cr_loss=0.3431, over 3371785.36 frames. ], batch size: 40, lr: 3.50e-03, grad_scale: 16.0 2024-09-24 23:08:41,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=609088.6666666666, ans=0.05 2024-09-24 23:09:04,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=609135.3333333334, ans=0.125 2024-09-24 23:09:12,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=609182.0, ans=0.125 2024-09-24 23:09:27,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-09-24 23:09:34,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=609228.6666666666, ans=0.0 2024-09-24 23:09:37,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.75 vs. limit=10.0 2024-09-24 23:09:45,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=609275.3333333334, ans=0.125 2024-09-24 23:09:53,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=609275.3333333334, ans=0.2 2024-09-24 23:09:56,093 INFO [train.py:1198] (0/4) Epoch 34, batch 2000, loss[loss=0.1956, ctc_loss=0.1237, cr_loss=0.3593, over 16979.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.127, cr_loss=0.3443, over 3373293.08 frames. ], batch size: 42, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:10:01,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=609322.0, ans=0.125 2024-09-24 23:10:19,956 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.293e+02 1.367e+02 1.523e+02 2.152e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-24 23:10:34,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=609415.3333333334, ans=0.0 2024-09-24 23:11:18,785 INFO [train.py:1198] (0/4) Epoch 34, batch 2050, loss[loss=0.1728, ctc_loss=0.1106, cr_loss=0.3107, over 16967.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1272, cr_loss=0.3452, over 3373912.55 frames. 
], batch size: 42, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:12:38,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=609742.0, ans=0.0 2024-09-24 23:12:45,744 INFO [train.py:1198] (0/4) Epoch 34, batch 2100, loss[loss=0.1761, ctc_loss=0.1111, cr_loss=0.325, over 17311.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1268, cr_loss=0.3446, over 3374450.83 frames. ], batch size: 46, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:12:50,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=609788.6666666666, ans=0.2 2024-09-24 23:13:10,054 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.285e+02 1.366e+02 1.478e+02 2.142e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-24 23:13:34,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=609928.6666666666, ans=0.125 2024-09-24 23:13:36,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=609928.6666666666, ans=0.125 2024-09-24 23:13:41,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=609928.6666666666, ans=0.0 2024-09-24 23:14:00,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=609975.3333333334, ans=0.2 2024-09-24 23:14:03,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=609975.3333333334, ans=0.125 2024-09-24 23:14:06,287 INFO [train.py:1198] (0/4) Epoch 34, batch 2150, loss[loss=0.1668, ctc_loss=0.108, cr_loss=0.2939, over 17260.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1263, cr_loss=0.3433, over 3375748.90 frames. ], batch size: 44, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:15:12,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=610208.6666666666, ans=0.0 2024-09-24 23:15:25,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=610255.3333333334, ans=0.0 2024-09-24 23:15:26,300 INFO [train.py:1198] (0/4) Epoch 34, batch 2200, loss[loss=0.2107, ctc_loss=0.1409, cr_loss=0.3491, over 16981.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1263, cr_loss=0.3424, over 3372285.07 frames. 
], batch size: 53, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:15:28,377 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:15:47,523 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:15:50,348 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.250e+02 1.358e+02 1.486e+02 2.433e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-24 23:16:14,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=610395.3333333334, ans=0.2 2024-09-24 23:16:28,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=610395.3333333334, ans=0.125 2024-09-24 23:16:39,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=610442.0, ans=0.125 2024-09-24 23:16:51,473 INFO [train.py:1198] (0/4) Epoch 34, batch 2250, loss[loss=0.1947, ctc_loss=0.123, cr_loss=0.3584, over 17247.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1262, cr_loss=0.3424, over 3372347.77 frames. ], batch size: 44, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:17:37,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=610582.0, ans=0.2 2024-09-24 23:18:04,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=610675.3333333334, ans=0.125 2024-09-24 23:18:07,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=610675.3333333334, ans=0.07 2024-09-24 23:18:11,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=610675.3333333334, ans=0.125 2024-09-24 23:18:14,098 INFO [train.py:1198] (0/4) Epoch 34, batch 2300, loss[loss=0.1974, ctc_loss=0.1291, cr_loss=0.3414, over 17301.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1272, cr_loss=0.3436, over 3364078.70 frames. 
], batch size: 51, lr: 3.50e-03, grad_scale: 32.0 2024-09-24 23:18:20,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=610722.0, ans=0.0 2024-09-24 23:18:32,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610768.6666666666, ans=0.1 2024-09-24 23:18:38,051 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.297e+02 1.390e+02 1.515e+02 2.091e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-24 23:18:47,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=610815.3333333334, ans=0.5 2024-09-24 23:18:51,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=610815.3333333334, ans=0.125 2024-09-24 23:18:55,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=610815.3333333334, ans=0.0 2024-09-24 23:18:57,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=610815.3333333334, ans=0.125 2024-09-24 23:19:15,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=610862.0, ans=0.025 2024-09-24 23:19:34,056 INFO [train.py:1198] (0/4) Epoch 34, batch 2350, loss[loss=0.194, ctc_loss=0.1257, cr_loss=0.341, over 16698.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1274, cr_loss=0.3444, over 3356523.22 frames. ], batch size: 61, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:19:46,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=610955.3333333334, ans=0.125 2024-09-24 23:20:04,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=611048.6666666666, ans=0.2 2024-09-24 23:20:47,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611142.0, ans=0.1 2024-09-24 23:20:53,279 INFO [train.py:1198] (0/4) Epoch 34, batch 2400, loss[loss=0.1611, ctc_loss=0.1014, cr_loss=0.2987, over 17251.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1273, cr_loss=0.3446, over 3365473.92 frames. ], batch size: 42, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:21:19,778 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.247e+02 1.313e+02 1.426e+02 1.860e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-24 23:21:20,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=611235.3333333334, ans=0.0 2024-09-24 23:21:21,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=611235.3333333334, ans=0.125 2024-09-24 23:21:36,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. 
limit=15.0 2024-09-24 23:21:43,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611282.0, ans=0.1 2024-09-24 23:21:47,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=611328.6666666666, ans=0.1 2024-09-24 23:22:07,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=15.0 2024-09-24 23:22:16,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=611375.3333333334, ans=0.2 2024-09-24 23:22:23,851 INFO [train.py:1198] (0/4) Epoch 34, batch 2450, loss[loss=0.1959, ctc_loss=0.1301, cr_loss=0.3289, over 17357.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1273, cr_loss=0.3445, over 3348708.90 frames. ], batch size: 48, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:22:27,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=611422.0, ans=0.125 2024-09-24 23:22:27,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=611422.0, ans=0.2 2024-09-24 23:22:28,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=611422.0, ans=0.125 2024-09-24 23:22:29,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-24 23:22:32,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.60 vs. limit=15.0 2024-09-24 23:22:59,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=611515.3333333334, ans=0.125 2024-09-24 23:23:18,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=611562.0, ans=0.125 2024-09-24 23:23:28,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=611608.6666666666, ans=0.125 2024-09-24 23:23:31,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=611608.6666666666, ans=0.025 2024-09-24 23:23:43,932 INFO [train.py:1198] (0/4) Epoch 34, batch 2500, loss[loss=0.1648, ctc_loss=0.1067, cr_loss=0.2905, over 17090.00 frames. ], tot_loss[loss=0.1961, ctc_loss=0.1273, cr_loss=0.3439, over 3339232.04 frames. ], batch size: 43, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:23:47,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.74 vs. 
limit=12.0 2024-09-24 23:23:52,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=611655.3333333334, ans=10.0 2024-09-24 23:24:05,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=611702.0, ans=0.2 2024-09-24 23:24:08,084 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.282e+02 1.373e+02 1.484e+02 1.977e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-24 23:24:15,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=22.5 2024-09-24 23:24:19,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=611748.6666666666, ans=0.125 2024-09-24 23:24:22,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=611748.6666666666, ans=0.125 2024-09-24 23:24:27,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=611748.6666666666, ans=0.0 2024-09-24 23:24:53,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.59 vs. limit=15.0 2024-09-24 23:24:54,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=611842.0, ans=0.125 2024-09-24 23:25:03,342 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.32 vs. limit=22.5 2024-09-24 23:25:04,065 INFO [train.py:1198] (0/4) Epoch 34, batch 2550, loss[loss=0.2071, ctc_loss=0.1348, cr_loss=0.3616, over 17128.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1262, cr_loss=0.3419, over 3344792.72 frames. 
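The ScheduledFloat entries that dominate this log (names like conv_skip_rate, dropout_p, balancer*.prob) report hyperparameters that are annealed as a function of the global batch_count; by batch_count near 6.1e5, the skip rates have reached 0.0 and the dropout values sit at their floors. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name and breakpoints below are hypothetical, not the ones used in this run:

```python
import bisect

class ScheduledFloatSketch:
    """A float hyperparameter interpolated piecewise-linearly against a
    global batch counter (an illustrative sketch, not the icefall class)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical schedule: a skip rate that decays from 0.5 to 0.0 over the
# first 20k batches, then stays at 0.0 for the rest of training.
conv_skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.0))
print(conv_skip_rate.value(611748.0))  # -> 0.0, matching 'ans=0.0' above
```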
], batch size: 48, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:25:10,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=611888.6666666666, ans=0.125 2024-09-24 23:25:10,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=611888.6666666666, ans=0.125 2024-09-24 23:25:10,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=611888.6666666666, ans=0.125 2024-09-24 23:25:23,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611935.3333333334, ans=0.1 2024-09-24 23:25:23,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=611935.3333333334, ans=0.125 2024-09-24 23:26:08,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=612075.3333333334, ans=0.0 2024-09-24 23:26:09,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=612075.3333333334, ans=0.0 2024-09-24 23:26:25,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=612122.0, ans=0.125 2024-09-24 23:26:29,171 INFO [train.py:1198] (0/4) Epoch 34, batch 2600, loss[loss=0.206, ctc_loss=0.1354, cr_loss=0.353, over 17314.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1276, cr_loss=0.3439, over 3332315.43 frames. ], batch size: 51, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:26:32,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=612122.0, ans=0.0 2024-09-24 23:26:38,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=612122.0, ans=0.0 2024-09-24 23:26:47,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2024-09-24 23:26:51,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=612168.6666666666, ans=0.125 2024-09-24 23:26:55,524 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.298e+02 1.376e+02 1.484e+02 2.452e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-24 23:27:16,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=612215.3333333334, ans=0.0 2024-09-24 23:27:18,225 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:27:44,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-09-24 23:27:51,789 INFO [train.py:1198] (0/4) Epoch 34, batch 2650, loss[loss=0.1686, ctc_loss=0.1069, cr_loss=0.3084, over 17090.00 frames. ], tot_loss[loss=0.1974, ctc_loss=0.1283, cr_loss=0.3455, over 3331087.85 frames. 
], batch size: 40, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:27:58,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0 2024-09-24 23:28:05,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2024-09-24 23:28:15,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=612402.0, ans=0.125 2024-09-24 23:28:28,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=612448.6666666666, ans=0.125 2024-09-24 23:28:41,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=612495.3333333334, ans=0.2 2024-09-24 23:28:46,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=612495.3333333334, ans=0.0 2024-09-24 23:29:12,127 INFO [train.py:1198] (0/4) Epoch 34, batch 2700, loss[loss=0.2047, ctc_loss=0.1341, cr_loss=0.3532, over 16535.00 frames. ], tot_loss[loss=0.1977, ctc_loss=0.1285, cr_loss=0.3461, over 3335433.65 frames. ], batch size: 66, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:29:14,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=612588.6666666666, ans=0.125 2024-09-24 23:29:22,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=612588.6666666666, ans=0.125 2024-09-24 23:29:36,054 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.298e+02 1.372e+02 1.486e+02 2.496e+02, threshold=2.744e+02, percent-clipped=0.0 2024-09-24 23:30:03,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=612728.6666666666, ans=0.0 2024-09-24 23:30:31,910 INFO [train.py:1198] (0/4) Epoch 34, batch 2750, loss[loss=0.1684, ctc_loss=0.1069, cr_loss=0.3074, over 15938.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.1284, cr_loss=0.346, over 3330683.60 frames. ], batch size: 35, lr: 3.49e-03, grad_scale: 16.0 2024-09-24 23:30:41,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=612822.0, ans=0.125 2024-09-24 23:30:45,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=612822.0, ans=0.125 2024-09-24 23:30:47,477 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=22.5 2024-09-24 23:30:59,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=612868.6666666666, ans=0.125 2024-09-24 23:31:19,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612915.3333333334, ans=0.1 2024-09-24 23:32:02,049 INFO [train.py:1198] (0/4) Epoch 34, batch 2800, loss[loss=0.1718, ctc_loss=0.1076, cr_loss=0.3213, over 17087.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3449, over 3336709.79 frames. 
], batch size: 43, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:32:16,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=613102.0, ans=0.0 2024-09-24 23:32:17,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2024-09-24 23:32:23,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=613102.0, ans=10.0 2024-09-24 23:32:27,770 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.048e+02 1.265e+02 1.343e+02 1.457e+02 2.911e+02, threshold=2.687e+02, percent-clipped=1.0 2024-09-24 23:32:55,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=613195.3333333334, ans=0.0 2024-09-24 23:32:56,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=10.0 2024-09-24 23:32:58,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=613195.3333333334, ans=0.1 2024-09-24 23:33:09,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613242.0, ans=0.1 2024-09-24 23:33:13,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=613242.0, ans=0.125 2024-09-24 23:33:21,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=613288.6666666666, ans=0.125 2024-09-24 23:33:22,281 INFO [train.py:1198] (0/4) Epoch 34, batch 2850, loss[loss=0.1733, ctc_loss=0.1106, cr_loss=0.3133, over 17295.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.3451, over 3338735.36 frames. ], batch size: 46, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:33:22,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=613288.6666666666, ans=0.125 2024-09-24 23:33:29,242 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=15.0 2024-09-24 23:33:37,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=613335.3333333334, ans=0.0 2024-09-24 23:33:51,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=613335.3333333334, ans=0.125 2024-09-24 23:34:23,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=22.5 2024-09-24 23:34:42,221 INFO [train.py:1198] (0/4) Epoch 34, batch 2900, loss[loss=0.2099, ctc_loss=0.1408, cr_loss=0.3456, over 16763.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.345, over 3334218.79 frames. ], batch size: 61, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:34:49,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.92 vs. 
limit=15.0 2024-09-24 23:34:55,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=613522.0, ans=0.2 2024-09-24 23:35:07,635 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.262e+02 1.340e+02 1.438e+02 2.197e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-24 23:35:07,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613568.6666666666, ans=0.1 2024-09-24 23:35:19,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=613615.3333333334, ans=0.2 2024-09-24 23:35:25,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=613615.3333333334, ans=0.125 2024-09-24 23:36:04,803 INFO [train.py:1198] (0/4) Epoch 34, batch 2950, loss[loss=0.1732, ctc_loss=0.1113, cr_loss=0.3098, over 17097.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1275, cr_loss=0.3449, over 3350093.55 frames. ], batch size: 43, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:36:46,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=613848.6666666666, ans=0.125 2024-09-24 23:36:49,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=613848.6666666666, ans=0.125 2024-09-24 23:37:02,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=613895.3333333334, ans=0.0 2024-09-24 23:37:24,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=613942.0, ans=0.1 2024-09-24 23:37:32,110 INFO [train.py:1198] (0/4) Epoch 34, batch 3000, loss[loss=0.2144, ctc_loss=0.1419, cr_loss=0.3621, over 17295.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3445, over 3339424.51 frames. ], batch size: 49, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:37:32,110 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-24 23:37:47,937 INFO [train.py:1230] (0/4) Epoch 34, validation: loss=0.03583, ctc_loss=0.03583, cr_loss=9.471e-15, over 944034.00 frames. 2024-09-24 23:37:47,938 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-24 23:37:51,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=613988.6666666666, ans=0.125 2024-09-24 23:38:12,761 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.290e+02 1.384e+02 1.469e+02 2.229e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-24 23:38:20,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614082.0, ans=0.1 2024-09-24 23:38:53,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=614175.3333333334, ans=0.125 2024-09-24 23:39:05,970 INFO [train.py:1198] (0/4) Epoch 34, batch 3050, loss[loss=0.2184, ctc_loss=0.1411, cr_loss=0.3865, over 16943.00 frames. ], tot_loss[loss=0.1965, ctc_loss=0.1275, cr_loss=0.3449, over 3342251.51 frames. 
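The Whitening lines compare a per-module statistic (metric) against a configured limit (6.0, 15.0, 22.5, etc. above); as long as the metric stays under the limit, no correction is applied, and when it exceeds the limit the module nudges activations back toward whiter, more decorrelated statistics. A sketch of one plausible whitening metric, measuring how far each group's feature covariance is from a multiple of the identity; this is an illustrative formulation, not necessarily the exact computation in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """How far the feature covariance is from 'white' (isotropic).

    x: (..., num_channels). Returns a scalar >= 1.0; it equals 1.0 only
    when every group's covariance is the same multiple of the identity,
    and grows as the channels become more correlated or unbalanced.
    """
    x = x.reshape(-1, x.shape[-1])
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    cpg = num_channels // num_groups  # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)       # center per group
    cov = torch.matmul(x.transpose(1, 2), x)  # (num_groups, cpg, cpg)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (cov ** 2).sum() / (num_groups * cpg)
    # By Cauchy-Schwarz, mean_sq >= mean_diag**2, so the ratio is >= 1.0.
    return mean_sq / (mean_diag ** 2 + 1e-20)

# e.g. metric = whitening_metric(activations, num_groups=1); if it exceeds
# the configured limit, a penalty gradient would be applied.
```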
], batch size: 58, lr: 3.49e-03, grad_scale: 32.0 2024-09-24 23:39:08,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0 2024-09-24 23:39:15,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=614222.0, ans=0.2 2024-09-24 23:39:49,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=614315.3333333334, ans=0.125 2024-09-24 23:40:00,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614362.0, ans=0.1 2024-09-24 23:40:14,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=614408.6666666666, ans=0.0 2024-09-24 23:40:24,000 INFO [train.py:1198] (0/4) Epoch 34, batch 3100, loss[loss=0.17, ctc_loss=0.1096, cr_loss=0.3017, over 16941.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1274, cr_loss=0.3448, over 3346382.49 frames. ], batch size: 42, lr: 3.49e-03, grad_scale: 16.0 2024-09-24 23:40:26,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-24 23:40:31,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=614455.3333333334, ans=0.125 2024-09-24 23:40:38,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=614502.0, ans=0.125 2024-09-24 23:40:44,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=614502.0, ans=0.125 2024-09-24 23:40:50,476 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.258e+02 1.350e+02 1.404e+02 1.862e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-24 23:40:53,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=614548.6666666666, ans=0.0 2024-09-24 23:40:58,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=614548.6666666666, ans=0.0 2024-09-24 23:41:29,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=614642.0, ans=0.125 2024-09-24 23:41:35,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=614642.0, ans=0.2 2024-09-24 23:41:41,890 INFO [train.py:1198] (0/4) Epoch 34, batch 3150, loss[loss=0.1706, ctc_loss=0.1115, cr_loss=0.2957, over 17024.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1271, cr_loss=0.3438, over 3348960.91 frames. ], batch size: 44, lr: 3.48e-03, grad_scale: 16.0 2024-09-24 23:42:01,414 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2024-09-24 23:42:07,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=614735.3333333334, ans=0.125 2024-09-24 23:42:18,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.91 vs. 
limit=15.0 2024-09-24 23:42:37,472 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:42:37,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=614828.6666666666, ans=0.0 2024-09-24 23:42:40,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=614828.6666666666, ans=0.5 2024-09-24 23:42:48,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=614875.3333333334, ans=0.125 2024-09-24 23:43:00,720 INFO [train.py:1198] (0/4) Epoch 34, batch 3200, loss[loss=0.1628, ctc_loss=0.1052, cr_loss=0.2881, over 17268.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.127, cr_loss=0.3437, over 3352341.39 frames. ], batch size: 44, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:43:13,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614922.0, ans=0.0 2024-09-24 23:43:24,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=614968.6666666666, ans=0.125 2024-09-24 23:43:27,300 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.284e+02 1.377e+02 1.475e+02 3.177e+02, threshold=2.753e+02, percent-clipped=2.0 2024-09-24 23:43:52,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.51 vs. limit=15.0 2024-09-24 23:43:57,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=615062.0, ans=0.125 2024-09-24 23:44:13,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615108.6666666666, ans=0.1 2024-09-24 23:44:19,050 INFO [train.py:1198] (0/4) Epoch 34, batch 3250, loss[loss=0.2175, ctc_loss=0.1445, cr_loss=0.3653, over 17290.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.127, cr_loss=0.3441, over 3356533.59 frames. ], batch size: 49, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:44:27,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=615155.3333333334, ans=0.09899494936611666 2024-09-24 23:44:28,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=615155.3333333334, ans=0.125 2024-09-24 23:44:39,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=615202.0, ans=0.2 2024-09-24 23:45:23,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2024-09-24 23:45:39,576 INFO [train.py:1198] (0/4) Epoch 34, batch 3300, loss[loss=0.2248, ctc_loss=0.1509, cr_loss=0.3698, over 14782.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.3428, over 3355256.21 frames. 
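The recurring optim.py warnings report five grad-norm statistics (min, 25th percentile, median, 75th percentile, max over a recent window), a clipping threshold, and the fraction of batches whose gradients were clipped. In each report the threshold is 2.0 times the median (e.g. 2.753e+02 is 2.0 x 1.377e+02 just above), matching Clipping_scale=2.0. A minimal sketch of median-based adaptive clipping under those assumptions; the window size is hypothetical and this is not the exact icefall optimizer logic:

```python
from collections import deque
import torch

class AdaptiveGradClipper:
    """Clip gradients by a threshold tied to the recent median grad norm."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)  # sliding history of grad norms
        self.num_seen = 0
        self.num_clipped = 0

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # global grad norm = sqrt of the sum of squared per-tensor norms
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        self.num_seen += 1
        threshold = self.scale * torch.tensor(list(self.norms)).median().item()
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm

    def report(self) -> str:
        hist = torch.tensor(list(self.norms))
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        pct = 100.0 * self.num_clipped / max(self.num_seen, 1)
        return ("grad-norm quartiles "
                + " ".join(f"{v:.3e}" for v in q.tolist())
                + f", threshold={self.scale * q[2].item():.3e}"
                + f", percent-clipped={pct:.1f}")
```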
], batch size: 89, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:45:55,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=615435.3333333334, ans=10.0 2024-09-24 23:46:10,485 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.276e+02 1.373e+02 1.543e+02 3.468e+02, threshold=2.745e+02, percent-clipped=1.0 2024-09-24 23:46:40,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=615528.6666666666, ans=0.0 2024-09-24 23:46:56,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615575.3333333334, ans=0.125 2024-09-24 23:46:58,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=615575.3333333334, ans=0.0 2024-09-24 23:47:04,476 INFO [train.py:1198] (0/4) Epoch 34, batch 3350, loss[loss=0.2005, ctc_loss=0.1295, cr_loss=0.3552, over 17032.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3424, over 3356959.39 frames. ], batch size: 52, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:47:17,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=22.5 2024-09-24 23:47:18,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=615668.6666666666, ans=0.0 2024-09-24 23:47:44,180 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=22.5 2024-09-24 23:47:54,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=615762.0, ans=0.2 2024-09-24 23:48:08,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=615808.6666666666, ans=0.0 2024-09-24 23:48:16,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=615808.6666666666, ans=0.0 2024-09-24 23:48:22,376 INFO [train.py:1198] (0/4) Epoch 34, batch 3400, loss[loss=0.224, ctc_loss=0.1462, cr_loss=0.3889, over 16847.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1267, cr_loss=0.3431, over 3338465.24 frames. ], batch size: 58, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:48:24,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=615855.3333333334, ans=0.05 2024-09-24 23:48:28,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=615855.3333333334, ans=0.09899494936611666 2024-09-24 23:48:48,723 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.295e+02 1.404e+02 1.516e+02 2.292e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-24 23:49:09,261 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-132000.pt 2024-09-24 23:49:42,039 INFO [train.py:1198] (0/4) Epoch 34, batch 3450, loss[loss=0.2454, ctc_loss=0.1655, cr_loss=0.3999, over 11701.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1264, cr_loss=0.3427, over 3344718.07 frames. 
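The checkpoint.py line above writes a batch-indexed snapshot (checkpoint-132000.pt) into the experiment directory; a per-epoch snapshot (epoch-34.pt) follows at the end of the epoch further below. A sketch of the batch-indexed half of such a two-level scheme, assuming checkpoints are written every fixed number of batches and old ones pruned; the interval and retention count here are illustrative defaults, not derived from this run:

```python
from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, exp_dir: Path, batch_idx: int,
                          save_every_n: int = 4000, keep_last_k: int = 30):
    """Write checkpoint-<batch>.pt every `save_every_n` batches and prune
    older batch checkpoints beyond `keep_last_k` (illustrative values)."""
    if batch_idx == 0 or batch_idx % save_every_n != 0:
        return
    exp_dir.mkdir(parents=True, exist_ok=True)
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "batch_idx": batch_idx},
        exp_dir / f"checkpoint-{batch_idx}.pt",
    )
    # keep only the newest `keep_last_k` batch checkpoints
    ckpts = sorted(exp_dir.glob("checkpoint-*.pt"),
                   key=lambda p: int(p.stem.split("-")[1]))
    for old in ckpts[:-keep_last_k]:
        old.unlink()
```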
], batch size: 123, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:50:07,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=616135.3333333334, ans=0.025 2024-09-24 23:50:34,287 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2024-09-24 23:50:43,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=616275.3333333334, ans=0.025 2024-09-24 23:50:43,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=616275.3333333334, ans=0.125 2024-09-24 23:50:52,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=616275.3333333334, ans=0.125 2024-09-24 23:51:00,312 INFO [train.py:1198] (0/4) Epoch 34, batch 3500, loss[loss=0.2549, ctc_loss=0.1691, cr_loss=0.4293, over 14993.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1278, cr_loss=0.3451, over 3335175.50 frames. ], batch size: 89, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:51:08,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=22.5 2024-09-24 23:51:25,589 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:51:26,844 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.270e+02 1.396e+02 1.524e+02 2.184e+02, threshold=2.793e+02, percent-clipped=0.0 2024-09-24 23:51:46,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=616462.0, ans=0.2 2024-09-24 23:51:50,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=616462.0, ans=0.2 2024-09-24 23:51:57,410 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=22.5 2024-09-24 23:51:58,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=616462.0, ans=0.0 2024-09-24 23:52:04,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=616508.6666666666, ans=0.0 2024-09-24 23:52:18,497 INFO [train.py:1198] (0/4) Epoch 34, batch 3550, loss[loss=0.177, ctc_loss=0.1127, cr_loss=0.3216, over 17096.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1277, cr_loss=0.3453, over 3343462.26 frames. 
], batch size: 43, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:52:18,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=616555.3333333334, ans=0.0 2024-09-24 23:52:20,411 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:52:34,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=616602.0, ans=0.125 2024-09-24 23:52:40,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=616602.0, ans=0.0 2024-09-24 23:52:45,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=616602.0, ans=0.0 2024-09-24 23:52:47,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.63 vs. limit=6.0 2024-09-24 23:53:36,434 INFO [train.py:1198] (0/4) Epoch 34, batch 3600, loss[loss=0.205, ctc_loss=0.1349, cr_loss=0.3505, over 17208.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.1278, cr_loss=0.3457, over 3356730.84 frames. ], batch size: 47, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:53:37,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=616788.6666666666, ans=15.0 2024-09-24 23:54:03,052 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.287e+02 1.340e+02 1.449e+02 1.947e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-24 23:54:09,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=616882.0, ans=0.1 2024-09-24 23:54:34,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=616928.6666666666, ans=0.125 2024-09-24 23:54:41,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=616975.3333333334, ans=0.0 2024-09-24 23:54:57,438 INFO [train.py:1198] (0/4) Epoch 34, batch 3650, loss[loss=0.21, ctc_loss=0.1383, cr_loss=0.3583, over 17007.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1274, cr_loss=0.3442, over 3364052.63 frames. ], batch size: 56, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:54:57,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=617022.0, ans=0.025 2024-09-24 23:55:02,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=617022.0, ans=0.125 2024-09-24 23:55:11,719 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:55:35,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=617115.3333333334, ans=0.2 2024-09-24 23:55:55,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=617162.0, ans=0.125 2024-09-24 23:56:21,545 INFO [train.py:1198] (0/4) Epoch 34, batch 3700, loss[loss=0.2244, ctc_loss=0.1486, cr_loss=0.3791, over 14989.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1277, cr_loss=0.3448, over 3352697.83 frames. 
], batch size: 89, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:56:22,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-09-24 23:56:32,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=617255.3333333334, ans=0.0 2024-09-24 23:56:48,014 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.259e+02 1.354e+02 1.435e+02 3.016e+02, threshold=2.708e+02, percent-clipped=2.0 2024-09-24 23:56:55,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=617348.6666666666, ans=0.125 2024-09-24 23:57:02,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=617348.6666666666, ans=0.025 2024-09-24 23:57:20,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-09-24 23:57:20,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=617395.3333333334, ans=0.125 2024-09-24 23:57:39,596 INFO [train.py:1198] (0/4) Epoch 34, batch 3750, loss[loss=0.2367, ctc_loss=0.1552, cr_loss=0.4075, over 17042.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1274, cr_loss=0.3446, over 3357493.27 frames. ], batch size: 52, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:57:39,816 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-24 23:57:49,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=617488.6666666666, ans=0.125 2024-09-24 23:58:04,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=617535.3333333334, ans=0.125 2024-09-24 23:58:29,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=617628.6666666666, ans=0.0 2024-09-24 23:58:56,795 INFO [train.py:1198] (0/4) Epoch 34, batch 3800, loss[loss=0.2123, ctc_loss=0.1401, cr_loss=0.3611, over 15083.00 frames. ], tot_loss[loss=0.1969, ctc_loss=0.1279, cr_loss=0.3449, over 3345145.26 frames. ], batch size: 89, lr: 3.48e-03, grad_scale: 32.0 2024-09-24 23:58:57,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=617722.0, ans=0.2 2024-09-24 23:58:57,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=617722.0, ans=0.0 2024-09-24 23:59:23,295 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.281e+02 1.379e+02 1.537e+02 2.661e+02, threshold=2.757e+02, percent-clipped=0.0 2024-09-24 23:59:33,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=617815.3333333334, ans=0.0 2024-09-24 23:59:33,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. 
limit=15.0 2024-09-24 23:59:58,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=617908.6666666666, ans=0.1 2024-09-25 00:00:06,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=617908.6666666666, ans=0.0 2024-09-25 00:00:15,669 INFO [train.py:1198] (0/4) Epoch 34, batch 3850, loss[loss=0.2273, ctc_loss=0.1485, cr_loss=0.3937, over 15199.00 frames. ], tot_loss[loss=0.1985, ctc_loss=0.1291, cr_loss=0.3466, over 3300718.10 frames. ], batch size: 89, lr: 3.48e-03, grad_scale: 32.0 2024-09-25 00:00:17,864 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=22.5 2024-09-25 00:00:31,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618002.0, ans=0.1 2024-09-25 00:00:45,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=618048.6666666666, ans=0.125 2024-09-25 00:00:46,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=618048.6666666666, ans=0.0 2024-09-25 00:01:00,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=618095.3333333334, ans=0.0 2024-09-25 00:01:01,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=618095.3333333334, ans=0.125 2024-09-25 00:01:20,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=618142.0, ans=0.125 2024-09-25 00:01:25,547 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-34.pt 2024-09-25 00:02:16,956 INFO [train.py:1198] (0/4) Epoch 35, batch 0, loss[loss=0.2061, ctc_loss=0.1292, cr_loss=0.3847, over 17354.00 frames. ], tot_loss[loss=0.2061, ctc_loss=0.1292, cr_loss=0.3847, over 17354.00 frames. ], batch size: 48, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:02:16,957 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 00:02:32,197 INFO [train.py:1230] (0/4) Epoch 35, validation: loss=0.03449, ctc_loss=0.03449, cr_loss=9.757e-15, over 944034.00 frames. 2024-09-25 00:02:32,198 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 00:02:48,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=618216.6666666666, ans=0.0 2024-09-25 00:03:10,088 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.401e+02 1.522e+02 1.667e+02 2.435e+02, threshold=3.044e+02, percent-clipped=0.0 2024-09-25 00:03:16,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=618263.3333333334, ans=0.2 2024-09-25 00:03:26,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2024-09-25 00:03:34,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.25 vs. 
limit=15.0 2024-09-25 00:03:56,383 INFO [train.py:1198] (0/4) Epoch 35, batch 50, loss[loss=0.2024, ctc_loss=0.1285, cr_loss=0.3693, over 17262.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.126, cr_loss=0.3457, over 762051.68 frames. ], batch size: 44, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:04:01,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=618403.3333333334, ans=0.125 2024-09-25 00:05:05,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=618590.0, ans=0.125 2024-09-25 00:05:07,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=618590.0, ans=0.0 2024-09-25 00:05:16,504 INFO [train.py:1198] (0/4) Epoch 35, batch 100, loss[loss=0.1768, ctc_loss=0.1127, cr_loss=0.3206, over 17224.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1267, cr_loss=0.3476, over 1342170.09 frames. ], batch size: 47, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:05:29,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=618636.6666666666, ans=0.125 2024-09-25 00:05:49,885 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.242e+02 1.304e+02 1.413e+02 1.730e+02, threshold=2.607e+02, percent-clipped=0.0 2024-09-25 00:06:07,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618776.6666666666, ans=0.1 2024-09-25 00:06:18,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=618776.6666666666, ans=15.0 2024-09-25 00:06:34,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=618823.3333333334, ans=0.125 2024-09-25 00:06:34,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=618823.3333333334, ans=0.125 2024-09-25 00:06:34,877 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2024-09-25 00:06:37,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=618870.0, ans=0.125 2024-09-25 00:06:38,903 INFO [train.py:1198] (0/4) Epoch 35, batch 150, loss[loss=0.1786, ctc_loss=0.1149, cr_loss=0.3188, over 17015.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.126, cr_loss=0.3459, over 1793435.81 frames. ], batch size: 51, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:06:47,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618870.0, ans=0.1 2024-09-25 00:06:47,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=618870.0, ans=0.125 2024-09-25 00:07:17,910 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:07:35,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. 
limit=6.0 2024-09-25 00:08:05,714 INFO [train.py:1198] (0/4) Epoch 35, batch 200, loss[loss=0.2093, ctc_loss=0.1357, cr_loss=0.3676, over 17210.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1256, cr_loss=0.3435, over 2140594.69 frames. ], batch size: 50, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:08:15,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=619103.3333333334, ans=0.125 2024-09-25 00:08:23,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=619150.0, ans=0.125 2024-09-25 00:08:38,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=619196.6666666666, ans=0.125 2024-09-25 00:08:41,112 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.252e+02 1.342e+02 1.492e+02 1.753e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-25 00:09:00,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=619243.3333333334, ans=0.125 2024-09-25 00:09:02,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=619243.3333333334, ans=0.125 2024-09-25 00:09:06,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=619243.3333333334, ans=0.125 2024-09-25 00:09:27,867 INFO [train.py:1198] (0/4) Epoch 35, batch 250, loss[loss=0.2082, ctc_loss=0.1362, cr_loss=0.3596, over 15943.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1261, cr_loss=0.3436, over 2401034.06 frames. ], batch size: 74, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:09:38,066 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=22.5 2024-09-25 00:10:07,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=619430.0, ans=0.0 2024-09-25 00:10:23,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=619476.6666666666, ans=0.05 2024-09-25 00:10:42,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=619523.3333333334, ans=0.0 2024-09-25 00:10:43,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.35 vs. limit=10.0 2024-09-25 00:10:45,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=619570.0, ans=0.125 2024-09-25 00:10:47,178 INFO [train.py:1198] (0/4) Epoch 35, batch 300, loss[loss=0.1882, ctc_loss=0.1201, cr_loss=0.3404, over 16951.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1255, cr_loss=0.3427, over 2619341.31 frames. ], batch size: 42, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:10:49,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=22.5 2024-09-25 00:11:16,983 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.42 vs. 
limit=6.0 2024-09-25 00:11:20,985 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.276e+02 1.333e+02 1.416e+02 3.334e+02, threshold=2.666e+02, percent-clipped=1.0 2024-09-25 00:11:21,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619663.3333333334, ans=0.1 2024-09-25 00:11:54,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=619756.6666666666, ans=0.0 2024-09-25 00:12:07,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=619756.6666666666, ans=0.0 2024-09-25 00:12:10,251 INFO [train.py:1198] (0/4) Epoch 35, batch 350, loss[loss=0.1877, ctc_loss=0.1208, cr_loss=0.3343, over 17213.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1261, cr_loss=0.3442, over 2786786.25 frames. ], batch size: 47, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:12:58,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0 2024-09-25 00:12:59,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=619896.6666666666, ans=10.0 2024-09-25 00:13:01,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=619896.6666666666, ans=0.125 2024-09-25 00:13:39,194 INFO [train.py:1198] (0/4) Epoch 35, batch 400, loss[loss=0.2141, ctc_loss=0.1412, cr_loss=0.3646, over 16431.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3437, over 2916300.50 frames. ], batch size: 66, lr: 3.42e-03, grad_scale: 32.0 2024-09-25 00:13:39,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=620036.6666666666, ans=0.0 2024-09-25 00:14:04,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=620083.3333333334, ans=0.025 2024-09-25 00:14:06,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=620083.3333333334, ans=0.0 2024-09-25 00:14:12,441 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.297e+02 1.357e+02 1.460e+02 2.001e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 00:14:16,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=620130.0, ans=0.0 2024-09-25 00:14:21,538 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=22.5 2024-09-25 00:14:33,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=620176.6666666666, ans=0.125 2024-09-25 00:14:59,149 INFO [train.py:1198] (0/4) Epoch 35, batch 450, loss[loss=0.1717, ctc_loss=0.1098, cr_loss=0.3095, over 16871.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1264, cr_loss=0.3444, over 3013681.81 frames. 
], batch size: 58, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:15:21,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=620316.6666666666, ans=0.2 2024-09-25 00:15:25,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620316.6666666666, ans=0.1 2024-09-25 00:15:39,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=620363.3333333334, ans=0.125 2024-09-25 00:15:47,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0 2024-09-25 00:15:54,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.31 vs. limit=15.0 2024-09-25 00:16:07,632 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-25 00:16:19,361 INFO [train.py:1198] (0/4) Epoch 35, batch 500, loss[loss=0.2466, ctc_loss=0.1635, cr_loss=0.4156, over 15237.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1264, cr_loss=0.3447, over 3082732.48 frames. ], batch size: 89, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:16:19,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=620503.3333333334, ans=0.2 2024-09-25 00:16:21,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=620503.3333333334, ans=0.125 2024-09-25 00:16:54,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=620596.6666666666, ans=0.0 2024-09-25 00:16:57,098 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.242e+02 1.333e+02 1.438e+02 2.516e+02, threshold=2.666e+02, percent-clipped=0.0 2024-09-25 00:17:18,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=620643.3333333334, ans=0.125 2024-09-25 00:17:21,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=620643.3333333334, ans=0.1 2024-09-25 00:17:26,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=620690.0, ans=0.125 2024-09-25 00:17:31,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2024-09-25 00:17:44,911 INFO [train.py:1198] (0/4) Epoch 35, batch 550, loss[loss=0.1889, ctc_loss=0.1225, cr_loss=0.3317, over 17065.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1267, cr_loss=0.3453, over 3146796.33 frames. 
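Across the training summaries in this log, the reported loss is consistent with ctc_loss plus one fifth of cr_loss: for instance 0.1274 + 0.2 x 0.3444 = 0.1963 (Epoch 34, batch 2350) and 0.1267 + 0.2 x 0.3453 = 0.1958 (the Epoch 35, batch 550 summary just above), so the consistency-regularization term appears to enter the total with weight 0.2. The validation summaries, by contrast, show cr_loss on the order of 1e-14: the CR term compares two differently augmented views of each utterance, so it vanishes when no augmentation is applied. A quick check of the inferred relation against logged values:

```python
# (ctc_loss, cr_loss, reported_total) triples copied from this log
entries = [
    (0.1274, 0.3444, 0.1963),  # Epoch 34, batch 2350
    (0.1273, 0.3446, 0.1962),  # Epoch 34, batch 2400
    (0.1267, 0.3453, 0.1958),  # Epoch 35, batch 550
]
for ctc, cr, total in entries:
    assert abs(ctc + 0.2 * cr - total) < 5e-4, (ctc, cr, total)
print("loss == ctc_loss + 0.2 * cr_loss (to logged precision)")
```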
], batch size: 46, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:17:45,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=620736.6666666666, ans=0.05 2024-09-25 00:17:54,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620736.6666666666, ans=0.1 2024-09-25 00:18:08,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=620783.3333333334, ans=0.0 2024-09-25 00:18:13,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-25 00:18:18,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=620783.3333333334, ans=0.0 2024-09-25 00:18:32,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=620830.0, ans=0.2 2024-09-25 00:18:49,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=620876.6666666666, ans=0.125 2024-09-25 00:18:53,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=22.5 2024-09-25 00:19:10,362 INFO [train.py:1198] (0/4) Epoch 35, batch 600, loss[loss=0.1661, ctc_loss=0.1052, cr_loss=0.3046, over 17031.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1266, cr_loss=0.3448, over 3193641.60 frames. ], batch size: 39, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:19:15,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2024-09-25 00:19:38,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=621016.6666666666, ans=22.5 2024-09-25 00:19:41,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2024-09-25 00:19:45,442 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.279e+02 1.403e+02 1.511e+02 1.952e+02, threshold=2.806e+02, percent-clipped=0.0 2024-09-25 00:20:30,172 INFO [train.py:1198] (0/4) Epoch 35, batch 650, loss[loss=0.1932, ctc_loss=0.1215, cr_loss=0.3585, over 17005.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1259, cr_loss=0.3433, over 3231458.25 frames. ], batch size: 52, lr: 3.42e-03, grad_scale: 16.0 2024-09-25 00:20:38,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=621203.3333333334, ans=0.95 2024-09-25 00:20:40,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=621203.3333333334, ans=0.125 2024-09-25 00:20:58,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. 
limit=15.0 2024-09-25 00:21:10,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=621296.6666666666, ans=0.125 2024-09-25 00:21:20,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=621343.3333333334, ans=0.04949747468305833 2024-09-25 00:21:45,981 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0 2024-09-25 00:21:53,029 INFO [train.py:1198] (0/4) Epoch 35, batch 700, loss[loss=0.1893, ctc_loss=0.1208, cr_loss=0.3426, over 17200.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1257, cr_loss=0.3427, over 3266063.82 frames. ], batch size: 47, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:22:21,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.96 vs. limit=15.0 2024-09-25 00:22:30,889 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.276e+02 1.349e+02 1.443e+02 1.700e+02, threshold=2.699e+02, percent-clipped=0.0 2024-09-25 00:22:44,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=621576.6666666666, ans=0.125 2024-09-25 00:23:12,763 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-25 00:23:21,253 INFO [train.py:1198] (0/4) Epoch 35, batch 750, loss[loss=0.2027, ctc_loss=0.1327, cr_loss=0.3502, over 16987.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3442, over 3285963.52 frames. ], batch size: 53, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:23:22,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.83 vs. limit=6.0 2024-09-25 00:23:26,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=621670.0, ans=0.0 2024-09-25 00:23:29,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621670.0, ans=0.1 2024-09-25 00:23:34,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=621670.0, ans=0.125 2024-09-25 00:23:43,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=621716.6666666666, ans=0.0 2024-09-25 00:23:46,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=12.0 2024-09-25 00:23:48,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=621716.6666666666, ans=0.0 2024-09-25 00:24:11,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=621810.0, ans=0.04949747468305833 2024-09-25 00:24:28,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2024-09-25 00:24:41,708 INFO [train.py:1198] (0/4) Epoch 35, batch 800, loss[loss=0.1983, ctc_loss=0.1278, cr_loss=0.3526, over 17039.00 frames. 
], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3428, over 3310235.36 frames. ], batch size: 52, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:24:51,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=621903.3333333334, ans=0.125 2024-09-25 00:25:16,940 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.280e+02 1.366e+02 1.481e+02 2.443e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-25 00:25:36,547 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:25:42,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=622043.3333333334, ans=0.125 2024-09-25 00:25:46,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=622090.0, ans=0.0 2024-09-25 00:25:46,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=622090.0, ans=0.1 2024-09-25 00:26:01,919 INFO [train.py:1198] (0/4) Epoch 35, batch 850, loss[loss=0.1617, ctc_loss=0.1021, cr_loss=0.2982, over 16949.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3424, over 3323579.82 frames. ], batch size: 42, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:26:43,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=622230.0, ans=0.125 2024-09-25 00:26:46,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=622230.0, ans=0.125 2024-09-25 00:26:56,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=622276.6666666666, ans=0.125 2024-09-25 00:27:26,640 INFO [train.py:1198] (0/4) Epoch 35, batch 900, loss[loss=0.2111, ctc_loss=0.1405, cr_loss=0.353, over 17028.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.126, cr_loss=0.3423, over 3338455.99 frames. ], batch size: 52, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:27:36,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=622370.0, ans=0.125 2024-09-25 00:28:08,746 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.305e+02 1.377e+02 1.455e+02 1.762e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 00:28:28,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=622510.0, ans=0.0 2024-09-25 00:28:41,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622556.6666666666, ans=0.125 2024-09-25 00:28:52,275 INFO [train.py:1198] (0/4) Epoch 35, batch 950, loss[loss=0.2448, ctc_loss=0.1646, cr_loss=0.4006, over 11473.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1265, cr_loss=0.3432, over 3334404.48 frames. 
], batch size: 124, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:28:59,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=622603.3333333334, ans=0.125 2024-09-25 00:29:08,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=622650.0, ans=0.125 2024-09-25 00:29:24,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=622696.6666666666, ans=0.125 2024-09-25 00:29:34,422 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:29:34,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=622696.6666666666, ans=0.0 2024-09-25 00:29:45,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=622743.3333333334, ans=0.125 2024-09-25 00:29:48,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622743.3333333334, ans=0.1 2024-09-25 00:29:53,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622743.3333333334, ans=0.1 2024-09-25 00:30:06,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=622790.0, ans=0.125 2024-09-25 00:30:12,544 INFO [train.py:1198] (0/4) Epoch 35, batch 1000, loss[loss=0.2068, ctc_loss=0.1353, cr_loss=0.3575, over 17358.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1274, cr_loss=0.3446, over 3326175.73 frames. ], batch size: 48, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:30:14,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=622836.6666666666, ans=0.125 2024-09-25 00:30:41,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622883.3333333334, ans=0.125 2024-09-25 00:30:47,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=622930.0, ans=0.125 2024-09-25 00:30:49,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.266e+02 1.340e+02 1.444e+02 2.744e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-25 00:30:49,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2024-09-25 00:30:52,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2024-09-25 00:30:58,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=622976.6666666666, ans=0.2 2024-09-25 00:31:21,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=623023.3333333334, ans=0.125 2024-09-25 00:31:35,002 INFO [train.py:1198] (0/4) Epoch 35, batch 1050, loss[loss=0.1987, ctc_loss=0.1273, cr_loss=0.3571, over 17077.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1268, cr_loss=0.3439, over 3340449.36 frames. 
], batch size: 46, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:32:03,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=623116.6666666666, ans=0.125 2024-09-25 00:32:35,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=623210.0, ans=0.125 2024-09-25 00:33:02,769 INFO [train.py:1198] (0/4) Epoch 35, batch 1100, loss[loss=0.2003, ctc_loss=0.1308, cr_loss=0.3476, over 17006.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1259, cr_loss=0.3429, over 3359187.65 frames. ], batch size: 52, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:33:06,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=623303.3333333334, ans=0.125 2024-09-25 00:33:39,212 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.286e+02 1.347e+02 1.466e+02 2.468e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-25 00:33:41,264 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:33:43,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2024-09-25 00:33:54,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=623443.3333333334, ans=0.125 2024-09-25 00:34:00,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=623443.3333333334, ans=0.125 2024-09-25 00:34:12,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=623490.0, ans=0.035 2024-09-25 00:34:22,934 INFO [train.py:1198] (0/4) Epoch 35, batch 1150, loss[loss=0.1854, ctc_loss=0.1205, cr_loss=0.3242, over 17151.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1256, cr_loss=0.3421, over 3361097.16 frames. ], batch size: 48, lr: 3.41e-03, grad_scale: 16.0 2024-09-25 00:34:47,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=623583.3333333334, ans=0.0 2024-09-25 00:34:53,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=623630.0, ans=0.0 2024-09-25 00:35:00,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=15.0 2024-09-25 00:35:02,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=623630.0, ans=0.125 2024-09-25 00:35:10,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=623676.6666666666, ans=0.0 2024-09-25 00:35:10,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623676.6666666666, ans=0.1 2024-09-25 00:35:42,223 INFO [train.py:1198] (0/4) Epoch 35, batch 1200, loss[loss=0.177, ctc_loss=0.1114, cr_loss=0.328, over 17145.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1252, cr_loss=0.3415, over 3363419.61 frames. 
], batch size: 45, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:36:00,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=623816.6666666666, ans=0.125 2024-09-25 00:36:15,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2024-09-25 00:36:18,955 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.278e+02 1.375e+02 1.481e+02 3.565e+02, threshold=2.751e+02, percent-clipped=1.0 2024-09-25 00:36:25,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=623863.3333333334, ans=0.05 2024-09-25 00:36:45,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=623910.0, ans=0.05 2024-09-25 00:36:50,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=623956.6666666666, ans=0.125 2024-09-25 00:37:02,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=623956.6666666666, ans=15.0 2024-09-25 00:37:04,862 INFO [train.py:1198] (0/4) Epoch 35, batch 1250, loss[loss=0.2281, ctc_loss=0.1505, cr_loss=0.3882, over 17235.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1252, cr_loss=0.3411, over 3363539.42 frames. ], batch size: 50, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:37:23,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=624050.0, ans=0.125 2024-09-25 00:37:34,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=624050.0, ans=0.0 2024-09-25 00:37:53,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=624096.6666666666, ans=0.125 2024-09-25 00:38:05,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=624143.3333333334, ans=0.0 2024-09-25 00:38:13,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=624143.3333333334, ans=0.125 2024-09-25 00:38:14,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=624190.0, ans=0.125 2024-09-25 00:38:16,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=624190.0, ans=0.125 2024-09-25 00:38:23,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=624190.0, ans=0.0 2024-09-25 00:38:29,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=624190.0, ans=0.09899494936611666 2024-09-25 00:38:32,165 INFO [train.py:1198] (0/4) Epoch 35, batch 1300, loss[loss=0.1752, ctc_loss=0.1115, cr_loss=0.3185, over 16737.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1255, cr_loss=0.3418, over 3362123.75 frames. 
], batch size: 37, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:38:43,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=624236.6666666666, ans=0.125 2024-09-25 00:38:45,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=624236.6666666666, ans=0.125 2024-09-25 00:38:47,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=15.0 2024-09-25 00:39:01,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=624283.3333333334, ans=0.125 2024-09-25 00:39:02,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=624330.0, ans=0.125 2024-09-25 00:39:08,851 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.267e+02 1.378e+02 1.452e+02 1.774e+02, threshold=2.755e+02, percent-clipped=0.0 2024-09-25 00:39:12,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=624330.0, ans=0.125 2024-09-25 00:39:17,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=624330.0, ans=0.125 2024-09-25 00:39:28,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=624376.6666666666, ans=0.125 2024-09-25 00:39:33,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=624376.6666666666, ans=0.2 2024-09-25 00:39:52,158 INFO [train.py:1198] (0/4) Epoch 35, batch 1350, loss[loss=0.184, ctc_loss=0.1201, cr_loss=0.3194, over 17224.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1258, cr_loss=0.3423, over 3358395.19 frames. ], batch size: 47, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:40:27,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=624563.3333333334, ans=0.125 2024-09-25 00:40:37,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.86 vs. limit=10.0 2024-09-25 00:40:48,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=624610.0, ans=0.125 2024-09-25 00:41:01,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=624656.6666666666, ans=0.025 2024-09-25 00:41:04,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-09-25 00:41:11,920 INFO [train.py:1198] (0/4) Epoch 35, batch 1400, loss[loss=0.1894, ctc_loss=0.1217, cr_loss=0.3381, over 17347.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1258, cr_loss=0.3423, over 3351332.12 frames. 
], batch size: 48, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:41:26,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=624750.0, ans=0.125 2024-09-25 00:41:40,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=624750.0, ans=0.0 2024-09-25 00:41:48,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=624796.6666666666, ans=0.2 2024-09-25 00:41:51,232 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.292e+02 1.373e+02 1.479e+02 2.692e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-25 00:42:07,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=624843.3333333334, ans=0.0 2024-09-25 00:42:09,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=624843.3333333334, ans=0.0 2024-09-25 00:42:14,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=624843.3333333334, ans=0.125 2024-09-25 00:42:21,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=22.5 2024-09-25 00:42:31,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=624890.0, ans=15.0 2024-09-25 00:42:37,117 INFO [train.py:1198] (0/4) Epoch 35, batch 1450, loss[loss=0.2107, ctc_loss=0.1383, cr_loss=0.3622, over 17366.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.3434, over 3366335.19 frames. ], batch size: 48, lr: 3.41e-03, grad_scale: 32.0 2024-09-25 00:43:16,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2024-09-25 00:44:02,342 INFO [train.py:1198] (0/4) Epoch 35, batch 1500, loss[loss=0.2068, ctc_loss=0.1349, cr_loss=0.3594, over 17020.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1269, cr_loss=0.3449, over 3370154.50 frames. ], batch size: 51, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:44:39,192 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.249e+02 1.364e+02 1.434e+02 2.230e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 00:44:44,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=625263.3333333334, ans=0.125 2024-09-25 00:44:54,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2024-09-25 00:45:08,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625356.6666666666, ans=0.1 2024-09-25 00:45:15,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=625356.6666666666, ans=0.125 2024-09-25 00:45:15,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=625356.6666666666, ans=0.0 2024-09-25 00:45:21,966 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.99 vs. 
limit=10.0 2024-09-25 00:45:23,012 INFO [train.py:1198] (0/4) Epoch 35, batch 1550, loss[loss=0.1739, ctc_loss=0.1126, cr_loss=0.3066, over 16939.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1266, cr_loss=0.3441, over 3371434.84 frames. ], batch size: 42, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:45:23,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=625403.3333333334, ans=0.125 2024-09-25 00:45:31,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=625403.3333333334, ans=0.07 2024-09-25 00:45:55,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=625496.6666666666, ans=0.2 2024-09-25 00:45:58,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=625496.6666666666, ans=0.125 2024-09-25 00:46:01,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625496.6666666666, ans=0.1 2024-09-25 00:46:17,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=625543.3333333334, ans=0.0 2024-09-25 00:46:30,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=625590.0, ans=0.2 2024-09-25 00:46:30,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=625590.0, ans=0.0 2024-09-25 00:46:45,818 INFO [train.py:1198] (0/4) Epoch 35, batch 1600, loss[loss=0.2047, ctc_loss=0.1306, cr_loss=0.3703, over 17354.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1267, cr_loss=0.3446, over 3370711.39 frames. ], batch size: 48, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:47:11,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=625683.3333333334, ans=0.0 2024-09-25 00:47:12,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=625683.3333333334, ans=0.125 2024-09-25 00:47:14,098 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-09-25 00:47:15,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625683.3333333334, ans=0.1 2024-09-25 00:47:25,313 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.273e+02 1.348e+02 1.440e+02 1.761e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-25 00:47:28,004 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0 2024-09-25 00:47:40,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=625776.6666666666, ans=0.2 2024-09-25 00:48:14,402 INFO [train.py:1198] (0/4) Epoch 35, batch 1650, loss[loss=0.1871, ctc_loss=0.1192, cr_loss=0.3396, over 17122.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1268, cr_loss=0.3449, over 3361797.02 frames. 
], batch size: 48, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:48:38,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625916.6666666666, ans=0.1 2024-09-25 00:48:48,405 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:48:48,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=625963.3333333334, ans=0.125 2024-09-25 00:49:32,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=22.5 2024-09-25 00:49:32,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.73 vs. limit=15.0 2024-09-25 00:49:34,599 INFO [train.py:1198] (0/4) Epoch 35, batch 1700, loss[loss=0.2196, ctc_loss=0.1432, cr_loss=0.3824, over 16778.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1272, cr_loss=0.3458, over 3354596.05 frames. ], batch size: 61, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:50:10,953 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.095e+02 1.273e+02 1.359e+02 1.469e+02 2.264e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 00:50:25,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=626243.3333333334, ans=0.025 2024-09-25 00:50:29,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=626243.3333333334, ans=0.125 2024-09-25 00:50:30,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=626243.3333333334, ans=0.125 2024-09-25 00:50:49,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=626290.0, ans=0.0 2024-09-25 00:50:53,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=626336.6666666666, ans=0.0 2024-09-25 00:50:54,473 INFO [train.py:1198] (0/4) Epoch 35, batch 1750, loss[loss=0.2221, ctc_loss=0.1453, cr_loss=0.3836, over 16992.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1272, cr_loss=0.3452, over 3354279.97 frames. ], batch size: 53, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:51:12,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=626383.3333333334, ans=0.025 2024-09-25 00:51:28,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=626430.0, ans=10.0 2024-09-25 00:51:28,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=626430.0, ans=0.09899494936611666 2024-09-25 00:51:45,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=626476.6666666666, ans=0.0 2024-09-25 00:52:19,238 INFO [train.py:1198] (0/4) Epoch 35, batch 1800, loss[loss=0.1756, ctc_loss=0.114, cr_loss=0.3079, over 17219.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1263, cr_loss=0.3429, over 3351391.08 frames. 
], batch size: 47, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:52:37,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.90 vs. limit=15.0 2024-09-25 00:53:00,986 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.248e+02 1.342e+02 1.447e+02 1.901e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-25 00:53:20,573 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 00:53:22,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=626710.0, ans=0.0 2024-09-25 00:53:43,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626803.3333333334, ans=0.1 2024-09-25 00:53:44,286 INFO [train.py:1198] (0/4) Epoch 35, batch 1850, loss[loss=0.2133, ctc_loss=0.1387, cr_loss=0.3727, over 16743.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1269, cr_loss=0.3439, over 3353062.88 frames. ], batch size: 61, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:54:06,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=15.0 2024-09-25 00:54:13,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626850.0, ans=0.1 2024-09-25 00:55:04,178 INFO [train.py:1198] (0/4) Epoch 35, batch 1900, loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.3383, over 17173.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1267, cr_loss=0.3432, over 3345021.95 frames. ], batch size: 45, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:55:20,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=627083.3333333334, ans=0.2 2024-09-25 00:55:36,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=627130.0, ans=0.125 2024-09-25 00:55:41,025 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.333e+02 1.409e+02 1.508e+02 2.528e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-25 00:55:44,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=627130.0, ans=0.09899494936611666 2024-09-25 00:55:54,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=627176.6666666666, ans=0.125 2024-09-25 00:56:23,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=627270.0, ans=0.0 2024-09-25 00:56:24,335 INFO [train.py:1198] (0/4) Epoch 35, batch 1950, loss[loss=0.184, ctc_loss=0.1188, cr_loss=0.3263, over 17041.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1263, cr_loss=0.3426, over 3344827.70 frames. 
], batch size: 52, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:56:35,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=627270.0, ans=0.0 2024-09-25 00:56:43,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=627316.6666666666, ans=0.125 2024-09-25 00:57:07,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=627363.3333333334, ans=0.125 2024-09-25 00:57:32,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627456.6666666666, ans=0.1 2024-09-25 00:57:49,087 INFO [train.py:1198] (0/4) Epoch 35, batch 2000, loss[loss=0.2151, ctc_loss=0.1401, cr_loss=0.3751, over 17212.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1265, cr_loss=0.3431, over 3351032.57 frames. ], batch size: 55, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:58:05,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=627503.3333333334, ans=0.02 2024-09-25 00:58:15,675 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.82 vs. limit=6.0 2024-09-25 00:58:16,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=627550.0, ans=0.125 2024-09-25 00:58:30,772 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.295e+02 1.390e+02 1.531e+02 2.683e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 00:58:37,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=627596.6666666666, ans=0.05 2024-09-25 00:59:11,351 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-09-25 00:59:13,925 INFO [train.py:1198] (0/4) Epoch 35, batch 2050, loss[loss=0.1673, ctc_loss=0.1043, cr_loss=0.3149, over 17080.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.343, over 3354121.45 frames. ], batch size: 43, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 00:59:38,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=627783.3333333334, ans=0.2 2024-09-25 00:59:41,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=627783.3333333334, ans=0.2 2024-09-25 00:59:55,047 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=22.5 2024-09-25 00:59:55,671 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=627830.0, ans=0.0 2024-09-25 01:00:06,936 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:00:09,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0 2024-09-25 01:00:33,291 INFO [train.py:1198] (0/4) Epoch 35, batch 2100, loss[loss=0.2198, ctc_loss=0.1441, cr_loss=0.3786, over 15078.00 frames. 
], tot_loss[loss=0.195, ctc_loss=0.1264, cr_loss=0.343, over 3351856.64 frames. ], batch size: 89, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 01:00:41,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=627970.0, ans=0.125 2024-09-25 01:01:07,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=628063.3333333334, ans=0.0 2024-09-25 01:01:10,509 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.271e+02 1.342e+02 1.455e+02 1.760e+02, threshold=2.684e+02, percent-clipped=0.0 2024-09-25 01:01:18,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=628063.3333333334, ans=0.125 2024-09-25 01:01:56,293 INFO [train.py:1198] (0/4) Epoch 35, batch 2150, loss[loss=0.2073, ctc_loss=0.136, cr_loss=0.3566, over 16735.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1264, cr_loss=0.3426, over 3346174.29 frames. ], batch size: 61, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 01:02:20,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=628250.0, ans=0.0 2024-09-25 01:02:29,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=628296.6666666666, ans=0.125 2024-09-25 01:02:39,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=628296.6666666666, ans=0.0 2024-09-25 01:03:24,311 INFO [train.py:1198] (0/4) Epoch 35, batch 2200, loss[loss=0.2171, ctc_loss=0.1466, cr_loss=0.3525, over 11932.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1266, cr_loss=0.3423, over 3340080.89 frames. ], batch size: 123, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 01:03:49,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-25 01:04:01,431 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.291e+02 1.359e+02 1.454e+02 2.045e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 01:04:03,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2024-09-25 01:04:04,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=628530.0, ans=0.0 2024-09-25 01:04:09,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=628530.0, ans=0.125 2024-09-25 01:04:44,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0 2024-09-25 01:04:45,060 INFO [train.py:1198] (0/4) Epoch 35, batch 2250, loss[loss=0.1574, ctc_loss=0.09822, cr_loss=0.2961, over 17021.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1261, cr_loss=0.3422, over 3346627.53 frames. 
], batch size: 39, lr: 3.40e-03, grad_scale: 32.0 2024-09-25 01:04:59,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=628716.6666666666, ans=0.125 2024-09-25 01:05:04,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=628716.6666666666, ans=0.04949747468305833 2024-09-25 01:05:04,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=628716.6666666666, ans=0.07 2024-09-25 01:05:42,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=15.0 2024-09-25 01:05:43,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=628810.0, ans=0.0 2024-09-25 01:06:05,297 INFO [train.py:1198] (0/4) Epoch 35, batch 2300, loss[loss=0.1725, ctc_loss=0.1103, cr_loss=0.3109, over 17301.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.3429, over 3354724.45 frames. ], batch size: 46, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:06:35,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=628950.0, ans=0.125 2024-09-25 01:06:44,417 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.311e+02 1.390e+02 1.542e+02 2.527e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 01:06:49,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=628996.6666666666, ans=0.125 2024-09-25 01:06:51,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=628996.6666666666, ans=0.2 2024-09-25 01:07:04,088 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=629043.3333333334, ans=0.125 2024-09-25 01:07:20,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=629090.0, ans=0.025 2024-09-25 01:07:29,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2024-09-25 01:07:30,083 INFO [train.py:1198] (0/4) Epoch 35, batch 2350, loss[loss=0.2192, ctc_loss=0.1505, cr_loss=0.3435, over 11600.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1263, cr_loss=0.3429, over 3351714.53 frames. ], batch size: 123, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:07:38,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=629136.6666666666, ans=0.0 2024-09-25 01:08:11,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=22.5 2024-09-25 01:08:20,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.52 vs. 
limit=15.0 2024-09-25 01:08:29,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629276.6666666666, ans=0.1 2024-09-25 01:08:34,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=629276.6666666666, ans=0.125 2024-09-25 01:08:38,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=629323.3333333334, ans=0.0 2024-09-25 01:08:55,603 INFO [train.py:1198] (0/4) Epoch 35, batch 2400, loss[loss=0.2006, ctc_loss=0.1288, cr_loss=0.3586, over 17014.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1267, cr_loss=0.3432, over 3345931.30 frames. ], batch size: 51, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:09:00,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=629370.0, ans=0.2 2024-09-25 01:09:23,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=629416.6666666666, ans=0.0 2024-09-25 01:09:32,456 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.057e+02 1.256e+02 1.334e+02 1.415e+02 2.339e+02, threshold=2.669e+02, percent-clipped=0.0 2024-09-25 01:09:39,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.11 vs. limit=15.0 2024-09-25 01:09:47,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=629510.0, ans=0.125 2024-09-25 01:10:15,372 INFO [train.py:1198] (0/4) Epoch 35, batch 2450, loss[loss=0.1649, ctc_loss=0.1043, cr_loss=0.3033, over 16953.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1267, cr_loss=0.344, over 3344431.56 frames. ], batch size: 42, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:10:26,775 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 01:10:45,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=22.5 2024-09-25 01:10:49,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=629696.6666666666, ans=0.125 2024-09-25 01:11:16,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-09-25 01:11:33,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=629790.0, ans=0.1 2024-09-25 01:11:33,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629790.0, ans=0.1 2024-09-25 01:11:37,757 INFO [train.py:1198] (0/4) Epoch 35, batch 2500, loss[loss=0.1884, ctc_loss=0.1221, cr_loss=0.3314, over 17025.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1273, cr_loss=0.3456, over 3354195.91 frames. 
], batch size: 44, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:12:16,800 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.245e+02 1.351e+02 1.450e+02 1.965e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-25 01:12:23,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=629930.0, ans=0.1 2024-09-25 01:12:23,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2024-09-25 01:13:05,482 INFO [train.py:1198] (0/4) Epoch 35, batch 2550, loss[loss=0.1804, ctc_loss=0.1147, cr_loss=0.3283, over 17260.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1264, cr_loss=0.3439, over 3360579.78 frames. ], batch size: 44, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:13:27,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=630116.6666666666, ans=0.0 2024-09-25 01:13:31,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=630116.6666666666, ans=0.025 2024-09-25 01:13:50,253 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=630163.3333333334, ans=0.125 2024-09-25 01:14:14,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=630256.6666666666, ans=0.025 2024-09-25 01:14:25,411 INFO [train.py:1198] (0/4) Epoch 35, batch 2600, loss[loss=0.1709, ctc_loss=0.1077, cr_loss=0.316, over 17203.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3441, over 3353980.05 frames. ], batch size: 47, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:15:02,358 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.261e+02 1.318e+02 1.420e+02 1.911e+02, threshold=2.636e+02, percent-clipped=0.0 2024-09-25 01:15:10,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=630396.6666666666, ans=0.2 2024-09-25 01:15:28,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2024-09-25 01:15:45,580 INFO [train.py:1198] (0/4) Epoch 35, batch 2650, loss[loss=0.1859, ctc_loss=0.1224, cr_loss=0.3174, over 16749.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3428, over 3354731.93 frames. ], batch size: 61, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:15:53,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=630536.6666666666, ans=0.125 2024-09-25 01:16:00,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=630583.3333333334, ans=0.125 2024-09-25 01:16:15,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630630.0, ans=0.1 2024-09-25 01:17:00,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=630723.3333333334, ans=0.09899494936611666 2024-09-25 01:17:08,145 INFO [train.py:1198] (0/4) Epoch 35, batch 2700, loss[loss=0.1882, ctc_loss=0.1212, cr_loss=0.335, over 16988.00 frames. 
], tot_loss[loss=0.1933, ctc_loss=0.125, cr_loss=0.3414, over 3358697.95 frames. ], batch size: 51, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:17:16,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.77 vs. limit=15.0 2024-09-25 01:17:19,680 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.46 vs. limit=15.0 2024-09-25 01:17:22,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=630770.0, ans=0.2 2024-09-25 01:17:52,698 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.276e+02 1.350e+02 1.459e+02 1.855e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-25 01:18:05,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=630910.0, ans=10.0 2024-09-25 01:18:10,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=630910.0, ans=0.2 2024-09-25 01:18:16,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=22.5 2024-09-25 01:18:23,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=630956.6666666666, ans=0.0 2024-09-25 01:18:28,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=630956.6666666666, ans=0.125 2024-09-25 01:18:36,152 INFO [train.py:1198] (0/4) Epoch 35, batch 2750, loss[loss=0.1611, ctc_loss=0.1029, cr_loss=0.2909, over 17256.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1254, cr_loss=0.342, over 3360263.74 frames. ], batch size: 42, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:18:44,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=22.5 2024-09-25 01:18:45,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2024-09-25 01:19:02,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=631050.0, ans=0.125 2024-09-25 01:19:05,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=631050.0, ans=0.2 2024-09-25 01:19:45,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=631190.0, ans=0.2 2024-09-25 01:19:56,338 INFO [train.py:1198] (0/4) Epoch 35, batch 2800, loss[loss=0.1971, ctc_loss=0.1266, cr_loss=0.3523, over 17117.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.126, cr_loss=0.343, over 3346668.46 frames. 
], batch size: 40, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:20:01,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=631236.6666666666, ans=0.125 2024-09-25 01:20:33,123 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.248e+02 1.324e+02 1.411e+02 2.015e+02, threshold=2.647e+02, percent-clipped=0.0 2024-09-25 01:20:38,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=631330.0, ans=0.0 2024-09-25 01:20:49,942 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2024-09-25 01:20:54,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=631376.6666666666, ans=0.025 2024-09-25 01:20:55,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=631376.6666666666, ans=0.125 2024-09-25 01:21:16,335 INFO [train.py:1198] (0/4) Epoch 35, batch 2850, loss[loss=0.2051, ctc_loss=0.1306, cr_loss=0.3726, over 17167.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1261, cr_loss=0.3428, over 3347643.74 frames. ], batch size: 45, lr: 3.39e-03, grad_scale: 64.0 2024-09-25 01:21:27,769 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=12.0 2024-09-25 01:21:38,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631516.6666666666, ans=0.1 2024-09-25 01:21:49,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=631563.3333333334, ans=0.2 2024-09-25 01:22:25,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=631656.6666666666, ans=0.0 2024-09-25 01:22:25,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=22.5 2024-09-25 01:22:36,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=631656.6666666666, ans=0.125 2024-09-25 01:22:46,462 INFO [train.py:1198] (0/4) Epoch 35, batch 2900, loss[loss=0.1937, ctc_loss=0.1259, cr_loss=0.3389, over 17011.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1268, cr_loss=0.3449, over 3349250.08 frames. ], batch size: 44, lr: 3.39e-03, grad_scale: 32.0 2024-09-25 01:23:17,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=15.0 2024-09-25 01:23:24,648 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.283e+02 1.348e+02 1.424e+02 1.926e+02, threshold=2.696e+02, percent-clipped=0.0 2024-09-25 01:23:31,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=631796.6666666666, ans=0.125 2024-09-25 01:24:06,674 INFO [train.py:1198] (0/4) Epoch 35, batch 2950, loss[loss=0.1762, ctc_loss=0.1116, cr_loss=0.3235, over 16920.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1267, cr_loss=0.3449, over 3349338.03 frames. 
], batch size: 42, lr: 3.39e-03, grad_scale: 32.0
2024-09-25 01:24:15,664 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=15.0
2024-09-25 01:24:23,791 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=15.0
2024-09-25 01:24:27,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.92 vs. limit=5.0
2024-09-25 01:24:50,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=632030.0, ans=0.2
2024-09-25 01:24:57,361 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.26 vs. limit=15.0
2024-09-25 01:25:19,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=632123.3333333334, ans=0.125
2024-09-25 01:25:26,982 INFO [train.py:1198] (0/4) Epoch 35, batch 3000, loss[loss=0.2063, ctc_loss=0.1334, cr_loss=0.3645, over 17200.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1262, cr_loss=0.344, over 3362000.68 frames. ], batch size: 55, lr: 3.39e-03, grad_scale: 32.0
2024-09-25 01:25:26,983 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-25 01:25:42,187 INFO [train.py:1230] (0/4) Epoch 35, validation: loss=0.03538, ctc_loss=0.03538, cr_loss=9.094e-15, over 944034.00 frames.
2024-09-25 01:25:42,188 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-25 01:25:56,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=632216.6666666666, ans=0.0
2024-09-25 01:26:19,534 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.295e+02 1.357e+02 1.448e+02 2.181e+02, threshold=2.715e+02, percent-clipped=0.0
2024-09-25 01:26:22,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=632263.3333333334, ans=0.125
2024-09-25 01:27:02,791 INFO [train.py:1198] (0/4) Epoch 35, batch 3050, loss[loss=0.2006, ctc_loss=0.1316, cr_loss=0.3449, over 16540.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1258, cr_loss=0.3427, over 3354048.89 frames. ], batch size: 66, lr: 3.39e-03, grad_scale: 32.0
2024-09-25 01:28:00,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=632543.3333333334, ans=0.125
2024-09-25 01:28:21,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=632636.6666666666, ans=0.125
2024-09-25 01:28:23,386 INFO [train.py:1198] (0/4) Epoch 35, batch 3100, loss[loss=0.2032, ctc_loss=0.1307, cr_loss=0.3626, over 17301.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1267, cr_loss=0.3445, over 3346794.83 frames. ], batch size: 49, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:28:53,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=632730.0, ans=0.125
2024-09-25 01:29:01,046 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.287e+02 1.342e+02 1.471e+02 1.803e+02, threshold=2.684e+02, percent-clipped=0.0
2024-09-25 01:29:40,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=632823.3333333334, ans=0.05
2024-09-25 01:29:41,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=632823.3333333334, ans=0.2
2024-09-25 01:29:45,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=632870.0, ans=0.1
2024-09-25 01:29:46,599 INFO [train.py:1198] (0/4) Epoch 35, batch 3150, loss[loss=0.2393, ctc_loss=0.1629, cr_loss=0.382, over 11363.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3443, over 3337058.07 frames. ], batch size: 123, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:29:49,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=632870.0, ans=0.015
2024-09-25 01:30:18,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=632963.3333333334, ans=0.2
2024-09-25 01:30:43,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=633010.0, ans=0.125
2024-09-25 01:30:46,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=633010.0, ans=0.2
2024-09-25 01:30:54,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=633056.6666666666, ans=0.04949747468305833
2024-09-25 01:30:58,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=633056.6666666666, ans=0.125
2024-09-25 01:31:02,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=633056.6666666666, ans=0.125
2024-09-25 01:31:04,923 INFO [train.py:1198] (0/4) Epoch 35, batch 3200, loss[loss=0.221, ctc_loss=0.1473, cr_loss=0.3685, over 15964.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1262, cr_loss=0.3438, over 3347157.84 frames. ], batch size: 74, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:31:20,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=633150.0, ans=0.125
2024-09-25 01:31:28,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=633150.0, ans=0.125
2024-09-25 01:31:38,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=633196.6666666666, ans=0.0
2024-09-25 01:31:42,383 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.282e+02 1.354e+02 1.513e+02 2.603e+02, threshold=2.707e+02, percent-clipped=0.0
2024-09-25 01:31:42,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633196.6666666666, ans=0.0
2024-09-25 01:31:58,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=633243.3333333334, ans=0.0
2024-09-25 01:32:02,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=633243.3333333334, ans=0.0
2024-09-25 01:32:23,312 INFO [train.py:1198] (0/4) Epoch 35, batch 3250, loss[loss=0.1652, ctc_loss=0.1062, cr_loss=0.2952, over 17274.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1266, cr_loss=0.3445, over 3350417.25 frames. ], batch size: 42, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:32:26,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=633336.6666666666, ans=0.1
2024-09-25 01:32:31,270 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 01:32:52,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=633430.0, ans=0.0
2024-09-25 01:32:54,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.86 vs. limit=10.0
2024-09-25 01:32:57,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=633430.0, ans=0.125
2024-09-25 01:33:40,963 INFO [train.py:1198] (0/4) Epoch 35, batch 3300, loss[loss=0.2147, ctc_loss=0.1411, cr_loss=0.3681, over 16097.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1271, cr_loss=0.346, over 3357804.52 frames. ], batch size: 74, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:33:57,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0
2024-09-25 01:34:06,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633616.6666666666, ans=0.1
2024-09-25 01:34:19,045 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.276e+02 1.335e+02 1.450e+02 3.347e+02, threshold=2.669e+02, percent-clipped=1.0
2024-09-25 01:34:22,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=633663.3333333334, ans=0.05
2024-09-25 01:34:38,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633710.0, ans=0.1
2024-09-25 01:34:43,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=633756.6666666666, ans=0.0
2024-09-25 01:34:57,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633756.6666666666, ans=0.1
2024-09-25 01:35:00,137 INFO [train.py:1198] (0/4) Epoch 35, batch 3350, loss[loss=0.1745, ctc_loss=0.1116, cr_loss=0.315, over 17292.00 frames. ], tot_loss[loss=0.1975, ctc_loss=0.128, cr_loss=0.3476, over 3352134.09 frames. ], batch size: 49, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:35:02,647 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0
2024-09-25 01:35:17,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=633850.0, ans=0.0
2024-09-25 01:35:23,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=15.0
2024-09-25 01:35:32,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=12.0
2024-09-25 01:35:57,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.68 vs. limit=15.0
2024-09-25 01:36:18,435 INFO [train.py:1198] (0/4) Epoch 35, batch 3400, loss[loss=0.2294, ctc_loss=0.1534, cr_loss=0.38, over 17034.00 frames. ], tot_loss[loss=0.1978, ctc_loss=0.1283, cr_loss=0.3473, over 3333307.56 frames. ], batch size: 52, lr: 3.38e-03, grad_scale: 16.0
2024-09-25 01:36:27,165 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0
2024-09-25 01:36:34,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=634083.3333333334, ans=0.07
2024-09-25 01:36:57,298 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.280e+02 1.352e+02 1.458e+02 2.510e+02, threshold=2.703e+02, percent-clipped=0.0
2024-09-25 01:37:18,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=634176.6666666666, ans=0.2
2024-09-25 01:37:35,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=634223.3333333334, ans=0.0
2024-09-25 01:37:38,593 INFO [train.py:1198] (0/4) Epoch 35, batch 3450, loss[loss=0.215, ctc_loss=0.1399, cr_loss=0.3757, over 16979.00 frames. ], tot_loss[loss=0.1979, ctc_loss=0.1284, cr_loss=0.3475, over 3337755.87 frames. ], batch size: 53, lr: 3.38e-03, grad_scale: 16.0
2024-09-25 01:38:10,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.16 vs. limit=10.0
2024-09-25 01:38:16,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=634363.3333333334, ans=0.125
2024-09-25 01:38:20,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=634363.3333333334, ans=0.2
2024-09-25 01:38:24,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=634363.3333333334, ans=0.07
2024-09-25 01:38:41,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=634456.6666666666, ans=0.125
2024-09-25 01:38:46,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=634456.6666666666, ans=0.125
2024-09-25 01:38:52,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=634456.6666666666, ans=0.125
2024-09-25 01:38:58,785 INFO [train.py:1198] (0/4) Epoch 35, batch 3500, loss[loss=0.2102, ctc_loss=0.1361, cr_loss=0.3707, over 17098.00 frames. ], tot_loss[loss=0.1971, ctc_loss=0.1277, cr_loss=0.3471, over 3352695.78 frames. ], batch size: 49, lr: 3.38e-03, grad_scale: 16.0
2024-09-25 01:39:16,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=634550.0, ans=0.1
2024-09-25 01:39:29,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=634550.0, ans=0.0
2024-09-25 01:39:35,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=634596.6666666666, ans=0.2
2024-09-25 01:39:40,110 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.254e+02 1.331e+02 1.459e+02 2.382e+02, threshold=2.663e+02, percent-clipped=0.0
2024-09-25 01:39:56,300 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-136000.pt
2024-09-25 01:40:06,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=634690.0, ans=0.125
2024-09-25 01:40:12,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=22.5
2024-09-25 01:40:19,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.32 vs. limit=15.0
2024-09-25 01:40:22,747 INFO [train.py:1198] (0/4) Epoch 35, batch 3550, loss[loss=0.1541, ctc_loss=0.09785, cr_loss=0.2813, over 17114.00 frames. ], tot_loss[loss=0.1968, ctc_loss=0.1275, cr_loss=0.3466, over 3353160.25 frames. ], batch size: 40, lr: 3.38e-03, grad_scale: 16.0
2024-09-25 01:40:46,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=634783.3333333334, ans=0.0
2024-09-25 01:40:50,026 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0
2024-09-25 01:40:51,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=22.5
2024-09-25 01:41:28,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=634923.3333333334, ans=0.125
2024-09-25 01:41:32,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0
2024-09-25 01:41:40,837 INFO [train.py:1198] (0/4) Epoch 35, batch 3600, loss[loss=0.182, ctc_loss=0.116, cr_loss=0.3301, over 17069.00 frames. ], tot_loss[loss=0.1966, ctc_loss=0.1274, cr_loss=0.3457, over 3355341.87 frames. ], batch size: 46, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:41:45,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=634970.0, ans=0.0
2024-09-25 01:42:19,816 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.292e+02 1.369e+02 1.496e+02 2.107e+02, threshold=2.739e+02, percent-clipped=0.0
2024-09-25 01:42:26,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=635110.0, ans=0.125
2024-09-25 01:42:37,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=635110.0, ans=0.2
2024-09-25 01:42:40,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635110.0, ans=0.1
2024-09-25 01:42:58,850 INFO [train.py:1198] (0/4) Epoch 35, batch 3650, loss[loss=0.1674, ctc_loss=0.1067, cr_loss=0.3033, over 16972.00 frames. ], tot_loss[loss=0.1967, ctc_loss=0.1275, cr_loss=0.3456, over 3342858.41 frames. ], batch size: 42, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:43:15,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0
2024-09-25 01:43:25,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=635250.0, ans=0.025
2024-09-25 01:43:44,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0
2024-09-25 01:44:16,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=635436.6666666666, ans=0.0
2024-09-25 01:44:17,737 INFO [train.py:1198] (0/4) Epoch 35, batch 3700, loss[loss=0.1618, ctc_loss=0.1016, cr_loss=0.3011, over 16940.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1272, cr_loss=0.3451, over 3335471.14 frames. ], batch size: 42, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:44:46,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=635483.3333333334, ans=0.2
2024-09-25 01:44:56,879 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.290e+02 1.367e+02 1.473e+02 2.318e+02, threshold=2.733e+02, percent-clipped=0.0
2024-09-25 01:45:08,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=635576.6666666666, ans=0.125
2024-09-25 01:45:21,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=22.5
2024-09-25 01:45:25,072 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0
2024-09-25 01:45:36,670 INFO [train.py:1198] (0/4) Epoch 35, batch 3750, loss[loss=0.1565, ctc_loss=0.09958, cr_loss=0.2847, over 16950.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1265, cr_loss=0.3437, over 3343474.52 frames. ], batch size: 42, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:45:41,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.50 vs. limit=10.0
2024-09-25 01:45:51,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=635716.6666666666, ans=0.0
2024-09-25 01:46:00,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=635716.6666666666, ans=0.125
2024-09-25 01:46:09,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=635763.3333333334, ans=0.0
2024-09-25 01:46:20,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635763.3333333334, ans=0.125
2024-09-25 01:46:30,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=22.5
2024-09-25 01:46:46,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=635856.6666666666, ans=0.0
2024-09-25 01:46:48,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=635856.6666666666, ans=0.0
2024-09-25 01:46:55,870 INFO [train.py:1198] (0/4) Epoch 35, batch 3800, loss[loss=0.1867, ctc_loss=0.12, cr_loss=0.3331, over 17202.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1269, cr_loss=0.3432, over 3320476.70 frames. ], batch size: 41, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:47:21,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=635950.0, ans=0.1
2024-09-25 01:47:35,578 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.304e+02 1.388e+02 1.504e+02 2.196e+02, threshold=2.776e+02, percent-clipped=0.0
2024-09-25 01:47:39,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=635996.6666666666, ans=0.0
2024-09-25 01:47:40,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=635996.6666666666, ans=0.125
2024-09-25 01:47:41,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=12.0
2024-09-25 01:48:16,222 INFO [train.py:1198] (0/4) Epoch 35, batch 3850, loss[loss=0.2189, ctc_loss=0.1474, cr_loss=0.3576, over 11796.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.127, cr_loss=0.3426, over 3277238.91 frames. ], batch size: 123, lr: 3.38e-03, grad_scale: 32.0
2024-09-25 01:48:38,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0
2024-09-25 01:48:39,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=636183.3333333334, ans=0.125
2024-09-25 01:48:44,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=636183.3333333334, ans=0.025
2024-09-25 01:48:49,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=636230.0, ans=0.125
2024-09-25 01:49:03,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=636276.6666666666, ans=0.0
2024-09-25 01:49:26,606 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-35.pt
2024-09-25 01:50:19,161 INFO [train.py:1198] (0/4) Epoch 36, batch 0, loss[loss=0.1735, ctc_loss=0.1108, cr_loss=0.3133, over 17212.00 frames. ], tot_loss[loss=0.1735, ctc_loss=0.1108, cr_loss=0.3133, over 17212.00 frames. ], batch size: 47, lr: 3.33e-03, grad_scale: 32.0
2024-09-25 01:50:19,162 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-25 01:50:34,830 INFO [train.py:1230] (0/4) Epoch 36, validation: loss=0.0356, ctc_loss=0.0356, cr_loss=9.615e-15, over 944034.00 frames.
2024-09-25 01:50:34,831 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-25 01:50:35,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=636351.3333333334, ans=0.0
2024-09-25 01:50:48,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=636351.3333333334, ans=0.025
2024-09-25 01:50:53,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0
2024-09-25 01:51:21,245 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.459e+02 1.589e+02 1.794e+02 2.988e+02, threshold=3.179e+02, percent-clipped=1.0
2024-09-25 01:51:49,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636538.0, ans=0.1
2024-09-25 01:51:51,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=636538.0, ans=0.125
2024-09-25 01:51:55,583 INFO [train.py:1198] (0/4) Epoch 36, batch 50, loss[loss=0.1853, ctc_loss=0.1187, cr_loss=0.333, over 17095.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1259, cr_loss=0.3419, over 760694.47 frames. ], batch size: 49, lr: 3.33e-03, grad_scale: 32.0
2024-09-25 01:51:59,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=636584.6666666666, ans=0.2
2024-09-25 01:52:05,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0
2024-09-25 01:52:08,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=636584.6666666666, ans=0.0
2024-09-25 01:52:17,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=22.5
2024-09-25 01:52:37,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=636678.0, ans=0.125
2024-09-25 01:52:38,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=636678.0, ans=0.125
2024-09-25 01:52:40,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=636678.0, ans=0.125
2024-09-25 01:53:09,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=636771.3333333334, ans=0.0
2024-09-25 01:53:21,630 INFO [train.py:1198] (0/4) Epoch 36, batch 100, loss[loss=0.2403, ctc_loss=0.1571, cr_loss=0.4158, over 17018.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3427, over 1332895.26 frames. ], batch size: 52, lr: 3.33e-03, grad_scale: 32.0
2024-09-25 01:53:30,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0
2024-09-25 01:53:30,654 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0
2024-09-25 01:53:46,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=636864.6666666666, ans=0.2
2024-09-25 01:53:51,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=636864.6666666666, ans=0.025
2024-09-25 01:54:11,065 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.276e+02 1.354e+02 1.441e+02 1.843e+02, threshold=2.708e+02, percent-clipped=0.0
2024-09-25 01:54:29,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=637004.6666666666, ans=0.0
2024-09-25 01:54:29,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=637004.6666666666, ans=0.125
2024-09-25 01:54:44,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=637004.6666666666, ans=0.0
2024-09-25 01:54:47,640 INFO [train.py:1198] (0/4) Epoch 36, batch 150, loss[loss=0.2178, ctc_loss=0.1436, cr_loss=0.371, over 17200.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1261, cr_loss=0.3438, over 1790691.80 frames. ], batch size: 55, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 01:55:04,358 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.12 vs. limit=10.0
2024-09-25 01:55:14,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=637098.0, ans=0.125
2024-09-25 01:55:16,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=637098.0, ans=0.125
2024-09-25 01:55:47,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=637191.3333333334, ans=0.125
2024-09-25 01:55:59,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=637238.0, ans=0.025
2024-09-25 01:56:07,386 INFO [train.py:1198] (0/4) Epoch 36, batch 200, loss[loss=0.201, ctc_loss=0.1293, cr_loss=0.3584, over 17303.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1257, cr_loss=0.3425, over 2132437.43 frames. ], batch size: 49, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 01:56:36,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=637331.3333333334, ans=0.1
2024-09-25 01:56:36,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=637331.3333333334, ans=0.125
2024-09-25 01:56:55,269 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.279e+02 1.373e+02 1.478e+02 2.081e+02, threshold=2.747e+02, percent-clipped=0.0
2024-09-25 01:57:00,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=637424.6666666666, ans=0.1
2024-09-25 01:57:00,465 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 01:57:25,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=637471.3333333334, ans=0.0
2024-09-25 01:57:29,960 INFO [train.py:1198] (0/4) Epoch 36, batch 250, loss[loss=0.1563, ctc_loss=0.09654, cr_loss=0.299, over 17092.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1266, cr_loss=0.344, over 2400946.58 frames. ], batch size: 40, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 01:57:41,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=637518.0, ans=0.0
2024-09-25 01:58:05,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=637611.3333333334, ans=0.0
2024-09-25 01:58:08,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=637611.3333333334, ans=0.125
2024-09-25 01:58:30,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=637658.0, ans=0.125
2024-09-25 01:58:52,445 INFO [train.py:1198] (0/4) Epoch 36, batch 300, loss[loss=0.1583, ctc_loss=0.09873, cr_loss=0.2977, over 17123.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1269, cr_loss=0.3449, over 2605294.05 frames. ], batch size: 40, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 01:59:45,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=637891.3333333334, ans=0.1
2024-09-25 01:59:45,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=637891.3333333334, ans=0.1
2024-09-25 01:59:46,603 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.037e+02 1.272e+02 1.363e+02 1.432e+02 1.912e+02, threshold=2.726e+02, percent-clipped=0.0
2024-09-25 01:59:49,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=22.5
2024-09-25 02:00:01,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=637938.0, ans=0.0
2024-09-25 02:00:10,883 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-25 02:00:17,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=637984.6666666666, ans=0.05
2024-09-25 02:00:18,410 INFO [train.py:1198] (0/4) Epoch 36, batch 350, loss[loss=0.1706, ctc_loss=0.1099, cr_loss=0.3037, over 17068.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.1271, cr_loss=0.3454, over 2776052.63 frames. ], batch size: 39, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 02:00:47,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=638031.3333333334, ans=0.125
2024-09-25 02:01:07,527 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0
2024-09-25 02:01:10,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0
2024-09-25 02:01:11,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638124.6666666666, ans=0.1
2024-09-25 02:01:21,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=638171.3333333334, ans=0.2
2024-09-25 02:01:29,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=638171.3333333334, ans=0.0
2024-09-25 02:01:37,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=638218.0, ans=0.125
2024-09-25 02:01:38,835 INFO [train.py:1198] (0/4) Epoch 36, batch 400, loss[loss=0.2224, ctc_loss=0.146, cr_loss=0.382, over 17008.00 frames. ], tot_loss[loss=0.1963, ctc_loss=0.1272, cr_loss=0.3452, over 2894376.85 frames. ], batch size: 53, lr: 3.32e-03, grad_scale: 32.0
2024-09-25 02:02:21,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=638311.3333333334, ans=0.0
2024-09-25 02:02:23,015 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 02:02:24,975 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0
2024-09-25 02:02:25,250 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=12.0
2024-09-25 02:02:29,168 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.293e+02 1.367e+02 1.475e+02 2.656e+02, threshold=2.733e+02, percent-clipped=0.0
2024-09-25 02:02:29,531 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 02:03:01,277 INFO [train.py:1198] (0/4) Epoch 36, batch 450, loss[loss=0.1956, ctc_loss=0.1269, cr_loss=0.3438, over 17194.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1265, cr_loss=0.3439, over 3004433.56 frames. ], batch size: 55, lr: 3.32e-03, grad_scale: 32.0
2024-09-25 02:03:22,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=638498.0, ans=0.125
2024-09-25 02:03:30,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=638498.0, ans=0.125
2024-09-25 02:04:16,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0
2024-09-25 02:04:27,073 INFO [train.py:1198] (0/4) Epoch 36, batch 500, loss[loss=0.1487, ctc_loss=0.09661, cr_loss=0.2604, over 17030.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.126, cr_loss=0.3433, over 3091578.46 frames. ], batch size: 39, lr: 3.32e-03, grad_scale: 32.0
2024-09-25 02:04:34,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=638684.6666666666, ans=0.125
2024-09-25 02:04:44,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=638731.3333333334, ans=0.0
2024-09-25 02:05:19,275 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.049e+02 1.244e+02 1.315e+02 1.459e+02 2.015e+02, threshold=2.630e+02, percent-clipped=0.0
2024-09-25 02:05:26,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638824.6666666666, ans=0.1
2024-09-25 02:05:31,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=638824.6666666666, ans=0.125
2024-09-25 02:05:35,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=638871.3333333334, ans=0.0
2024-09-25 02:05:49,778 INFO [train.py:1198] (0/4) Epoch 36, batch 550, loss[loss=0.1656, ctc_loss=0.1038, cr_loss=0.3088, over 17284.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1248, cr_loss=0.3412, over 3152429.97 frames. ], batch size: 42, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 02:05:50,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638918.0, ans=0.1
2024-09-25 02:06:14,032 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 02:06:18,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=638964.6666666666, ans=0.125
2024-09-25 02:06:40,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0
2024-09-25 02:06:40,294 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=6.0
2024-09-25 02:06:52,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=639104.6666666666, ans=0.125
2024-09-25 02:07:09,442 INFO [train.py:1198] (0/4) Epoch 36, batch 600, loss[loss=0.1609, ctc_loss=0.1037, cr_loss=0.2862, over 17268.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1243, cr_loss=0.3401, over 3200018.61 frames. ], batch size: 42, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 02:07:24,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0
2024-09-25 02:07:55,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=639244.6666666666, ans=0.125
2024-09-25 02:08:01,748 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.285e+02 1.361e+02 1.464e+02 2.442e+02, threshold=2.722e+02, percent-clipped=0.0
2024-09-25 02:08:06,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=639291.3333333334, ans=0.2
2024-09-25 02:08:08,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=639291.3333333334, ans=0.125
2024-09-25 02:08:10,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0
2024-09-25 02:08:11,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639291.3333333334, ans=0.1
2024-09-25 02:08:11,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=639291.3333333334, ans=0.125
2024-09-25 02:08:19,794 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=12.0
2024-09-25 02:08:22,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=639338.0, ans=0.0
2024-09-25 02:08:34,770 INFO [train.py:1198] (0/4) Epoch 36, batch 650, loss[loss=0.1891, ctc_loss=0.1194, cr_loss=0.3486, over 17298.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1247, cr_loss=0.3414, over 3235514.94 frames. ], batch size: 46, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 02:08:44,708 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 02:09:23,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=639524.6666666666, ans=0.125
2024-09-25 02:09:32,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=639524.6666666666, ans=0.125
2024-09-25 02:09:53,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=639571.3333333334, ans=0.125
2024-09-25 02:09:59,837 INFO [train.py:1198] (0/4) Epoch 36, batch 700, loss[loss=0.2114, ctc_loss=0.1379, cr_loss=0.3675, over 17051.00 frames. ], tot_loss[loss=0.1936, ctc_loss=0.1251, cr_loss=0.3425, over 3268317.04 frames. ], batch size: 52, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 02:10:08,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639618.0, ans=0.1
2024-09-25 02:10:11,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=639618.0, ans=0.125
2024-09-25 02:10:25,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639664.6666666666, ans=0.1
2024-09-25 02:10:28,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.76 vs. limit=10.0
2024-09-25 02:10:49,614 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.257e+02 1.341e+02 1.454e+02 1.758e+02, threshold=2.682e+02, percent-clipped=0.0
2024-09-25 02:11:00,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=639758.0, ans=0.1
2024-09-25 02:11:19,967 INFO [train.py:1198] (0/4) Epoch 36, batch 750, loss[loss=0.1809, ctc_loss=0.1145, cr_loss=0.332, over 17294.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1255, cr_loss=0.3432, over 3286166.81 frames. ], batch size: 46, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 02:11:35,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0
2024-09-25 02:11:36,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=639898.0, ans=0.125
2024-09-25 02:11:37,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=639898.0, ans=0.0
2024-09-25 02:12:08,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=639991.3333333334, ans=0.0
2024-09-25 02:12:42,671 INFO [train.py:1198] (0/4) Epoch 36, batch 800, loss[loss=0.1542, ctc_loss=0.09999, cr_loss=0.2713, over 17199.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1255, cr_loss=0.3425, over 3300205.39 frames. ], batch size: 41, lr: 3.32e-03, grad_scale: 32.0
2024-09-25 02:12:47,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640084.6666666666, ans=0.1
2024-09-25 02:13:08,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=640131.3333333334, ans=0.2
2024-09-25 02:13:09,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=640131.3333333334, ans=0.0
2024-09-25 02:13:33,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=640224.6666666666, ans=0.125
2024-09-25 02:13:34,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=640224.6666666666, ans=0.125
2024-09-25 02:13:35,135 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.278e+02 1.351e+02 1.457e+02 1.942e+02, threshold=2.702e+02, percent-clipped=0.0
2024-09-25 02:13:42,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=640224.6666666666, ans=0.0
2024-09-25 02:13:54,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=640271.3333333334, ans=0.0
2024-09-25 02:14:08,466 INFO [train.py:1198] (0/4) Epoch 36, batch 850, loss[loss=0.1814, ctc_loss=0.1149, cr_loss=0.3323, over 17163.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1254, cr_loss=0.342, over 3303275.75 frames. ], batch size: 45, lr: 3.32e-03, grad_scale: 32.0
2024-09-25 02:14:23,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=640364.6666666666, ans=0.0
2024-09-25 02:15:13,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=640504.6666666666, ans=0.0
2024-09-25 02:15:14,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=640504.6666666666, ans=0.1
2024-09-25 02:15:30,657 INFO [train.py:1198] (0/4) Epoch 36, batch 900, loss[loss=0.2339, ctc_loss=0.1522, cr_loss=0.4085, over 16869.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.125, cr_loss=0.3411, over 3318721.30 frames. ], batch size: 58, lr: 3.32e-03, grad_scale: 32.0
2024-09-25 02:15:37,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=640551.3333333334, ans=0.125
2024-09-25 02:16:01,809 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 02:16:05,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=22.5
2024-09-25 02:16:07,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=640644.6666666666, ans=0.125
2024-09-25 02:16:11,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=640644.6666666666, ans=0.0
2024-09-25 02:16:22,098 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.267e+02 1.323e+02 1.416e+02 1.789e+02, threshold=2.647e+02, percent-clipped=0.0
2024-09-25 02:16:22,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=640691.3333333334, ans=0.125
2024-09-25 02:16:51,069 INFO [train.py:1198] (0/4) Epoch 36, batch 950, loss[loss=0.1904, ctc_loss=0.1224, cr_loss=0.3399, over 17015.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1253, cr_loss=0.3423, over 3329158.68 frames. ], batch size: 44, lr: 3.32e-03, grad_scale: 16.0
2024-09-25 02:17:13,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=640831.3333333334, ans=0.125
2024-09-25 02:17:22,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=640831.3333333334, ans=0.125
2024-09-25 02:17:35,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=640878.0, ans=0.1
2024-09-25 02:17:41,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=640924.6666666666, ans=0.025
2024-09-25 02:17:54,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=640924.6666666666, ans=0.125
2024-09-25 02:18:16,109 INFO [train.py:1198] (0/4) Epoch 36, batch 1000, loss[loss=0.2225, ctc_loss=0.1446, cr_loss=0.3892, over 16500.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1256, cr_loss=0.3429, over 3333775.06 frames. ], batch size: 66, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:18:26,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=641018.0, ans=0.0
2024-09-25 02:19:10,045 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.268e+02 1.360e+02 1.441e+02 1.989e+02, threshold=2.720e+02, percent-clipped=0.0
2024-09-25 02:19:10,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=641158.0, ans=0.2
2024-09-25 02:19:22,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2024-09-25 02:19:41,706 INFO [train.py:1198] (0/4) Epoch 36, batch 1050, loss[loss=0.1973, ctc_loss=0.1294, cr_loss=0.3392, over 17018.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1257, cr_loss=0.3431, over 3336528.49 frames. ], batch size: 51, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:19:58,512 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.68 vs. limit=6.0
2024-09-25 02:20:17,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0
2024-09-25 02:20:23,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=641344.6666666666, ans=0.0
2024-09-25 02:21:01,290 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0
2024-09-25 02:21:01,839 INFO [train.py:1198] (0/4) Epoch 36, batch 1100, loss[loss=0.2047, ctc_loss=0.1363, cr_loss=0.3421, over 15975.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1253, cr_loss=0.3432, over 3346072.91 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:21:34,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.17 vs. limit=15.0
2024-09-25 02:21:34,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0
2024-09-25 02:21:37,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=641578.0, ans=0.125
2024-09-25 02:21:48,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=641624.6666666666, ans=0.125
2024-09-25 02:21:52,920 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.290e+02 1.394e+02 1.498e+02 2.022e+02, threshold=2.788e+02, percent-clipped=0.0
2024-09-25 02:22:13,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=641671.3333333334, ans=12.0
2024-09-25 02:22:24,255 INFO [train.py:1198] (0/4) Epoch 36, batch 1150, loss[loss=0.2357, ctc_loss=0.1598, cr_loss=0.3793, over 11910.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1258, cr_loss=0.3441, over 3345157.55 frames. ], batch size: 123, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:22:38,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=641764.6666666666, ans=0.05
2024-09-25 02:23:07,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641811.3333333334, ans=0.1
2024-09-25 02:23:09,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0
2024-09-25 02:23:13,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641858.0, ans=0.1
2024-09-25 02:23:21,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=641858.0, ans=0.025
2024-09-25 02:23:23,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=641858.0, ans=0.025
2024-09-25 02:23:23,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=641858.0, ans=0.125
2024-09-25 02:23:26,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=641858.0, ans=0.0
2024-09-25 02:23:28,845 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.28 vs. limit=15.0
2024-09-25 02:23:49,552 INFO [train.py:1198] (0/4) Epoch 36, batch 1200, loss[loss=0.2106, ctc_loss=0.1414, cr_loss=0.3465, over 17232.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1258, cr_loss=0.3443, over 3350889.72 frames. ], batch size: 55, lr: 3.31e-03, grad_scale: 32.0
2024-09-25 02:24:22,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642044.6666666666, ans=0.1
2024-09-25 02:24:40,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=642091.3333333334, ans=10.0
2024-09-25 02:24:43,286 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.264e+02 1.352e+02 1.439e+02 3.839e+02, threshold=2.704e+02, percent-clipped=1.0
2024-09-25 02:24:50,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=642091.3333333334, ans=0.2
2024-09-25 02:25:12,609 INFO [train.py:1198] (0/4) Epoch 36, batch 1250, loss[loss=0.2204, ctc_loss=0.1422, cr_loss=0.3911, over 17028.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1259, cr_loss=0.3451, over 3360693.63 frames. ], batch size: 53, lr: 3.31e-03, grad_scale: 32.0
2024-09-25 02:25:58,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=642324.6666666666, ans=0.125
2024-09-25 02:26:03,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=642324.6666666666, ans=0.1
2024-09-25 02:26:06,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=642324.6666666666, ans=0.025
2024-09-25 02:26:32,184 INFO [train.py:1198] (0/4) Epoch 36, batch 1300, loss[loss=0.1571, ctc_loss=0.1014, cr_loss=0.2785, over 17026.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1258, cr_loss=0.3441, over 3367281.99 frames. ], batch size: 44, lr: 3.31e-03, grad_scale: 32.0
2024-09-25 02:26:56,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=642464.6666666666, ans=0.0
2024-09-25 02:27:11,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=642511.3333333334, ans=0.125
2024-09-25 02:27:16,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=642511.3333333334, ans=0.0
2024-09-25 02:27:21,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=642558.0, ans=0.125
2024-09-25 02:27:25,952 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.281e+02 1.359e+02 1.510e+02 2.229e+02, threshold=2.717e+02, percent-clipped=0.0
2024-09-25 02:27:29,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=642558.0, ans=0.2
2024-09-25 02:27:34,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=642558.0, ans=0.0
2024-09-25 02:27:57,336 INFO [train.py:1198] (0/4) Epoch 36, batch 1350, loss[loss=0.194, ctc_loss=0.1236, cr_loss=0.3519, over 17261.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1266, cr_loss=0.3452, over 3360991.98 frames. ], batch size: 44, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:28:10,367 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=7.708e-03
2024-09-25 02:28:33,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=642744.6666666666, ans=0.125
2024-09-25 02:28:46,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642791.3333333334, ans=0.1
2024-09-25 02:28:55,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=642791.3333333334, ans=0.0
2024-09-25 02:29:21,997 INFO [train.py:1198] (0/4) Epoch 36, batch 1400, loss[loss=0.1947, ctc_loss=0.1268, cr_loss=0.3395, over 16999.00 frames. ], tot_loss[loss=0.1962, ctc_loss=0.127, cr_loss=0.3461, over 3348175.71 frames. ], batch size: 51, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:29:22,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=642884.6666666666, ans=0.125
2024-09-25 02:29:41,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=642931.3333333334, ans=0.0
2024-09-25 02:30:00,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=642978.0, ans=0.0
2024-09-25 02:30:05,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=642978.0, ans=0.125
2024-09-25 02:30:07,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642978.0, ans=0.1
2024-09-25 02:30:14,813 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.323e+02 1.435e+02 1.580e+02 6.083e+02, threshold=2.870e+02, percent-clipped=1.0
2024-09-25 02:30:16,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=643024.6666666666, ans=0.1
2024-09-25 02:30:31,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=643071.3333333334, ans=10.0
2024-09-25 02:30:42,402 INFO [train.py:1198] (0/4) Epoch 36, batch 1450, loss[loss=0.2074, ctc_loss=0.1363, cr_loss=0.3554, over 15994.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1265, cr_loss=0.3452, over 3349920.52 frames. ], batch size: 74, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:30:49,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=643118.0, ans=0.125
2024-09-25 02:31:02,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=643164.6666666666, ans=0.2
2024-09-25 02:31:08,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=643164.6666666666, ans=0.0
2024-09-25 02:31:16,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=643211.3333333334, ans=0.125
2024-09-25 02:31:49,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=643304.6666666666, ans=0.125
2024-09-25 02:31:59,027 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0
2024-09-25 02:32:02,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0
2024-09-25 02:32:04,903 INFO [train.py:1198] (0/4) Epoch 36, batch 1500, loss[loss=0.2432, ctc_loss=0.1584, cr_loss=0.424, over 17215.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.1268, cr_loss=0.3458, over 3347179.78 frames. ], batch size: 55, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:32:05,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=643351.3333333334, ans=0.125
2024-09-25 02:32:46,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=643444.6666666666, ans=0.1
2024-09-25 02:33:00,745 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.291e+02 1.396e+02 1.518e+02 2.099e+02, threshold=2.792e+02, percent-clipped=0.0
2024-09-25 02:33:02,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=643491.3333333334, ans=0.0
2024-09-25 02:33:03,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.58 vs. limit=15.0
2024-09-25 02:33:13,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=643538.0, ans=0.125
2024-09-25 02:33:25,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643538.0, ans=0.1
2024-09-25 02:33:30,539 INFO [train.py:1198] (0/4) Epoch 36, batch 1550, loss[loss=0.2178, ctc_loss=0.1401, cr_loss=0.3886, over 16824.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1262, cr_loss=0.3448, over 3354962.97 frames. ], batch size: 61, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:33:52,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0
2024-09-25 02:34:08,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=643678.0, ans=10.0
2024-09-25 02:34:53,200 INFO [train.py:1198] (0/4) Epoch 36, batch 1600, loss[loss=0.2168, ctc_loss=0.144, cr_loss=0.3638, over 14787.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1265, cr_loss=0.345, over 3348902.24 frames. ], batch size: 88, lr: 3.31e-03, grad_scale: 32.0
2024-09-25 02:34:56,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=643818.0, ans=0.04949747468305833
2024-09-25 02:35:04,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.07 vs. limit=6.0
2024-09-25 02:35:33,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=643911.3333333334, ans=0.0
2024-09-25 02:35:37,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=643911.3333333334, ans=0.04949747468305833
2024-09-25 02:35:46,529 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.268e+02 1.352e+02 1.455e+02 2.374e+02, threshold=2.704e+02, percent-clipped=0.0
2024-09-25 02:35:46,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=643958.0, ans=0.125
2024-09-25 02:36:03,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=644004.6666666666, ans=0.125
2024-09-25 02:36:14,451 INFO [train.py:1198] (0/4) Epoch 36, batch 1650, loss[loss=0.2094, ctc_loss=0.1344, cr_loss=0.3751, over 17086.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1263, cr_loss=0.3449, over 3353848.70 frames. ], batch size: 49, lr: 3.31e-03, grad_scale: 32.0
2024-09-25 02:36:22,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=644051.3333333334, ans=0.0
2024-09-25 02:36:24,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=644051.3333333334, ans=0.125
2024-09-25 02:36:29,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=644098.0, ans=0.125
2024-09-25 02:36:32,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=644098.0, ans=0.125
2024-09-25 02:36:38,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644098.0, ans=0.1
2024-09-25 02:36:38,859 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0
2024-09-25 02:36:58,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=644144.6666666666, ans=0.125
2024-09-25 02:37:32,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644238.0, ans=0.1
2024-09-25 02:37:37,127 INFO [train.py:1198] (0/4) Epoch 36, batch 1700, loss[loss=0.2109, ctc_loss=0.1453, cr_loss=0.328, over 12007.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3447, over 3357529.31 frames. ], batch size: 123, lr: 3.31e-03, grad_scale: 16.0
2024-09-25 02:38:02,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=644331.3333333334, ans=0.04949747468305833
2024-09-25 02:38:04,481 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs.
limit=15.0 2024-09-25 02:38:18,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=644378.0, ans=0.2 2024-09-25 02:38:37,024 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.256e+02 1.337e+02 1.426e+02 1.735e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-25 02:39:01,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=644518.0, ans=0.2 2024-09-25 02:39:02,769 INFO [train.py:1198] (0/4) Epoch 36, batch 1750, loss[loss=0.2131, ctc_loss=0.138, cr_loss=0.3754, over 17235.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1263, cr_loss=0.3444, over 3366921.69 frames. ], batch size: 47, lr: 3.31e-03, grad_scale: 16.0 2024-09-25 02:39:02,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=644518.0, ans=0.0 2024-09-25 02:39:26,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=644564.6666666666, ans=0.125 2024-09-25 02:39:46,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=6.0 2024-09-25 02:39:50,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=644611.3333333334, ans=0.125 2024-09-25 02:39:56,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=644658.0, ans=0.0 2024-09-25 02:40:24,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=644751.3333333334, ans=0.125 2024-09-25 02:40:25,301 INFO [train.py:1198] (0/4) Epoch 36, batch 1800, loss[loss=0.1751, ctc_loss=0.1138, cr_loss=0.3065, over 17171.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1267, cr_loss=0.3449, over 3363120.94 frames. ], batch size: 41, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:40:51,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0 2024-09-25 02:41:03,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.46 vs. limit=15.0 2024-09-25 02:41:14,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=12.0 2024-09-25 02:41:19,904 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.269e+02 1.345e+02 1.435e+02 2.144e+02, threshold=2.691e+02, percent-clipped=0.0 2024-09-25 02:41:26,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=644891.3333333334, ans=0.125 2024-09-25 02:41:45,751 INFO [train.py:1198] (0/4) Epoch 36, batch 1850, loss[loss=0.2001, ctc_loss=0.1304, cr_loss=0.3487, over 17233.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3444, over 3346140.59 frames. ], batch size: 50, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:41:57,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. 
limit=15.0 2024-09-25 02:42:51,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=645124.6666666666, ans=0.125 2024-09-25 02:42:53,309 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2024-09-25 02:43:04,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=12.0 2024-09-25 02:43:10,555 INFO [train.py:1198] (0/4) Epoch 36, batch 1900, loss[loss=0.216, ctc_loss=0.1432, cr_loss=0.3636, over 15138.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1265, cr_loss=0.3441, over 3345724.56 frames. ], batch size: 89, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:43:15,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.58 vs. limit=22.5 2024-09-25 02:43:40,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=645264.6666666666, ans=0.125 2024-09-25 02:43:43,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2024-09-25 02:44:10,239 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.279e+02 1.358e+02 1.455e+02 1.986e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 02:44:22,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=645404.6666666666, ans=0.0 2024-09-25 02:44:31,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=645404.6666666666, ans=0.125 2024-09-25 02:44:36,299 INFO [train.py:1198] (0/4) Epoch 36, batch 1950, loss[loss=0.1909, ctc_loss=0.121, cr_loss=0.3493, over 17307.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1256, cr_loss=0.3433, over 3354281.48 frames. ], batch size: 49, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:44:54,264 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 02:44:57,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=645498.0, ans=0.04949747468305833 2024-09-25 02:44:59,456 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=12.0 2024-09-25 02:45:05,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=645498.0, ans=0.0 2024-09-25 02:45:11,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=645544.6666666666, ans=0.0 2024-09-25 02:45:50,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.68 vs. limit=10.0 2024-09-25 02:45:56,196 INFO [train.py:1198] (0/4) Epoch 36, batch 2000, loss[loss=0.1925, ctc_loss=0.1259, cr_loss=0.3329, over 17148.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1253, cr_loss=0.3427, over 3355134.26 frames. 
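The scaling.py:1024 Whitening lines come from a regularizer that nudges each module's output features toward an isotropic (white) covariance; metric measures how far the per-group feature covariance is from a multiple of the identity (1.0 means perfectly white), and a correction is applied only while it exceeds limit, hence the recurring "metric=X vs. limit=Y" pattern. One way to compute such a metric is sketched below; it has the right fixed point (exactly 1.0 for white features) but is a reconstruction, not necessarily icefall's exact formula.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Returns ~1.0 for white features,
    larger values for anisotropic covariance (a sketch)."""
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x.transpose(0, 1)                       # (groups, frames, chans)
    cov = x.transpose(1, 2) @ x / num_frames    # per-group covariance
    d = cov.shape[-1]
    # mean squared covariance entry, relative to a (mean diagonal) * I:
    num = (cov ** 2).mean(dim=(1, 2)) * d
    den = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1) ** 2
    return (num / den).mean().item()

x = torch.randn(10000, 384)                     # near-white input
print(whitening_metric(x))                      # ~1.0
print(whitening_metric(x @ torch.randn(384, 384)))  # much larger
```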
], batch size: 48, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:45:58,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=645684.6666666666, ans=0.0 2024-09-25 02:46:07,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=645684.6666666666, ans=0.125 2024-09-25 02:46:12,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=645731.3333333334, ans=0.1 2024-09-25 02:46:26,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=645778.0, ans=0.2 2024-09-25 02:46:50,156 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.284e+02 1.382e+02 1.479e+02 3.122e+02, threshold=2.765e+02, percent-clipped=1.0 2024-09-25 02:47:01,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=645871.3333333334, ans=0.125 2024-09-25 02:47:03,485 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2024-09-25 02:47:17,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2024-09-25 02:47:18,702 INFO [train.py:1198] (0/4) Epoch 36, batch 2050, loss[loss=0.2027, ctc_loss=0.1325, cr_loss=0.3513, over 17021.00 frames. ], tot_loss[loss=0.1937, ctc_loss=0.1252, cr_loss=0.3423, over 3356631.13 frames. ], batch size: 56, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:47:42,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=645964.6666666666, ans=0.125 2024-09-25 02:47:55,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=646011.3333333334, ans=0.0 2024-09-25 02:48:34,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=646104.6666666666, ans=0.2 2024-09-25 02:48:39,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=646104.6666666666, ans=0.125 2024-09-25 02:48:40,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=646104.6666666666, ans=0.125 2024-09-25 02:48:43,712 INFO [train.py:1198] (0/4) Epoch 36, batch 2100, loss[loss=0.2088, ctc_loss=0.1372, cr_loss=0.3581, over 16772.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1254, cr_loss=0.3422, over 3356834.48 frames. ], batch size: 61, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:49:20,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=646244.6666666666, ans=0.0 2024-09-25 02:49:29,858 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.60 vs. 
limit=15.0 2024-09-25 02:49:34,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=646291.3333333334, ans=0.125 2024-09-25 02:49:36,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=646291.3333333334, ans=0.125 2024-09-25 02:49:40,606 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.268e+02 1.352e+02 1.474e+02 3.330e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-25 02:49:53,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=646338.0, ans=0.2 2024-09-25 02:50:00,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=646338.0, ans=0.125 2024-09-25 02:50:06,070 INFO [train.py:1198] (0/4) Epoch 36, batch 2150, loss[loss=0.2082, ctc_loss=0.1356, cr_loss=0.3628, over 17293.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3443, over 3346569.30 frames. ], batch size: 49, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:50:19,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=646384.6666666666, ans=0.1 2024-09-25 02:50:19,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=646384.6666666666, ans=0.07 2024-09-25 02:50:22,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=646431.3333333334, ans=0.0 2024-09-25 02:50:22,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=6.0 2024-09-25 02:50:55,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=646524.6666666666, ans=0.125 2024-09-25 02:51:19,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=646571.3333333334, ans=0.125 2024-09-25 02:51:25,843 INFO [train.py:1198] (0/4) Epoch 36, batch 2200, loss[loss=0.1819, ctc_loss=0.1172, cr_loss=0.3238, over 17267.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.127, cr_loss=0.3452, over 3347561.31 frames. ], batch size: 44, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:51:34,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=646618.0, ans=0.125 2024-09-25 02:51:47,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=646664.6666666666, ans=0.125 2024-09-25 02:51:57,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=646664.6666666666, ans=0.0 2024-09-25 02:52:08,100 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.43 vs. 
limit=15.0 2024-09-25 02:52:15,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=646758.0, ans=0.2 2024-09-25 02:52:22,864 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.282e+02 1.343e+02 1.455e+02 1.826e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-25 02:52:48,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=646804.6666666666, ans=0.125 2024-09-25 02:52:51,064 INFO [train.py:1198] (0/4) Epoch 36, batch 2250, loss[loss=0.2019, ctc_loss=0.1283, cr_loss=0.3682, over 17220.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1266, cr_loss=0.3451, over 3342459.73 frames. ], batch size: 47, lr: 3.30e-03, grad_scale: 32.0 2024-09-25 02:52:51,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=646851.3333333334, ans=0.2 2024-09-25 02:53:01,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=646851.3333333334, ans=0.125 2024-09-25 02:53:04,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=646851.3333333334, ans=0.125 2024-09-25 02:53:11,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2024-09-25 02:53:24,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=646944.6666666666, ans=0.125 2024-09-25 02:54:14,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=647084.6666666666, ans=0.125 2024-09-25 02:54:16,047 INFO [train.py:1198] (0/4) Epoch 36, batch 2300, loss[loss=0.1948, ctc_loss=0.1257, cr_loss=0.3455, over 17311.00 frames. ], tot_loss[loss=0.1952, ctc_loss=0.1263, cr_loss=0.3443, over 3346632.31 frames. ], batch size: 51, lr: 3.30e-03, grad_scale: 8.0 2024-09-25 02:54:54,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=647178.0, ans=0.125 2024-09-25 02:54:56,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=647178.0, ans=0.125 2024-09-25 02:55:14,011 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.282e+02 1.356e+02 1.478e+02 2.427e+02, threshold=2.712e+02, percent-clipped=0.0 2024-09-25 02:55:19,699 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=12.0 2024-09-25 02:55:25,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=647271.3333333334, ans=0.025 2024-09-25 02:55:36,334 INFO [train.py:1198] (0/4) Epoch 36, batch 2350, loss[loss=0.178, ctc_loss=0.1151, cr_loss=0.3146, over 16929.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1265, cr_loss=0.3448, over 3354987.57 frames. 
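The loss breakdown in the batch records is internally consistent with consistency-regularized CTC: in each of them, loss = ctc_loss + 0.2 * cr_loss to within rounding (e.g. 0.1151 + 0.2 * 0.3146 = 0.178 for the batch-2350 record above), so the consistency term enters with weight 0.2. A sketch of that combination, with cr_loss written here as a symmetric KL divergence between the frame posteriors of two differently masked views of the same utterance; the masking and reduction details are assumptions.

```python
import torch
import torch.nn.functional as F

def cr_ctc_loss(log_probs_a, log_probs_b, ctc_loss_a, ctc_loss_b,
                cr_loss_scale: float = 0.2):
    """Sketch of a CR-CTC style objective.
    log_probs_a/b: (T, N, V) log-posteriors from two augmented views;
    ctc_loss_a/b: scalar CTC losses of each view."""
    ctc_loss = 0.5 * (ctc_loss_a + ctc_loss_b)
    # Symmetric KL between the views' frame distributions, each direction
    # treating the other (detached) view as the target:
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     log_target=True, reduction="batchmean")
    cr_loss = 0.5 * (kl_ab + kl_ba)
    return ctc_loss + cr_loss_scale * cr_loss, ctc_loss, cr_loss
```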
], batch size: 42, lr: 3.30e-03, grad_scale: 8.0 2024-09-25 02:55:38,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=647318.0, ans=0.0 2024-09-25 02:56:00,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=647364.6666666666, ans=10.0 2024-09-25 02:56:07,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=647411.3333333334, ans=0.125 2024-09-25 02:56:23,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=647458.0, ans=0.125 2024-09-25 02:56:51,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=647504.6666666666, ans=0.0 2024-09-25 02:56:58,845 INFO [train.py:1198] (0/4) Epoch 36, batch 2400, loss[loss=0.1928, ctc_loss=0.1255, cr_loss=0.3364, over 17049.00 frames. ], tot_loss[loss=0.1955, ctc_loss=0.1265, cr_loss=0.3453, over 3359059.56 frames. ], batch size: 39, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:57:03,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=647551.3333333334, ans=0.125 2024-09-25 02:57:04,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=647551.3333333334, ans=0.05 2024-09-25 02:57:47,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=647691.3333333334, ans=0.2 2024-09-25 02:57:58,956 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.280e+02 1.384e+02 1.461e+02 3.295e+02, threshold=2.768e+02, percent-clipped=1.0 2024-09-25 02:58:21,139 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=647738.0, ans=0.0 2024-09-25 02:58:24,117 INFO [train.py:1198] (0/4) Epoch 36, batch 2450, loss[loss=0.1993, ctc_loss=0.1272, cr_loss=0.3604, over 16831.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1261, cr_loss=0.3448, over 3358869.90 frames. ], batch size: 61, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 02:58:37,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=647784.6666666666, ans=0.0 2024-09-25 02:58:56,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=647831.3333333334, ans=0.0 2024-09-25 02:59:27,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=647924.6666666666, ans=0.125 2024-09-25 02:59:46,720 INFO [train.py:1198] (0/4) Epoch 36, batch 2500, loss[loss=0.2507, ctc_loss=0.1668, cr_loss=0.4195, over 15107.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1262, cr_loss=0.3439, over 3351971.35 frames. ], batch size: 89, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 03:00:09,920 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.25 vs. 
limit=12.0 2024-09-25 03:00:44,516 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.286e+02 1.353e+02 1.494e+02 3.015e+02, threshold=2.705e+02, percent-clipped=1.0 2024-09-25 03:00:47,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=648158.0, ans=0.125 2024-09-25 03:00:49,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=648204.6666666666, ans=0.02 2024-09-25 03:00:52,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=648204.6666666666, ans=0.025 2024-09-25 03:00:59,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=648204.6666666666, ans=0.125 2024-09-25 03:01:06,742 INFO [train.py:1198] (0/4) Epoch 36, batch 2550, loss[loss=0.1865, ctc_loss=0.1174, cr_loss=0.3454, over 17023.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.126, cr_loss=0.3438, over 3356295.13 frames. ], batch size: 44, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 03:02:02,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=22.5 2024-09-25 03:02:31,701 INFO [train.py:1198] (0/4) Epoch 36, batch 2600, loss[loss=0.2153, ctc_loss=0.1427, cr_loss=0.3629, over 16594.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.126, cr_loss=0.3438, over 3352096.39 frames. ], batch size: 66, lr: 3.30e-03, grad_scale: 16.0 2024-09-25 03:02:52,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=648531.3333333334, ans=0.07 2024-09-25 03:02:57,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=648531.3333333334, ans=0.125 2024-09-25 03:03:05,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2024-09-25 03:03:18,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648578.0, ans=0.1 2024-09-25 03:03:31,214 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.288e+02 1.366e+02 1.471e+02 2.094e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-25 03:03:46,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=648671.3333333334, ans=0.125 2024-09-25 03:03:56,675 INFO [train.py:1198] (0/4) Epoch 36, batch 2650, loss[loss=0.1874, ctc_loss=0.1211, cr_loss=0.3316, over 17138.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1259, cr_loss=0.3436, over 3357054.67 frames. ], batch size: 48, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:04:30,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=648811.3333333334, ans=0.025 2024-09-25 03:04:30,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=648811.3333333334, ans=0.0 2024-09-25 03:05:04,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.83 vs. 
limit=12.0 2024-09-25 03:05:10,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=22.5 2024-09-25 03:05:12,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=648904.6666666666, ans=0.0 2024-09-25 03:05:16,418 INFO [train.py:1198] (0/4) Epoch 36, batch 2700, loss[loss=0.1996, ctc_loss=0.1286, cr_loss=0.3549, over 17072.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1252, cr_loss=0.3429, over 3364016.27 frames. ], batch size: 46, lr: 3.29e-03, grad_scale: 8.0 2024-09-25 03:05:17,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=15.0 2024-09-25 03:05:18,321 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:05:24,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=648951.3333333334, ans=0.0 2024-09-25 03:05:29,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=648951.3333333334, ans=0.2 2024-09-25 03:05:29,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=648951.3333333334, ans=0.125 2024-09-25 03:05:37,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=648998.0, ans=0.0 2024-09-25 03:05:40,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=648998.0, ans=0.125 2024-09-25 03:05:43,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=648998.0, ans=0.125 2024-09-25 03:06:15,209 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.266e+02 1.332e+02 1.436e+02 2.982e+02, threshold=2.664e+02, percent-clipped=1.0 2024-09-25 03:06:24,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=649138.0, ans=0.2 2024-09-25 03:06:26,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=649138.0, ans=0.0 2024-09-25 03:06:28,526 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=12.0 2024-09-25 03:06:33,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649138.0, ans=0.1 2024-09-25 03:06:35,963 INFO [train.py:1198] (0/4) Epoch 36, batch 2750, loss[loss=0.2054, ctc_loss=0.1333, cr_loss=0.3606, over 17212.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1246, cr_loss=0.341, over 3362487.25 frames. 
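grad_scale in the batch records is the mixed-precision loss scale, and it moves in powers of two through this stretch: 32.0 around batch 2250, down to 8.0 by batch 2300 (two halvings after fp16 gradient overflows), then back to 16.0 by batch 2400 and 32.0 later. The standard PyTorch pattern below reproduces that shape; the growth and backoff settings shown are library defaults, not values read from this log, and the recipe may manage the scale more directly.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0 ** 5,   # 32.0, one of the values seen in these records
    growth_factor=2.0,     # doubles after a run of clean steps
    backoff_factor=0.5,    # halves whenever grads contain inf/nan
    growth_interval=2000,  # PyTorch default, assumed
)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # internally skipped if gradients overflowed
    scaler.update()          # where grad_scale moves, e.g. 32 -> 16 -> 8
    return loss.detach(), scaler.get_scale()
```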
], batch size: 55, lr: 3.29e-03, grad_scale: 8.0 2024-09-25 03:06:51,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=649184.6666666666, ans=0.125 2024-09-25 03:07:15,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=649278.0, ans=0.125 2024-09-25 03:07:50,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=649371.3333333334, ans=0.025 2024-09-25 03:08:01,228 INFO [train.py:1198] (0/4) Epoch 36, batch 2800, loss[loss=0.2164, ctc_loss=0.1399, cr_loss=0.3825, over 16744.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1249, cr_loss=0.3419, over 3354880.64 frames. ], batch size: 61, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:08:01,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=649418.0, ans=10.0 2024-09-25 03:08:13,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=649418.0, ans=0.0 2024-09-25 03:08:15,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=649418.0, ans=0.125 2024-09-25 03:08:31,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=649464.6666666666, ans=0.07 2024-09-25 03:09:05,753 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.257e+02 1.332e+02 1.427e+02 2.151e+02, threshold=2.663e+02, percent-clipped=0.0 2024-09-25 03:09:15,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=649604.6666666666, ans=0.125 2024-09-25 03:09:27,093 INFO [train.py:1198] (0/4) Epoch 36, batch 2850, loss[loss=0.2169, ctc_loss=0.1419, cr_loss=0.3753, over 17041.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1239, cr_loss=0.3403, over 3366381.71 frames. ], batch size: 51, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:09:39,033 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=12.0 2024-09-25 03:09:45,090 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:09:46,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=649698.0, ans=0.125 2024-09-25 03:09:59,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=649744.6666666666, ans=0.2 2024-09-25 03:10:05,778 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:10:46,602 INFO [train.py:1198] (0/4) Epoch 36, batch 2900, loss[loss=0.206, ctc_loss=0.1372, cr_loss=0.344, over 12317.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1248, cr_loss=0.3419, over 3357679.59 frames. 
], batch size: 124, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:11:01,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=649931.3333333334, ans=0.125 2024-09-25 03:11:14,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=15.0 2024-09-25 03:11:25,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=22.5 2024-09-25 03:11:45,704 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.278e+02 1.363e+02 1.442e+02 1.816e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 03:12:07,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=650118.0, ans=0.0 2024-09-25 03:12:08,899 INFO [train.py:1198] (0/4) Epoch 36, batch 2950, loss[loss=0.1941, ctc_loss=0.1225, cr_loss=0.3579, over 17305.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1256, cr_loss=0.3437, over 3362897.00 frames. ], batch size: 46, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:12:30,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650164.6666666666, ans=0.1 2024-09-25 03:12:45,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=650211.3333333334, ans=0.125 2024-09-25 03:12:56,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=650211.3333333334, ans=0.1 2024-09-25 03:13:20,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=15.0 2024-09-25 03:13:32,850 INFO [train.py:1198] (0/4) Epoch 36, batch 3000, loss[loss=0.2205, ctc_loss=0.1452, cr_loss=0.3767, over 16479.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1261, cr_loss=0.3448, over 3361890.58 frames. ], batch size: 66, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:13:32,851 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 03:13:48,653 INFO [train.py:1230] (0/4) Epoch 36, validation: loss=0.03616, ctc_loss=0.03616, cr_loss=9.264e-15, over 944034.00 frames. 
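Note the validation record just above: cr_loss=9.264e-15 is numerically zero while training batches sit around 0.34. With augmentation off at validation time the two views of each utterance are identical, so any divergence-based consistency term vanishes up to floating-point error; a one-line check under the KL formulation sketched earlier:

```python
import torch
import torch.nn.functional as F

lp = torch.log_softmax(torch.randn(100, 8, 500), dim=-1)   # (T, N, V)
# identical views => symmetric KL is exactly 0, matching cr_loss ~ 9e-15:
print(F.kl_div(lp, lp, log_target=True, reduction="batchmean").item())
```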
2024-09-25 03:13:48,653 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 03:13:50,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=650351.3333333334, ans=0.2 2024-09-25 03:14:12,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=650398.0, ans=0.125 2024-09-25 03:14:28,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=650444.6666666666, ans=0.125 2024-09-25 03:14:34,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650491.3333333334, ans=0.1 2024-09-25 03:14:42,847 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:14:46,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=650491.3333333334, ans=0.2 2024-09-25 03:14:46,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.90 vs. limit=22.5 2024-09-25 03:14:48,919 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.273e+02 1.377e+02 1.498e+02 1.977e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 03:15:09,706 INFO [train.py:1198] (0/4) Epoch 36, batch 3050, loss[loss=0.2095, ctc_loss=0.1367, cr_loss=0.3639, over 17056.00 frames. ], tot_loss[loss=0.1958, ctc_loss=0.1266, cr_loss=0.3458, over 3360183.32 frames. ], batch size: 46, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:15:25,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=650631.3333333334, ans=0.125 2024-09-25 03:15:33,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=650631.3333333334, ans=0.125 2024-09-25 03:15:39,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=650678.0, ans=0.95 2024-09-25 03:15:41,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=650678.0, ans=0.125 2024-09-25 03:15:46,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=650678.0, ans=10.0 2024-09-25 03:15:47,961 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.79 vs. 
limit=12.0 2024-09-25 03:15:49,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=650678.0, ans=0.125 2024-09-25 03:15:50,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=650678.0, ans=0.0 2024-09-25 03:15:57,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=650724.6666666666, ans=0.025 2024-09-25 03:16:08,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=650724.6666666666, ans=0.125 2024-09-25 03:16:12,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=650771.3333333334, ans=0.0 2024-09-25 03:16:19,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=650771.3333333334, ans=0.125 2024-09-25 03:16:24,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650771.3333333334, ans=0.1 2024-09-25 03:16:25,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=650771.3333333334, ans=0.0 2024-09-25 03:16:28,402 INFO [train.py:1198] (0/4) Epoch 36, batch 3100, loss[loss=0.1832, ctc_loss=0.1162, cr_loss=0.335, over 17367.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1266, cr_loss=0.3458, over 3358091.52 frames. ], batch size: 48, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:16:31,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=650818.0, ans=0.95 2024-09-25 03:16:58,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=12.0 2024-09-25 03:17:02,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=650911.3333333334, ans=0.0 2024-09-25 03:17:26,168 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.257e+02 1.357e+02 1.451e+02 2.020e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-25 03:17:36,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2024-09-25 03:17:45,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=651051.3333333334, ans=0.125 2024-09-25 03:17:46,710 INFO [train.py:1198] (0/4) Epoch 36, batch 3150, loss[loss=0.2133, ctc_loss=0.1379, cr_loss=0.377, over 17257.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1265, cr_loss=0.3452, over 3349195.60 frames. 
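Several ScheduledFloat names in this stretch belong to bypass modules (bypass.scale_min and bypass_mid.scale_min at ans=0.2, bypass.skip_rate at ans~0.0495): each encoder layer's output is a learned per-channel interpolation between the layer input and its transformed output, with the mixing weight floored at scale_min and the layer's contribution occasionally skipped during training. A schematic version, with the clamping simplified relative to the real module:

```python
import torch
import torch.nn as nn

class Bypass(nn.Module):
    """Sketch: y = x + c * (f(x) - x), with c clamped to [scale_min, 1]."""
    def __init__(self, num_channels: int, scale_min: float = 0.2,
                 skip_rate: float = 0.0):
        super().__init__()
        self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min = scale_min   # scheduled by batch_count in the log
        self.skip_rate = skip_rate   # likewise scheduled

    def forward(self, x: torch.Tensor, fx: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x                 # skip the layer's contribution entirely
        c = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + c * (fx - x)
```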
], batch size: 55, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:18:03,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651098.0, ans=0.1 2024-09-25 03:18:10,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651098.0, ans=0.1 2024-09-25 03:18:15,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=651098.0, ans=0.125 2024-09-25 03:18:20,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.73 vs. limit=10.0 2024-09-25 03:18:31,305 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.05 vs. limit=15.0 2024-09-25 03:18:57,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=651238.0, ans=0.125 2024-09-25 03:19:05,119 INFO [train.py:1198] (0/4) Epoch 36, batch 3200, loss[loss=0.1673, ctc_loss=0.1073, cr_loss=0.3002, over 16273.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.1262, cr_loss=0.3443, over 3351299.12 frames. ], batch size: 36, lr: 3.29e-03, grad_scale: 32.0 2024-09-25 03:19:33,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=651331.3333333334, ans=0.125 2024-09-25 03:19:46,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=651378.0, ans=0.125 2024-09-25 03:19:53,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=651424.6666666666, ans=0.0 2024-09-25 03:19:54,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=651424.6666666666, ans=0.1 2024-09-25 03:19:56,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651424.6666666666, ans=0.1 2024-09-25 03:20:02,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2024-09-25 03:20:02,960 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.272e+02 1.367e+02 1.451e+02 1.708e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-25 03:20:09,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=651471.3333333334, ans=0.0 2024-09-25 03:20:23,376 INFO [train.py:1198] (0/4) Epoch 36, batch 3250, loss[loss=0.1781, ctc_loss=0.1152, cr_loss=0.3146, over 17106.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1265, cr_loss=0.3445, over 3349948.12 frames. 
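The many balancer records (balancer1.prob=0.125, min_positive=0.025, max_positive=0.95, max_abs=10.0 and similar) belong to activation balancers: on a random prob fraction of batches they push each channel's fraction of positive activations and its RMS magnitude back inside the configured bounds. The sketch below expresses that intent as an explicit penalty; the real module instead modifies gradients in the backward pass, so treat this as an approximation of the behaviour, not the mechanism.

```python
import torch

def balancer_penalty(x: torch.Tensor, min_positive: float = 0.025,
                     max_positive: float = 0.95, max_abs: float = 10.0,
                     prob: float = 0.125) -> torch.Tensor:
    """x: (num_frames, num_channels). Zero unless sampled with prob."""
    if torch.rand(()) >= prob:
        return x.new_zeros(())
    # soft per-channel fraction of positive activations:
    frac_pos = torch.sigmoid(x / (x.abs().mean() + 1e-8)).mean(dim=0)
    pen = ((min_positive - frac_pos).clamp(min=0.0) ** 2).sum()
    pen = pen + ((frac_pos - max_positive).clamp(min=0.0) ** 2).sum()
    # penalize per-channel RMS beyond max_abs:
    rms = (x ** 2).mean(dim=0).sqrt()
    return pen + ((rms - max_abs).clamp(min=0.0) ** 2).sum()
```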
], batch size: 40, lr: 3.29e-03, grad_scale: 32.0 2024-09-25 03:20:31,415 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:20:35,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=651518.0, ans=0.0 2024-09-25 03:20:38,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.34 vs. limit=6.0 2024-09-25 03:20:40,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651564.6666666666, ans=0.1 2024-09-25 03:20:49,441 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:21:02,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=651611.3333333334, ans=0.2 2024-09-25 03:21:09,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=651611.3333333334, ans=0.2 2024-09-25 03:21:16,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=12.0 2024-09-25 03:21:27,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2024-09-25 03:21:33,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651704.6666666666, ans=0.1 2024-09-25 03:21:33,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=651704.6666666666, ans=0.0 2024-09-25 03:21:43,852 INFO [train.py:1198] (0/4) Epoch 36, batch 3300, loss[loss=0.1998, ctc_loss=0.131, cr_loss=0.3438, over 17298.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.126, cr_loss=0.3437, over 3360511.01 frames. ], batch size: 46, lr: 3.29e-03, grad_scale: 32.0 2024-09-25 03:21:50,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=651751.3333333334, ans=0.0 2024-09-25 03:22:06,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=651798.0, ans=0.95 2024-09-25 03:22:14,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=651798.0, ans=0.125 2024-09-25 03:22:14,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0 2024-09-25 03:22:15,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=651844.6666666666, ans=0.125 2024-09-25 03:22:44,635 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.051e+02 1.278e+02 1.344e+02 1.481e+02 2.113e+02, threshold=2.688e+02, percent-clipped=0.0 2024-09-25 03:23:03,252 INFO [train.py:1198] (0/4) Epoch 36, batch 3350, loss[loss=0.2197, ctc_loss=0.1497, cr_loss=0.3502, over 10995.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1261, cr_loss=0.3439, over 3361497.12 frames. 
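The scaling.py:1120 WithLoss lines attach a small auxiliary loss directly to an intermediate tensor (here attention weights) and log its accumulated value; loss-sum=0.000e+00 means the constraint is currently inactive. One way to wire a tensor-level penalty into the main backward pass is a custom autograd function, sketched below; the actual mechanism in scaling.py may differ, and the usage line is hypothetical.

```python
import torch

class AttachLoss(torch.autograd.Function):
    """Sketch: identity in forward; backward adds d(aux_loss)/dx so an
    auxiliary penalty on x rides along with the main loss gradient."""
    @staticmethod
    def forward(ctx, x, aux_loss_fn):
        ctx.aux_loss_fn = aux_loss_fn
        ctx.save_for_backward(x)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            x = x.detach().requires_grad_(True)
            aux = ctx.aux_loss_fn(x)          # e.g. penalize peaky weights
            (aux_grad,) = torch.autograd.grad(aux, x)
        return grad_out + aux_grad, None

# hypothetical usage on attention weights `attn`:
# attn = AttachLoss.apply(attn, lambda a: 1e-4 * a.pow(2).sum())
```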
], batch size: 123, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:23:16,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=651984.6666666666, ans=0.0 2024-09-25 03:23:41,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=652078.0, ans=0.0 2024-09-25 03:23:55,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=652124.6666666666, ans=0.0 2024-09-25 03:24:03,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=652124.6666666666, ans=0.125 2024-09-25 03:24:06,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=652171.3333333334, ans=0.125 2024-09-25 03:24:20,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=652171.3333333334, ans=0.0 2024-09-25 03:24:23,605 INFO [train.py:1198] (0/4) Epoch 36, batch 3400, loss[loss=0.2191, ctc_loss=0.1431, cr_loss=0.38, over 17219.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1261, cr_loss=0.3433, over 3358754.76 frames. ], batch size: 55, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:24:33,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=652218.0, ans=0.125 2024-09-25 03:24:40,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=652264.6666666666, ans=0.0 2024-09-25 03:24:46,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-25 03:25:12,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=652358.0, ans=0.09899494936611666 2024-09-25 03:25:22,900 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.287e+02 1.358e+02 1.469e+02 2.050e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 03:25:34,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=652404.6666666666, ans=0.2 2024-09-25 03:25:44,232 INFO [train.py:1198] (0/4) Epoch 36, batch 3450, loss[loss=0.1889, ctc_loss=0.1214, cr_loss=0.3374, over 17295.00 frames. ], tot_loss[loss=0.1954, ctc_loss=0.1266, cr_loss=0.3445, over 3355653.86 frames. 
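The learning rate drifts from 3.31e-03 at the top of this stretch to 3.28e-03 below, then steps to 3.23e-03 when Epoch 37 begins, consistent with a scheduler that decays smoothly in both global batch count and epoch, as icefall's Eden scheduler does. The usual Eden form, with hyperparameters shown as assumptions rather than values read from this section:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay, smooth in batch count and epoch (sketch)."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With these assumed settings the value late in epoch 36 is ~3.2e-03,
# the same order as the lr fields logged here:
print(eden_lr(0.045, 140000, 36.0))
```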
], batch size: 46, lr: 3.29e-03, grad_scale: 16.0 2024-09-25 03:26:12,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=652498.0, ans=0.125 2024-09-25 03:26:21,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=652544.6666666666, ans=0.125 2024-09-25 03:26:28,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=652544.6666666666, ans=0.125 2024-09-25 03:26:34,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=652591.3333333334, ans=0.09899494936611666 2024-09-25 03:26:43,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=652591.3333333334, ans=0.2 2024-09-25 03:26:48,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=652638.0, ans=10.0 2024-09-25 03:26:54,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=652638.0, ans=0.125 2024-09-25 03:26:59,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=652638.0, ans=0.025 2024-09-25 03:27:02,027 INFO [train.py:1198] (0/4) Epoch 36, batch 3500, loss[loss=0.2279, ctc_loss=0.1452, cr_loss=0.4134, over 17014.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1265, cr_loss=0.3436, over 3353354.56 frames. ], batch size: 56, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:27:05,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=652684.6666666666, ans=0.0 2024-09-25 03:27:27,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=652731.3333333334, ans=0.0 2024-09-25 03:27:57,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=12.0 2024-09-25 03:27:58,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=652824.6666666666, ans=0.0 2024-09-25 03:28:02,940 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.297e+02 1.416e+02 1.575e+02 3.387e+02, threshold=2.833e+02, percent-clipped=1.0 2024-09-25 03:28:18,459 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=22.5 2024-09-25 03:28:20,578 INFO [train.py:1198] (0/4) Epoch 36, batch 3550, loss[loss=0.1772, ctc_loss=0.1122, cr_loss=0.3253, over 17041.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1259, cr_loss=0.3426, over 3356371.76 frames. 
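Two kinds of checkpoint follow this point: a batch-indexed checkpoint-140000.pt (a periodic save; the round number suggests a fixed every-N-batches cadence) and epoch-36.pt at the epoch boundary, so both fine-grained recovery and per-epoch model averaging are possible. A minimal sketch of that dual policy; save_every_n and the saved fields are assumptions, only the filename patterns come from the log.

```python
from pathlib import Path
import torch

def maybe_save(model, optimizer, exp_dir: Path, batch_idx_train: int,
               epoch: int, end_of_epoch: bool, save_every_n: int = 4000):
    """Periodic batch-indexed saves plus one checkpoint per epoch."""
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "batch_idx_train": batch_idx_train,
        "epoch": epoch,
    }
    if batch_idx_train % save_every_n == 0:
        torch.save(state, exp_dir / f"checkpoint-{batch_idx_train}.pt")
    if end_of_epoch:
        torch.save(state, exp_dir / f"epoch-{epoch}.pt")
```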
], batch size: 39, lr: 3.28e-03, grad_scale: 8.0 2024-09-25 03:28:39,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=652964.6666666666, ans=0.035 2024-09-25 03:28:53,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=653011.3333333334, ans=0.0 2024-09-25 03:29:25,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=653104.6666666666, ans=0.0 2024-09-25 03:29:38,744 INFO [train.py:1198] (0/4) Epoch 36, batch 3600, loss[loss=0.1833, ctc_loss=0.1174, cr_loss=0.3297, over 17029.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1261, cr_loss=0.3434, over 3370016.09 frames. ], batch size: 51, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:29:50,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.35 vs. limit=15.0 2024-09-25 03:30:05,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=653198.0, ans=0.125 2024-09-25 03:30:40,329 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-140000.pt 2024-09-25 03:30:43,994 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.277e+02 1.347e+02 1.445e+02 1.925e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-25 03:30:59,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=653384.6666666666, ans=0.0 2024-09-25 03:31:00,964 INFO [train.py:1198] (0/4) Epoch 36, batch 3650, loss[loss=0.1815, ctc_loss=0.1142, cr_loss=0.3363, over 17072.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1267, cr_loss=0.3445, over 3362042.70 frames. ], batch size: 46, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:31:09,601 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2024-09-25 03:31:14,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=653384.6666666666, ans=0.0 2024-09-25 03:31:14,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=653384.6666666666, ans=0.0 2024-09-25 03:32:22,170 INFO [train.py:1198] (0/4) Epoch 36, batch 3700, loss[loss=0.2021, ctc_loss=0.1319, cr_loss=0.3514, over 16950.00 frames. ], tot_loss[loss=0.196, ctc_loss=0.127, cr_loss=0.3451, over 3361533.48 frames. ], batch size: 56, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:32:49,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2024-09-25 03:33:00,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=653711.3333333334, ans=0.02 2024-09-25 03:33:06,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-09-25 03:33:20,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. 
limit=15.0 2024-09-25 03:33:24,004 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.267e+02 1.341e+02 1.465e+02 1.805e+02, threshold=2.682e+02, percent-clipped=0.0 2024-09-25 03:33:35,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=653804.6666666666, ans=0.0 2024-09-25 03:33:41,506 INFO [train.py:1198] (0/4) Epoch 36, batch 3750, loss[loss=0.212, ctc_loss=0.1399, cr_loss=0.3609, over 17208.00 frames. ], tot_loss[loss=0.1959, ctc_loss=0.1269, cr_loss=0.3451, over 3355093.53 frames. ], batch size: 50, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:33:58,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=653898.0, ans=0.0 2024-09-25 03:34:27,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=653991.3333333334, ans=0.025 2024-09-25 03:35:01,335 INFO [train.py:1198] (0/4) Epoch 36, batch 3800, loss[loss=0.2147, ctc_loss=0.1456, cr_loss=0.3457, over 12581.00 frames. ], tot_loss[loss=0.1949, ctc_loss=0.1262, cr_loss=0.3432, over 3347847.67 frames. ], batch size: 125, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:35:01,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654084.6666666666, ans=0.1 2024-09-25 03:35:04,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=654084.6666666666, ans=0.0 2024-09-25 03:35:08,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=654084.6666666666, ans=0.1 2024-09-25 03:35:09,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=654084.6666666666, ans=0.125 2024-09-25 03:35:23,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654131.3333333334, ans=0.1 2024-09-25 03:36:02,510 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.295e+02 1.360e+02 1.464e+02 2.754e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-25 03:36:04,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=654271.3333333334, ans=0.125 2024-09-25 03:36:19,642 INFO [train.py:1198] (0/4) Epoch 36, batch 3850, loss[loss=0.164, ctc_loss=0.1055, cr_loss=0.2926, over 17005.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1249, cr_loss=0.3399, over 3315293.34 frames. ], batch size: 39, lr: 3.28e-03, grad_scale: 16.0 2024-09-25 03:36:42,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=654364.6666666666, ans=0.125 2024-09-25 03:37:05,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654458.0, ans=0.1 2024-09-25 03:37:08,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.21 vs. 
limit=12.0 2024-09-25 03:37:15,977 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:37:28,947 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-36.pt 2024-09-25 03:38:18,974 INFO [train.py:1198] (0/4) Epoch 37, batch 0, loss[loss=0.1897, ctc_loss=0.1216, cr_loss=0.3407, over 17023.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1216, cr_loss=0.3407, over 17023.00 frames. ], batch size: 51, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:38:18,975 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 03:38:29,279 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4200, 4.1809, 3.7143, 4.2202], device='cuda:0') 2024-09-25 03:38:34,306 INFO [train.py:1230] (0/4) Epoch 37, validation: loss=0.03489, ctc_loss=0.03489, cr_loss=9.463e-15, over 944034.00 frames. 2024-09-25 03:38:34,307 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 03:38:37,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=654532.6666666666, ans=0.2 2024-09-25 03:38:39,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=654532.6666666666, ans=0.125 2024-09-25 03:38:49,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=654579.3333333334, ans=0.125 2024-09-25 03:39:06,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=654626.0, ans=0.125 2024-09-25 03:39:28,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=654672.6666666666, ans=0.0 2024-09-25 03:39:36,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=654672.6666666666, ans=0.125 2024-09-25 03:39:46,458 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0 2024-09-25 03:39:47,468 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.358e+02 1.528e+02 1.726e+02 8.114e+02, threshold=3.057e+02, percent-clipped=1.0 2024-09-25 03:39:57,024 INFO [train.py:1198] (0/4) Epoch 37, batch 50, loss[loss=0.201, ctc_loss=0.1306, cr_loss=0.352, over 17098.00 frames. ], tot_loss[loss=0.1985, ctc_loss=0.1284, cr_loss=0.3504, over 759059.32 frames. ], batch size: 49, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:39:58,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=654766.0, ans=0.2 2024-09-25 03:40:04,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. 
limit=6.0 2024-09-25 03:40:08,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=654766.0, ans=0.125 2024-09-25 03:40:11,589 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:40:36,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=654859.3333333334, ans=0.2 2024-09-25 03:40:44,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=654859.3333333334, ans=0.125 2024-09-25 03:41:05,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=654952.6666666666, ans=0.1 2024-09-25 03:41:09,767 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:41:11,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=654952.6666666666, ans=0.125 2024-09-25 03:41:16,501 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=22.5 2024-09-25 03:41:19,002 INFO [train.py:1198] (0/4) Epoch 37, batch 100, loss[loss=0.1893, ctc_loss=0.1221, cr_loss=0.3358, over 17359.00 frames. ], tot_loss[loss=0.1948, ctc_loss=0.1258, cr_loss=0.3449, over 1342890.85 frames. ], batch size: 48, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:41:25,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=654999.3333333334, ans=0.125 2024-09-25 03:41:41,969 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2024-09-25 03:42:29,428 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.259e+02 1.323e+02 1.391e+02 2.896e+02, threshold=2.646e+02, percent-clipped=0.0 2024-09-25 03:42:31,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=655186.0, ans=0.125 2024-09-25 03:42:37,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=655232.6666666666, ans=0.04949747468305833 2024-09-25 03:42:39,025 INFO [train.py:1198] (0/4) Epoch 37, batch 150, loss[loss=0.1775, ctc_loss=0.1141, cr_loss=0.3174, over 17065.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1249, cr_loss=0.3423, over 1788255.33 frames. ], batch size: 46, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:42:45,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=655232.6666666666, ans=0.125 2024-09-25 03:42:50,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=655232.6666666666, ans=0.5 2024-09-25 03:43:11,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=655326.0, ans=0.125 2024-09-25 03:44:06,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.04 vs. 
limit=15.0 2024-09-25 03:44:07,480 INFO [train.py:1198] (0/4) Epoch 37, batch 200, loss[loss=0.1496, ctc_loss=0.09371, cr_loss=0.2795, over 17253.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1228, cr_loss=0.3376, over 2133966.52 frames. ], batch size: 42, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:44:33,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=655512.6666666666, ans=0.125 2024-09-25 03:44:35,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.48 vs. limit=10.0 2024-09-25 03:44:44,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-25 03:45:01,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=655606.0, ans=0.125 2024-09-25 03:45:12,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=655652.6666666666, ans=0.0 2024-09-25 03:45:19,826 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.270e+02 1.344e+02 1.450e+02 1.913e+02, threshold=2.687e+02, percent-clipped=0.0 2024-09-25 03:45:29,187 INFO [train.py:1198] (0/4) Epoch 37, batch 250, loss[loss=0.1961, ctc_loss=0.1277, cr_loss=0.3417, over 16056.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1233, cr_loss=0.3387, over 2402503.42 frames. ], batch size: 74, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:46:00,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=655792.6666666666, ans=0.125 2024-09-25 03:46:05,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=655792.6666666666, ans=0.125 2024-09-25 03:46:25,002 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=15.0 2024-09-25 03:46:40,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=655886.0, ans=10.0 2024-09-25 03:46:45,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=655886.0, ans=0.125 2024-09-25 03:46:48,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=655932.6666666666, ans=0.125 2024-09-25 03:46:50,162 INFO [train.py:1198] (0/4) Epoch 37, batch 300, loss[loss=0.2175, ctc_loss=0.1427, cr_loss=0.3742, over 17153.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1238, cr_loss=0.3391, over 2608800.84 frames. 
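The frequent `ScheduledFloat` records from scaling.py report hyperparameters (skip rates, balancer probabilities and limits, dropout values) whose logged value `ans` is a function of `batch_count` rather than a constant. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real scaling.py class carries more machinery (defaults, arithmetic on schedules), so treat this as illustrative only:

```python
from bisect import bisect_right

class ScheduledFloatSketch:
    """A float hyperparameter that is piecewise-linear in batch_count.

    Hypothetical re-implementation for illustration only; icefall's
    scaling.py version differs in details.
    """

    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# e.g. a rate that anneals from 0.3 to 0.1 over the first 20k batches,
# then stays flat; by batch_count ~655k it would long since read 0.1:
skip_rate = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(skip_rate(655512.0))  # -> 0.1
```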
], batch size: 48, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:46:53,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=655932.6666666666, ans=0.0 2024-09-25 03:47:16,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=655979.3333333334, ans=0.0 2024-09-25 03:47:32,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=656026.0, ans=0.025 2024-09-25 03:47:51,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=656072.6666666666, ans=0.1 2024-09-25 03:47:56,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=656119.3333333334, ans=0.0 2024-09-25 03:48:00,707 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.029e+02 1.285e+02 1.370e+02 1.458e+02 2.873e+02, threshold=2.740e+02, percent-clipped=1.0 2024-09-25 03:48:10,574 INFO [train.py:1198] (0/4) Epoch 37, batch 350, loss[loss=0.177, ctc_loss=0.1118, cr_loss=0.326, over 17002.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1238, cr_loss=0.3386, over 2778959.01 frames. ], batch size: 39, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:48:20,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=656166.0, ans=0.0 2024-09-25 03:48:23,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=656166.0, ans=0.125 2024-09-25 03:48:54,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=656259.3333333334, ans=0.125 2024-09-25 03:49:11,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=656306.0, ans=0.09899494936611666 2024-09-25 03:49:24,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656352.6666666666, ans=0.1 2024-09-25 03:49:38,695 INFO [train.py:1198] (0/4) Epoch 37, batch 400, loss[loss=0.219, ctc_loss=0.1433, cr_loss=0.3785, over 17002.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1236, cr_loss=0.3392, over 2909970.13 frames. ], batch size: 53, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:49:40,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=656399.3333333334, ans=0.0 2024-09-25 03:49:45,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=22.5 2024-09-25 03:50:21,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=656492.6666666666, ans=0.0 2024-09-25 03:50:36,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.69 vs. 
limit=15.0 2024-09-25 03:50:37,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=656539.3333333334, ans=0.125 2024-09-25 03:50:45,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=656586.0, ans=0.07 2024-09-25 03:50:51,384 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.258e+02 1.358e+02 1.447e+02 2.750e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-25 03:51:01,057 INFO [train.py:1198] (0/4) Epoch 37, batch 450, loss[loss=0.196, ctc_loss=0.1278, cr_loss=0.341, over 17020.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1234, cr_loss=0.3386, over 3004130.34 frames. ], batch size: 56, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:51:09,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=656632.6666666666, ans=0.125 2024-09-25 03:51:31,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=656726.0, ans=0.125 2024-09-25 03:51:56,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=656772.6666666666, ans=0.05 2024-09-25 03:52:16,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=656819.3333333334, ans=0.07 2024-09-25 03:52:21,377 INFO [train.py:1198] (0/4) Epoch 37, batch 500, loss[loss=0.1979, ctc_loss=0.1267, cr_loss=0.3561, over 17315.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1243, cr_loss=0.3402, over 3080285.73 frames. ], batch size: 49, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:52:28,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=656866.0, ans=0.125 2024-09-25 03:52:58,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2024-09-25 03:53:01,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=656959.3333333334, ans=0.0 2024-09-25 03:53:10,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=657006.0, ans=0.125 2024-09-25 03:53:13,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=657006.0, ans=0.125 2024-09-25 03:53:18,861 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 03:53:20,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=657006.0, ans=0.125 2024-09-25 03:53:37,041 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.284e+02 1.360e+02 1.514e+02 2.432e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 03:53:46,708 INFO [train.py:1198] (0/4) Epoch 37, batch 550, loss[loss=0.1921, ctc_loss=0.1234, cr_loss=0.3434, over 17151.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1251, cr_loss=0.342, over 3132891.44 frames. 
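The `Whitening` records compare a measured statistic against a limit (for example `metric=9.57 vs. limit=15.0` just above): the metric reads 1.0 when the grouped feature covariance is proportional to the identity, grows as channels become correlated or unbalanced, and triggers a corrective gradient only once it exceeds the limit. One plausible form of that metric, normalizing the summed squared covariance by its diagonal mean, is sketched below; the exact formula used in scaling.py should be treated as an assumption here:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """How far features are from 'white' (identity-like covariance).

    x: (num_frames, num_channels). Returns 1.0 when the channel
    covariance is a multiple of the identity, > 1.0 otherwise.
    Hypothetical formula chosen to reproduce the 'metric >= 1 vs.
    limit' pattern in the log.
    """
    x = x - x.mean(dim=0)              # zero-mean per channel
    cov = (x.t() @ x) / x.shape[0]     # (C, C) covariance
    num_channels = cov.shape[0]
    mean_diag = cov.diagonal().mean()
    return (cov ** 2).sum() / (num_channels * mean_diag ** 2)

white = torch.randn(10000, 384)
print(float(whitening_metric(white)))        # close to 1.0
correlated = white @ torch.randn(384, 384)   # mixes channels together
print(float(whitening_metric(correlated)))   # substantially larger
```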
], batch size: 45, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:54:18,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=657146.0, ans=0.0 2024-09-25 03:55:07,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=657286.0, ans=0.0 2024-09-25 03:55:12,194 INFO [train.py:1198] (0/4) Epoch 37, batch 600, loss[loss=0.2063, ctc_loss=0.1336, cr_loss=0.3634, over 16855.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1249, cr_loss=0.3413, over 3170957.87 frames. ], batch size: 58, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:55:36,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=657379.3333333334, ans=0.125 2024-09-25 03:55:36,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=657379.3333333334, ans=0.125 2024-09-25 03:55:39,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=657379.3333333334, ans=0.0 2024-09-25 03:55:54,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=657426.0, ans=0.0 2024-09-25 03:56:02,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=657472.6666666666, ans=0.0 2024-09-25 03:56:06,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=657472.6666666666, ans=0.0 2024-09-25 03:56:11,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657472.6666666666, ans=0.1 2024-09-25 03:56:22,506 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.287e+02 1.356e+02 1.479e+02 3.442e+02, threshold=2.712e+02, percent-clipped=1.0 2024-09-25 03:56:29,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=657519.3333333334, ans=0.2 2024-09-25 03:56:32,226 INFO [train.py:1198] (0/4) Epoch 37, batch 650, loss[loss=0.1595, ctc_loss=0.1014, cr_loss=0.2908, over 16806.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1249, cr_loss=0.3414, over 3216317.46 frames. ], batch size: 37, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 03:56:40,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=657566.0, ans=0.025 2024-09-25 03:56:54,070 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.26 vs. 
limit=15.0 2024-09-25 03:56:58,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=657612.6666666666, ans=0.1 2024-09-25 03:57:18,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=657706.0, ans=10.0 2024-09-25 03:57:31,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=657706.0, ans=0.2 2024-09-25 03:57:31,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=657706.0, ans=0.125 2024-09-25 03:57:41,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=657752.6666666666, ans=0.0 2024-09-25 03:57:41,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657752.6666666666, ans=0.1 2024-09-25 03:57:46,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=657752.6666666666, ans=0.0 2024-09-25 03:57:52,560 INFO [train.py:1198] (0/4) Epoch 37, batch 700, loss[loss=0.1496, ctc_loss=0.09557, cr_loss=0.2702, over 16959.00 frames. ], tot_loss[loss=0.1943, ctc_loss=0.1257, cr_loss=0.343, over 3248603.40 frames. ], batch size: 42, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 03:58:08,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=657799.3333333334, ans=0.125 2024-09-25 03:58:40,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=657892.6666666666, ans=0.125 2024-09-25 03:58:56,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=657939.3333333334, ans=6.0 2024-09-25 03:59:01,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=657939.3333333334, ans=0.125 2024-09-25 03:59:05,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=657986.0, ans=0.05 2024-09-25 03:59:13,008 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.266e+02 1.364e+02 1.473e+02 1.883e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 03:59:19,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=658032.6666666666, ans=0.025 2024-09-25 03:59:21,270 INFO [train.py:1198] (0/4) Epoch 37, batch 750, loss[loss=0.1813, ctc_loss=0.1164, cr_loss=0.3247, over 17213.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1255, cr_loss=0.3428, over 3268911.28 frames. ], batch size: 50, lr: 3.23e-03, grad_scale: 16.0 2024-09-25 04:00:31,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=658219.3333333334, ans=0.2 2024-09-25 04:00:43,390 INFO [train.py:1198] (0/4) Epoch 37, batch 800, loss[loss=0.2663, ctc_loss=0.1839, cr_loss=0.4119, over 11840.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1256, cr_loss=0.3428, over 3283106.34 frames. 
], batch size: 123, lr: 3.23e-03, grad_scale: 32.0 2024-09-25 04:00:56,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=12.0 2024-09-25 04:01:21,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=658359.3333333334, ans=10.0 2024-09-25 04:01:35,166 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.16 vs. limit=12.0 2024-09-25 04:01:42,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=658406.0, ans=0.125 2024-09-25 04:01:54,704 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.298e+02 1.352e+02 1.475e+02 2.154e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-25 04:01:56,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=658452.6666666666, ans=0.1 2024-09-25 04:02:02,796 INFO [train.py:1198] (0/4) Epoch 37, batch 850, loss[loss=0.1762, ctc_loss=0.1098, cr_loss=0.3323, over 17197.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1247, cr_loss=0.341, over 3299351.58 frames. ], batch size: 41, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:02:43,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=658592.6666666666, ans=0.125 2024-09-25 04:02:46,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=658592.6666666666, ans=0.125 2024-09-25 04:03:18,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=658686.0, ans=0.0 2024-09-25 04:03:28,267 INFO [train.py:1198] (0/4) Epoch 37, batch 900, loss[loss=0.183, ctc_loss=0.1166, cr_loss=0.3319, over 17021.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1241, cr_loss=0.3396, over 3307233.28 frames. ], batch size: 39, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:03:50,723 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0 2024-09-25 04:04:18,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=22.5 2024-09-25 04:04:43,135 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.276e+02 1.332e+02 1.387e+02 1.934e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-25 04:04:49,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658966.0, ans=0.1 2024-09-25 04:04:51,042 INFO [train.py:1198] (0/4) Epoch 37, batch 950, loss[loss=0.1669, ctc_loss=0.1056, cr_loss=0.3065, over 17102.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1238, cr_loss=0.3394, over 3329449.66 frames. ], batch size: 40, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:05:08,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=12.0 2024-09-25 04:06:13,087 INFO [train.py:1198] (0/4) Epoch 37, batch 1000, loss[loss=0.1912, ctc_loss=0.1246, cr_loss=0.333, over 17255.00 frames. 
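Many of the scheduled values in this stretch parameterize balancers (`balancer1.prob=0.125`, `balancer1.max_abs=10.0`, `balancer2.min_positive=0.05`, and so on). These constrain simple per-channel activation statistics: the fraction of positive values is kept between min_positive and max_positive, the mean absolute value between min_abs and max_abs, and `prob` is the probability that the correction is applied on a given batch. The sketch below only measures the two statistics; the actual module enforces the bounds by perturbing gradients in backward, which is not shown here:

```python
import torch

def balancer_stats(x: torch.Tensor):
    """Per-channel statistics of the kind the Balancer constrains.

    x: (num_frames, num_channels). The logged knobs map onto these:
    min_positive/max_positive bound the positive fraction, and
    min_abs/max_abs bound the mean absolute value; `prob` (often
    0.125 above) is the chance the correction fires on a batch.
    """
    frac_positive = (x > 0).float().mean(dim=0)  # in [0, 1] per channel
    mean_abs = x.abs().mean(dim=0)               # >= 0 per channel
    return frac_positive, mean_abs

acts = torch.randn(1000, 512)
fp, ma = balancer_stats(acts)
print(float(fp.mean()), float(ma.mean()))  # ~0.5 and ~0.8 for unit Gaussian
```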
], tot_loss[loss=0.1924, ctc_loss=0.1243, cr_loss=0.3403, over 3344523.65 frames. ], batch size: 44, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:06:20,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2024-09-25 04:06:37,209 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:07:11,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=659339.3333333334, ans=0.0 2024-09-25 04:07:16,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=15.0 2024-09-25 04:07:25,165 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.283e+02 1.388e+02 1.495e+02 1.963e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 04:07:33,125 INFO [train.py:1198] (0/4) Epoch 37, batch 1050, loss[loss=0.1818, ctc_loss=0.1154, cr_loss=0.3321, over 17148.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1242, cr_loss=0.3402, over 3345488.28 frames. ], batch size: 40, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:07:40,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-25 04:08:28,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-09-25 04:08:46,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=659619.3333333334, ans=0.125 2024-09-25 04:09:00,282 INFO [train.py:1198] (0/4) Epoch 37, batch 1100, loss[loss=0.1934, ctc_loss=0.124, cr_loss=0.3467, over 17005.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1242, cr_loss=0.3392, over 3344696.49 frames. ], batch size: 51, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:09:10,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=659666.0, ans=0.04949747468305833 2024-09-25 04:09:10,616 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=15.0 2024-09-25 04:09:21,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=659712.6666666666, ans=0.1 2024-09-25 04:09:24,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=659712.6666666666, ans=0.125 2024-09-25 04:09:29,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=22.5 2024-09-25 04:09:33,028 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. 
limit=12.0 2024-09-25 04:09:54,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=659806.0, ans=0.0 2024-09-25 04:10:14,845 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.287e+02 1.378e+02 1.545e+02 2.459e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 04:10:22,845 INFO [train.py:1198] (0/4) Epoch 37, batch 1150, loss[loss=0.1861, ctc_loss=0.1222, cr_loss=0.3197, over 17067.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1244, cr_loss=0.3396, over 3356217.89 frames. ], batch size: 43, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:10:26,270 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=659899.3333333334, ans=0.125 2024-09-25 04:10:26,431 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:10:47,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659946.0, ans=0.1 2024-09-25 04:11:12,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=660039.3333333334, ans=0.04949747468305833 2024-09-25 04:11:14,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=660039.3333333334, ans=0.0 2024-09-25 04:11:22,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=660039.3333333334, ans=0.0 2024-09-25 04:11:29,125 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=660086.0, ans=0.125 2024-09-25 04:11:40,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=22.5 2024-09-25 04:11:43,094 INFO [train.py:1198] (0/4) Epoch 37, batch 1200, loss[loss=0.2187, ctc_loss=0.1424, cr_loss=0.3815, over 16768.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1241, cr_loss=0.339, over 3358271.34 frames. ], batch size: 61, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:11:45,556 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=22.5 2024-09-25 04:12:02,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=660179.3333333334, ans=0.2 2024-09-25 04:12:06,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=22.5 2024-09-25 04:12:17,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=660226.0, ans=0.07 2024-09-25 04:12:25,263 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. 
limit=15.0 2024-09-25 04:12:28,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=660226.0, ans=0.2 2024-09-25 04:12:57,285 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.254e+02 1.340e+02 1.432e+02 2.120e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-25 04:13:05,188 INFO [train.py:1198] (0/4) Epoch 37, batch 1250, loss[loss=0.2029, ctc_loss=0.1303, cr_loss=0.3626, over 17096.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1236, cr_loss=0.3387, over 3358687.51 frames. ], batch size: 49, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:13:06,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.26 vs. limit=15.0 2024-09-25 04:13:29,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=660412.6666666666, ans=12.0 2024-09-25 04:13:36,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660412.6666666666, ans=0.125 2024-09-25 04:14:03,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. limit=15.0 2024-09-25 04:14:13,445 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.25 vs. limit=22.5 2024-09-25 04:14:30,160 INFO [train.py:1198] (0/4) Epoch 37, batch 1300, loss[loss=0.2059, ctc_loss=0.1334, cr_loss=0.3624, over 17146.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1244, cr_loss=0.3404, over 3350720.35 frames. ], batch size: 48, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:14:45,495 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2024-09-25 04:15:03,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=660692.6666666666, ans=0.125 2024-09-25 04:15:22,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=660739.3333333334, ans=0.125 2024-09-25 04:15:44,612 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.301e+02 1.386e+02 1.515e+02 1.831e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 04:15:44,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=660786.0, ans=0.125 2024-09-25 04:15:48,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=660786.0, ans=0.125 2024-09-25 04:15:52,776 INFO [train.py:1198] (0/4) Epoch 37, batch 1350, loss[loss=0.1648, ctc_loss=0.1027, cr_loss=0.3103, over 17268.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1246, cr_loss=0.3409, over 3350736.52 frames. ], batch size: 44, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:15:56,482 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.98 vs. 
limit=22.5 2024-09-25 04:16:17,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=660879.3333333334, ans=0.0 2024-09-25 04:16:40,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=22.5 2024-09-25 04:17:13,042 INFO [train.py:1198] (0/4) Epoch 37, batch 1400, loss[loss=0.1996, ctc_loss=0.128, cr_loss=0.3582, over 16858.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1246, cr_loss=0.3408, over 3340898.54 frames. ], batch size: 58, lr: 3.22e-03, grad_scale: 16.0 2024-09-25 04:17:32,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=661112.6666666666, ans=0.125 2024-09-25 04:18:22,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=661252.6666666666, ans=0.125 2024-09-25 04:18:23,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=661252.6666666666, ans=0.0 2024-09-25 04:18:27,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=661252.6666666666, ans=0.2 2024-09-25 04:18:28,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=661252.6666666666, ans=0.0 2024-09-25 04:18:31,389 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.311e+02 1.398e+02 1.519e+02 2.404e+02, threshold=2.797e+02, percent-clipped=0.0 2024-09-25 04:18:40,189 INFO [train.py:1198] (0/4) Epoch 37, batch 1450, loss[loss=0.2344, ctc_loss=0.1555, cr_loss=0.3944, over 15047.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1251, cr_loss=0.3412, over 3342759.72 frames. ], batch size: 89, lr: 3.22e-03, grad_scale: 16.0 2024-09-25 04:18:49,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=661299.3333333334, ans=0.125 2024-09-25 04:18:55,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2024-09-25 04:19:02,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=661346.0, ans=0.02 2024-09-25 04:19:29,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=661439.3333333334, ans=0.0 2024-09-25 04:19:49,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=661486.0, ans=0.0 2024-09-25 04:19:58,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=661486.0, ans=0.125 2024-09-25 04:20:02,950 INFO [train.py:1198] (0/4) Epoch 37, batch 1500, loss[loss=0.1964, ctc_loss=0.1266, cr_loss=0.3488, over 17357.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1244, cr_loss=0.3409, over 3340862.87 frames. 
], batch size: 48, lr: 3.22e-03, grad_scale: 16.0 2024-09-25 04:20:15,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=661532.6666666666, ans=0.125 2024-09-25 04:20:31,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=661579.3333333334, ans=0.2 2024-09-25 04:20:34,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=661626.0, ans=0.125 2024-09-25 04:20:36,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0 2024-09-25 04:20:41,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=661626.0, ans=0.125 2024-09-25 04:20:49,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=661672.6666666666, ans=0.125 2024-09-25 04:20:52,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=661672.6666666666, ans=10.0 2024-09-25 04:21:04,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=661672.6666666666, ans=8.0 2024-09-25 04:21:16,231 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.237e+02 1.312e+02 1.425e+02 1.916e+02, threshold=2.625e+02, percent-clipped=0.0 2024-09-25 04:21:22,595 INFO [train.py:1198] (0/4) Epoch 37, batch 1550, loss[loss=0.2131, ctc_loss=0.1369, cr_loss=0.3814, over 17161.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.125, cr_loss=0.3419, over 3341135.95 frames. ], batch size: 45, lr: 3.22e-03, grad_scale: 16.0 2024-09-25 04:21:37,384 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:22:08,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=661859.3333333334, ans=0.05 2024-09-25 04:22:45,505 INFO [train.py:1198] (0/4) Epoch 37, batch 1600, loss[loss=0.2132, ctc_loss=0.1389, cr_loss=0.3715, over 15927.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1255, cr_loss=0.3434, over 3344931.90 frames. ], batch size: 74, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:22:52,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=661999.3333333334, ans=0.125 2024-09-25 04:23:00,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-09-25 04:23:56,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=662186.0, ans=0.09899494936611666 2024-09-25 04:24:03,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.70 vs. 
limit=15.0 2024-09-25 04:24:04,105 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.273e+02 1.358e+02 1.458e+02 2.530e+02, threshold=2.716e+02, percent-clipped=0.0 2024-09-25 04:24:10,513 INFO [train.py:1198] (0/4) Epoch 37, batch 1650, loss[loss=0.1932, ctc_loss=0.1245, cr_loss=0.3431, over 16428.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1257, cr_loss=0.3444, over 3349786.99 frames. ], batch size: 66, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:24:15,556 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:24:30,531 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0 2024-09-25 04:24:44,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662326.0, ans=0.1 2024-09-25 04:25:02,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=662372.6666666666, ans=0.125 2024-09-25 04:25:28,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.77 vs. limit=10.0 2024-09-25 04:25:32,852 INFO [train.py:1198] (0/4) Epoch 37, batch 1700, loss[loss=0.1777, ctc_loss=0.1146, cr_loss=0.3157, over 16950.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1251, cr_loss=0.3435, over 3355223.58 frames. ], batch size: 42, lr: 3.22e-03, grad_scale: 32.0 2024-09-25 04:25:33,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=662466.0, ans=0.125 2024-09-25 04:25:36,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=662466.0, ans=0.0 2024-09-25 04:25:41,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=662466.0, ans=0.025 2024-09-25 04:25:45,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=662466.0, ans=0.0 2024-09-25 04:25:50,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=662512.6666666666, ans=0.0 2024-09-25 04:25:58,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=662512.6666666666, ans=0.125 2024-09-25 04:25:58,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=662512.6666666666, ans=0.125 2024-09-25 04:26:01,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662512.6666666666, ans=0.1 2024-09-25 04:26:14,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=662559.3333333334, ans=0.125 2024-09-25 04:26:31,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=662606.0, ans=0.125 2024-09-25 04:26:45,874 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.286e+02 1.377e+02 1.479e+02 2.270e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 04:26:49,392 INFO 
[scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662652.6666666666, ans=0.1 2024-09-25 04:26:52,404 INFO [train.py:1198] (0/4) Epoch 37, batch 1750, loss[loss=0.1745, ctc_loss=0.1117, cr_loss=0.3144, over 17302.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1251, cr_loss=0.3437, over 3359119.36 frames. ], batch size: 51, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:26:59,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.04 vs. limit=10.0 2024-09-25 04:27:00,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662699.3333333334, ans=0.1 2024-09-25 04:27:03,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=662699.3333333334, ans=0.125 2024-09-25 04:27:41,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=662839.3333333334, ans=0.125 2024-09-25 04:28:00,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=662886.0, ans=0.125 2024-09-25 04:28:03,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=662886.0, ans=10.0 2024-09-25 04:28:14,224 INFO [train.py:1198] (0/4) Epoch 37, batch 1800, loss[loss=0.2019, ctc_loss=0.1296, cr_loss=0.3611, over 16191.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1254, cr_loss=0.3435, over 3355419.00 frames. ], batch size: 74, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:28:31,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=662979.3333333334, ans=0.125 2024-09-25 04:29:25,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=663119.3333333334, ans=0.0 2024-09-25 04:29:30,051 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.280e+02 1.352e+02 1.451e+02 1.795e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-25 04:29:39,065 INFO [train.py:1198] (0/4) Epoch 37, batch 1850, loss[loss=0.2052, ctc_loss=0.1303, cr_loss=0.3742, over 17028.00 frames. ], tot_loss[loss=0.1944, ctc_loss=0.1257, cr_loss=0.3436, over 3354223.53 frames. ], batch size: 44, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:29:58,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=663212.6666666666, ans=0.125 2024-09-25 04:30:06,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=663212.6666666666, ans=0.125 2024-09-25 04:30:23,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=663259.3333333334, ans=0.125 2024-09-25 04:30:29,038 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.40 vs. 
limit=22.5 2024-09-25 04:30:50,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=663352.6666666666, ans=0.2 2024-09-25 04:30:54,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=663352.6666666666, ans=0.1 2024-09-25 04:30:58,770 INFO [train.py:1198] (0/4) Epoch 37, batch 1900, loss[loss=0.2415, ctc_loss=0.1575, cr_loss=0.4199, over 16528.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1259, cr_loss=0.3437, over 3353002.78 frames. ], batch size: 66, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:30:59,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=663399.3333333334, ans=0.09899494936611666 2024-09-25 04:31:34,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=663492.6666666666, ans=0.125 2024-09-25 04:31:57,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=663539.3333333334, ans=0.07 2024-09-25 04:32:12,970 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.076e+02 1.273e+02 1.351e+02 1.492e+02 1.881e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-25 04:32:19,253 INFO [train.py:1198] (0/4) Epoch 37, batch 1950, loss[loss=0.1773, ctc_loss=0.1128, cr_loss=0.3226, over 17021.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1255, cr_loss=0.3432, over 3351153.49 frames. ], batch size: 39, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:32:21,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0 2024-09-25 04:32:43,010 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:32:50,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663679.3333333334, ans=0.1 2024-09-25 04:33:00,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=663726.0, ans=0.025 2024-09-25 04:33:02,160 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2024-09-25 04:33:33,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=663819.3333333334, ans=0.1 2024-09-25 04:33:35,727 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:33:37,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=663819.3333333334, ans=0.125 2024-09-25 04:33:46,764 INFO [train.py:1198] (0/4) Epoch 37, batch 2000, loss[loss=0.1902, ctc_loss=0.1223, cr_loss=0.3392, over 17149.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1247, cr_loss=0.3416, over 3352127.13 frames. 
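The `grad_scale` field at the end of each batch record (8.0, 16.0, 32.0 in this excerpt) is the dynamic loss-scaling factor for float16 training: the scaler halves it when a step produces inf/nan gradients and grows it again after a run of clean steps, which is why it moves in powers of two. A minimal loop with `torch.cuda.amp.GradScaler` showing the same bookkeeping; the model, optimizer, and data below are stand-ins, not the ones in train.py:

```python
import torch

# Stand-in model and data; names here are illustrative only.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(80, 500).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3.2e-3)
# Dynamic loss scaling: halved on overflow, doubled after a run of
# clean steps, hence the 8/16/32 values seen in the log.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(4):
    features = torch.randn(16, 80, device=device)
    targets = torch.randint(0, 500, (16,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # unscales grads; skips step on inf/nan
    scaler.update()                # adjust the scale for the next step
    print("grad_scale:", scaler.get_scale())
```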
], batch size: 48, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:33:53,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=663866.0, ans=0.035 2024-09-25 04:34:03,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663912.6666666666, ans=0.1 2024-09-25 04:34:19,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=663959.3333333334, ans=0.2 2024-09-25 04:34:54,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664052.6666666666, ans=0.1 2024-09-25 04:34:56,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=664052.6666666666, ans=0.0 2024-09-25 04:35:02,488 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.268e+02 1.366e+02 1.468e+02 1.745e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-25 04:35:06,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.16 vs. limit=15.0 2024-09-25 04:35:08,885 INFO [train.py:1198] (0/4) Epoch 37, batch 2050, loss[loss=0.2004, ctc_loss=0.1315, cr_loss=0.3445, over 17025.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1249, cr_loss=0.3414, over 3345464.26 frames. ], batch size: 52, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:35:10,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=664099.3333333334, ans=0.125 2024-09-25 04:35:12,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=664099.3333333334, ans=0.0 2024-09-25 04:35:39,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=664192.6666666666, ans=0.125 2024-09-25 04:36:00,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=664239.3333333334, ans=0.025 2024-09-25 04:36:27,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664332.6666666666, ans=0.1 2024-09-25 04:36:28,589 INFO [train.py:1198] (0/4) Epoch 37, batch 2100, loss[loss=0.1694, ctc_loss=0.1069, cr_loss=0.3128, over 16250.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.125, cr_loss=0.3416, over 3349949.14 frames. ], batch size: 36, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:36:35,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664332.6666666666, ans=0.1 2024-09-25 04:36:45,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664379.3333333334, ans=0.1 2024-09-25 04:37:18,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.63 vs. 
limit=12.0 2024-09-25 04:37:33,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=664519.3333333334, ans=0.0 2024-09-25 04:37:46,466 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.254e+02 1.343e+02 1.449e+02 1.904e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-25 04:37:51,357 INFO [train.py:1198] (0/4) Epoch 37, batch 2150, loss[loss=0.2241, ctc_loss=0.1471, cr_loss=0.3848, over 15015.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1256, cr_loss=0.343, over 3345772.26 frames. ], batch size: 89, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:38:14,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=664612.6666666666, ans=0.2 2024-09-25 04:38:39,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=664659.3333333334, ans=0.2 2024-09-25 04:38:43,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-09-25 04:38:56,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=664706.0, ans=0.05 2024-09-25 04:39:15,893 INFO [train.py:1198] (0/4) Epoch 37, batch 2200, loss[loss=0.219, ctc_loss=0.1438, cr_loss=0.3759, over 16999.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1259, cr_loss=0.3434, over 3352559.43 frames. ], batch size: 53, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:39:16,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664799.3333333334, ans=0.1 2024-09-25 04:39:26,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=664799.3333333334, ans=0.1 2024-09-25 04:39:43,347 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2024-09-25 04:39:54,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664892.6666666666, ans=0.1 2024-09-25 04:40:10,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=664939.3333333334, ans=0.0 2024-09-25 04:40:26,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=664986.0, ans=0.0 2024-09-25 04:40:34,122 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.296e+02 1.362e+02 1.438e+02 1.964e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-25 04:40:34,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=664986.0, ans=0.125 2024-09-25 04:40:36,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=664986.0, ans=0.0 2024-09-25 04:40:38,877 INFO [train.py:1198] (0/4) Epoch 37, batch 2250, loss[loss=0.2024, ctc_loss=0.1307, cr_loss=0.3589, over 17037.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1255, cr_loss=0.3423, over 3349263.43 frames. 
], batch size: 52, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:41:07,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.57 vs. limit=6.0 2024-09-25 04:41:22,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2024-09-25 04:41:49,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=665219.3333333334, ans=0.2 2024-09-25 04:41:58,860 INFO [train.py:1198] (0/4) Epoch 37, batch 2300, loss[loss=0.1732, ctc_loss=0.1108, cr_loss=0.3118, over 16917.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1255, cr_loss=0.342, over 3348475.32 frames. ], batch size: 42, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:42:00,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=665266.0, ans=0.0 2024-09-25 04:42:22,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2024-09-25 04:42:23,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=665312.6666666666, ans=0.125 2024-09-25 04:42:38,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=665359.3333333334, ans=0.125 2024-09-25 04:42:49,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=665406.0, ans=0.125 2024-09-25 04:43:21,780 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.297e+02 1.364e+02 1.435e+02 2.216e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-25 04:43:26,540 INFO [train.py:1198] (0/4) Epoch 37, batch 2350, loss[loss=0.2075, ctc_loss=0.1331, cr_loss=0.3721, over 17310.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1255, cr_loss=0.3427, over 3349909.29 frames. 
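Most entries in this log are scaling.py reporting ScheduledFloat values: quantities such as dropout_p, skip_rate and scale_min are not fixed hyperparameters but functions of batch_count, and each line records the value in force at that point (by this stage of training, for example, the feed_forward dropout_p entries have all settled at ans=0.1). A simplified stand-in for such a schedule, interpolating piecewise-linearly between (batch_count, value) breakpoints; the breakpoint values below are illustrative, not this run's settings:

class ScheduledFloat:
    """A float following a piecewise-linear schedule over batch_count
    (simplified sketch of the idea, not icefall's implementation)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.points = sorted(points)

    def value_at(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]  # past the last breakpoint: hold the final value

# e.g. a dropout that ramps from 0.3 down to 0.1 over the first 20k batches
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value_at(663866.0) == 0.1  # long past the ramp, as logged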
], batch size: 46, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:43:28,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=665499.3333333334, ans=0.1 2024-09-25 04:44:11,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=665592.6666666666, ans=0.125 2024-09-25 04:44:14,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665639.3333333334, ans=0.1 2024-09-25 04:44:28,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=665639.3333333334, ans=0.125 2024-09-25 04:44:28,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=665639.3333333334, ans=0.125 2024-09-25 04:44:33,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=665686.0, ans=0.025 2024-09-25 04:44:34,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=665686.0, ans=0.125 2024-09-25 04:44:48,924 INFO [train.py:1198] (0/4) Epoch 37, batch 2400, loss[loss=0.197, ctc_loss=0.1288, cr_loss=0.3411, over 16882.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1244, cr_loss=0.3403, over 3353352.76 frames. ], batch size: 58, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:44:49,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=12.0 2024-09-25 04:44:52,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=665732.6666666666, ans=0.125 2024-09-25 04:45:12,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=665779.3333333334, ans=0.125 2024-09-25 04:45:16,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=665779.3333333334, ans=0.125 2024-09-25 04:45:27,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=665826.0, ans=0.125 2024-09-25 04:45:43,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=665872.6666666666, ans=0.0 2024-09-25 04:45:46,683 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=665872.6666666666, ans=0.0 2024-09-25 04:45:50,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=12.0 2024-09-25 04:45:56,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=665919.3333333334, ans=22.5 2024-09-25 04:46:03,760 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.277e+02 1.353e+02 1.451e+02 1.749e+02, threshold=2.707e+02, percent-clipped=0.0 2024-09-25 04:46:08,612 INFO [train.py:1198] (0/4) Epoch 37, batch 2450, loss[loss=0.2021, ctc_loss=0.1329, cr_loss=0.3461, over 16651.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1247, cr_loss=0.3416, over 3361644.35 frames. 
], batch size: 66, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:46:24,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666012.6666666666, ans=0.1 2024-09-25 04:46:31,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=666012.6666666666, ans=0.0 2024-09-25 04:46:39,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=666059.3333333334, ans=0.0 2024-09-25 04:46:47,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=666059.3333333334, ans=0.125 2024-09-25 04:46:48,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=666059.3333333334, ans=0.125 2024-09-25 04:47:29,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=666199.3333333334, ans=0.0 2024-09-25 04:47:29,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666199.3333333334, ans=0.1 2024-09-25 04:47:30,970 INFO [train.py:1198] (0/4) Epoch 37, batch 2500, loss[loss=0.1732, ctc_loss=0.1113, cr_loss=0.3095, over 17202.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1241, cr_loss=0.3411, over 3371688.43 frames. ], batch size: 41, lr: 3.21e-03, grad_scale: 32.0 2024-09-25 04:47:36,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666199.3333333334, ans=0.1 2024-09-25 04:47:39,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=666199.3333333334, ans=0.05 2024-09-25 04:47:50,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=666246.0, ans=0.125 2024-09-25 04:47:50,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666246.0, ans=0.1 2024-09-25 04:48:02,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=666246.0, ans=0.125 2024-09-25 04:48:15,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666292.6666666666, ans=0.1 2024-09-25 04:48:26,390 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:48:53,437 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.305e+02 1.391e+02 1.498e+02 2.183e+02, threshold=2.782e+02, percent-clipped=0.0 2024-09-25 04:48:53,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=666386.0, ans=0.125 2024-09-25 04:48:56,604 INFO [train.py:1198] (0/4) Epoch 37, batch 2550, loss[loss=0.165, ctc_loss=0.106, cr_loss=0.2954, over 17101.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1245, cr_loss=0.3421, over 3378320.53 frames. 
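The loss figures in the train.py lines are internally consistent with a weighted sum of the two logged components, loss = ctc_loss + 0.2 * cr_loss; for instance the batch-2450 running average gives 0.1247 + 0.2 × 0.3416 ≈ 0.1931 as reported, and the weight 0.2 matches the cr-loss-scale-0.2 tag in this run's experiment directory name (visible in the checkpoint paths further down). A worked check against values copied from the log; the 0.2 is inferred from the numbers, not read out of the training code:

CR_LOSS_SCALE = 0.2  # inferred from the logged values and the exp-dir name

entries = [
    # (loss, ctc_loss, cr_loss) from the Epoch 37 tot_loss averages above
    (0.1931, 0.1247, 0.3416),  # batch 2450
    (0.1923, 0.1241, 0.3411),  # batch 2500
    (0.1929, 0.1245, 0.3421),  # batch 2550
]
for loss, ctc, cr in entries:
    reconstructed = ctc + CR_LOSS_SCALE * cr
    # agreement to within the 4-digit precision of the log lines
    assert abs(reconstructed - loss) < 5e-4, (loss, reconstructed)
print("loss = ctc_loss + 0.2 * cr_loss holds for all sampled entries")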
], batch size: 40, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:49:17,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=666479.3333333334, ans=0.025 2024-09-25 04:49:36,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=666526.0, ans=0.125 2024-09-25 04:49:42,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=666526.0, ans=0.125 2024-09-25 04:50:19,601 INFO [train.py:1198] (0/4) Epoch 37, batch 2600, loss[loss=0.2172, ctc_loss=0.1439, cr_loss=0.3667, over 16234.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1243, cr_loss=0.341, over 3353218.84 frames. ], batch size: 74, lr: 3.21e-03, grad_scale: 16.0 2024-09-25 04:50:34,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=666712.6666666666, ans=0.025 2024-09-25 04:50:36,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=666712.6666666666, ans=0.125 2024-09-25 04:50:37,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=666712.6666666666, ans=0.125 2024-09-25 04:50:59,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=666759.3333333334, ans=0.125 2024-09-25 04:51:33,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=666852.6666666666, ans=0.0 2024-09-25 04:51:36,208 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.272e+02 1.334e+02 1.506e+02 4.167e+02, threshold=2.667e+02, percent-clipped=1.0 2024-09-25 04:51:39,401 INFO [train.py:1198] (0/4) Epoch 37, batch 2650, loss[loss=0.1812, ctc_loss=0.116, cr_loss=0.3262, over 17242.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1243, cr_loss=0.3412, over 3355598.99 frames. ], batch size: 50, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 04:51:41,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=666899.3333333334, ans=0.0 2024-09-25 04:51:43,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=22.5 2024-09-25 04:51:44,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=666899.3333333334, ans=0.1 2024-09-25 04:52:32,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=667039.3333333334, ans=0.025 2024-09-25 04:52:45,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=667086.0, ans=0.0 2024-09-25 04:53:06,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=667132.6666666666, ans=0.125 2024-09-25 04:53:07,640 INFO [train.py:1198] (0/4) Epoch 37, batch 2700, loss[loss=0.2027, ctc_loss=0.1294, cr_loss=0.366, over 17138.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1247, cr_loss=0.3413, over 3349220.85 frames. 
], batch size: 48, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 04:53:07,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=667132.6666666666, ans=0.125 2024-09-25 04:53:11,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0 2024-09-25 04:53:17,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=667132.6666666666, ans=0.125 2024-09-25 04:53:17,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=667132.6666666666, ans=0.125 2024-09-25 04:53:55,073 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.19 vs. limit=15.0 2024-09-25 04:54:27,064 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.290e+02 1.345e+02 1.477e+02 2.693e+02, threshold=2.691e+02, percent-clipped=1.0 2024-09-25 04:54:30,304 INFO [train.py:1198] (0/4) Epoch 37, batch 2750, loss[loss=0.2, ctc_loss=0.1301, cr_loss=0.3493, over 17378.00 frames. ], tot_loss[loss=0.1953, ctc_loss=0.1264, cr_loss=0.3444, over 3344392.02 frames. ], batch size: 48, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 04:54:48,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2024-09-25 04:54:50,469 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.51 vs. limit=15.0 2024-09-25 04:55:28,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=667506.0, ans=0.0 2024-09-25 04:55:40,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=667552.6666666666, ans=0.0 2024-09-25 04:55:44,947 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2024-09-25 04:55:50,321 INFO [train.py:1198] (0/4) Epoch 37, batch 2800, loss[loss=0.2164, ctc_loss=0.1399, cr_loss=0.3825, over 17022.00 frames. ], tot_loss[loss=0.1956, ctc_loss=0.1266, cr_loss=0.3451, over 3344945.21 frames. ], batch size: 51, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 04:56:21,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2024-09-25 04:56:39,746 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.54 vs. 
limit=15.0 2024-09-25 04:56:53,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=667786.0, ans=0.125 2024-09-25 04:57:02,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=667786.0, ans=0.125 2024-09-25 04:57:10,009 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.310e+02 1.386e+02 1.476e+02 1.796e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 04:57:10,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-09-25 04:57:13,306 INFO [train.py:1198] (0/4) Epoch 37, batch 2850, loss[loss=0.1526, ctc_loss=0.09562, cr_loss=0.2848, over 16994.00 frames. ], tot_loss[loss=0.1946, ctc_loss=0.1259, cr_loss=0.3434, over 3353428.47 frames. ], batch size: 39, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 04:57:13,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=667832.6666666666, ans=0.125 2024-09-25 04:57:26,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=667832.6666666666, ans=0.125 2024-09-25 04:57:28,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=15.0 2024-09-25 04:57:37,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=667879.3333333334, ans=0.125 2024-09-25 04:57:38,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=667879.3333333334, ans=0.0 2024-09-25 04:57:51,372 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-09-25 04:58:10,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=22.5 2024-09-25 04:58:15,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. limit=5.0 2024-09-25 04:58:32,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=668019.3333333334, ans=0.1 2024-09-25 04:58:38,163 INFO [train.py:1198] (0/4) Epoch 37, batch 2900, loss[loss=0.1939, ctc_loss=0.1242, cr_loss=0.3483, over 17018.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1253, cr_loss=0.3429, over 3354976.10 frames. 
], batch size: 39, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 04:58:40,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668066.0, ans=0.1 2024-09-25 04:58:41,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=668066.0, ans=10.0 2024-09-25 04:58:46,460 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 04:58:51,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=668066.0, ans=0.0 2024-09-25 04:59:07,354 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.92 vs. limit=10.0 2024-09-25 04:59:28,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=668206.0, ans=0.125 2024-09-25 04:59:33,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=668206.0, ans=0.125 2024-09-25 04:59:38,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=668206.0, ans=0.0 2024-09-25 04:59:53,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=668252.6666666666, ans=0.035 2024-09-25 04:59:57,885 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.277e+02 1.411e+02 1.557e+02 2.099e+02, threshold=2.822e+02, percent-clipped=0.0 2024-09-25 05:00:01,064 INFO [train.py:1198] (0/4) Epoch 37, batch 2950, loss[loss=0.1766, ctc_loss=0.1144, cr_loss=0.3109, over 17013.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1257, cr_loss=0.3437, over 3364375.78 frames. ], batch size: 44, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 05:00:06,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=12.0 2024-09-25 05:00:20,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.91 vs. limit=15.0 2024-09-25 05:01:00,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=668439.3333333334, ans=0.125 2024-09-25 05:01:08,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=668486.0, ans=0.05 2024-09-25 05:01:17,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=668486.0, ans=0.125 2024-09-25 05:01:20,186 INFO [train.py:1198] (0/4) Epoch 37, batch 3000, loss[loss=0.1823, ctc_loss=0.1141, cr_loss=0.3406, over 17254.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1257, cr_loss=0.3438, over 3365732.56 frames. ], batch size: 44, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:01:20,187 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 05:01:35,774 INFO [train.py:1230] (0/4) Epoch 37, validation: loss=0.03526, ctc_loss=0.03526, cr_loss=1.039e-14, over 944034.00 frames. 
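In the validation entries the consistency-regularization term collapses to numerical noise (cr_loss=1.039e-14 just above, against roughly 0.34 during training) while ctc_loss carries the entire value. That is what one would expect if the CR term compares the CTC outputs of two differently augmented views of each utterance and augmentation is disabled in evaluation, so the two views coincide. A minimal sketch of one common formulation of such a term, a symmetric KL divergence between two (batch, time, vocab) log-probability tensors; consistency_loss is an illustrative name, not the repo's function:

import torch
import torch.nn.functional as F

def consistency_loss(log_p1, log_p2):
    """Symmetric KL divergence between two log-prob tensors of equal shape."""
    kl12 = F.kl_div(log_p1, log_p2, log_target=True, reduction="batchmean")
    kl21 = F.kl_div(log_p2, log_p1, log_target=True, reduction="batchmean")
    return 0.5 * (kl12 + kl21)

# With identical inputs (no augmentation, as at validation) the term is ~0,
# matching the 1e-14-scale cr_loss values in the validation lines.
x = torch.randn(2, 50, 500).log_softmax(dim=-1)
assert consistency_loss(x, x).abs().item() < 1e-6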
2024-09-25 05:01:35,775 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 05:01:47,034 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:01:51,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=668579.3333333334, ans=0.0 2024-09-25 05:02:04,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=668579.3333333334, ans=0.0 2024-09-25 05:02:33,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=668672.6666666666, ans=0.125 2024-09-25 05:02:47,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=668719.3333333334, ans=0.025 2024-09-25 05:02:54,650 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.290e+02 1.381e+02 1.487e+02 2.036e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-25 05:02:56,258 INFO [train.py:1198] (0/4) Epoch 37, batch 3050, loss[loss=0.2212, ctc_loss=0.14, cr_loss=0.4058, over 17011.00 frames. ], tot_loss[loss=0.195, ctc_loss=0.126, cr_loss=0.3448, over 3366504.81 frames. ], batch size: 53, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:03:52,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=12.0 2024-09-25 05:03:54,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=668906.0, ans=0.125 2024-09-25 05:04:14,261 INFO [train.py:1198] (0/4) Epoch 37, batch 3100, loss[loss=0.1921, ctc_loss=0.1279, cr_loss=0.321, over 17223.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1247, cr_loss=0.3421, over 3368109.71 frames. ], batch size: 50, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:04:39,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669046.0, ans=0.1 2024-09-25 05:04:57,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=669092.6666666666, ans=0.0 2024-09-25 05:05:05,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=669139.3333333334, ans=0.0 2024-09-25 05:05:11,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669139.3333333334, ans=0.1 2024-09-25 05:05:21,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=669186.0, ans=0.125 2024-09-25 05:05:35,824 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.257e+02 1.346e+02 1.447e+02 1.997e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-25 05:05:37,418 INFO [train.py:1198] (0/4) Epoch 37, batch 3150, loss[loss=0.1884, ctc_loss=0.123, cr_loss=0.3272, over 17311.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1243, cr_loss=0.3408, over 3365870.06 frames. 
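The scaling.py Whitening lines report a diagnostic of how far a module's activations are from having an isotropic ("white") covariance: the measured metric is compared against a per-module limit, and an entry appears when the measurement is notable relative to that limit. One plausible formulation of such a metric, equal to 1.0 when all covariance eigenvalues match and growing as the spectrum concentrates; this is a sketch of the idea, and the actual scaling.py computation may differ in detail:

import torch

def whitening_metric(x):
    """x: (num_frames, num_channels) activations. Returns a value >= 1.0,
    with 1.0 meaning a perfectly isotropic (white) covariance."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]      # (C, C) sample covariance
    eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues of symmetric cov
    d = eigs.numel()
    return d * (eigs ** 2).sum() / eigs.sum() ** 2

x = torch.randn(2000, 256)              # near-white activations
m = whitening_metric(x)
assert 1.0 <= m.item() < 1.5            # close to the ideal 1.0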
], batch size: 46, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:05:37,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=669232.6666666666, ans=0.015 2024-09-25 05:05:43,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669232.6666666666, ans=0.1 2024-09-25 05:05:54,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=669279.3333333334, ans=0.125 2024-09-25 05:06:17,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=669326.0, ans=0.125 2024-09-25 05:06:53,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.48 vs. limit=10.0 2024-09-25 05:06:54,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0 2024-09-25 05:06:55,585 INFO [train.py:1198] (0/4) Epoch 37, batch 3200, loss[loss=0.1761, ctc_loss=0.1121, cr_loss=0.3199, over 17094.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1249, cr_loss=0.3421, over 3364594.95 frames. ], batch size: 43, lr: 3.20e-03, grad_scale: 32.0 2024-09-25 05:07:00,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=669466.0, ans=0.125 2024-09-25 05:07:20,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=669512.6666666666, ans=0.2 2024-09-25 05:07:20,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=669512.6666666666, ans=0.125 2024-09-25 05:07:37,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=669559.3333333334, ans=0.125 2024-09-25 05:07:48,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=669606.0, ans=0.025 2024-09-25 05:08:15,380 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.285e+02 1.355e+02 1.464e+02 2.821e+02, threshold=2.709e+02, percent-clipped=1.0 2024-09-25 05:08:15,405 INFO [train.py:1198] (0/4) Epoch 37, batch 3250, loss[loss=0.2511, ctc_loss=0.1655, cr_loss=0.428, over 17014.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1249, cr_loss=0.3419, over 3364498.84 frames. ], batch size: 53, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:08:38,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-09-25 05:09:06,516 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0 2024-09-25 05:09:10,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=669839.3333333334, ans=0.0 2024-09-25 05:09:13,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.49 vs. 
limit=15.0 2024-09-25 05:09:33,488 INFO [train.py:1198] (0/4) Epoch 37, batch 3300, loss[loss=0.1786, ctc_loss=0.1155, cr_loss=0.3159, over 17292.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1254, cr_loss=0.3428, over 3361453.28 frames. ], batch size: 51, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:09:39,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2024-09-25 05:09:41,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=669932.6666666666, ans=0.2 2024-09-25 05:10:14,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2024-09-25 05:10:30,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=670072.6666666666, ans=15.0 2024-09-25 05:10:38,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=670119.3333333334, ans=0.0 2024-09-25 05:10:51,977 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.300e+02 1.355e+02 1.430e+02 2.152e+02, threshold=2.710e+02, percent-clipped=0.0 2024-09-25 05:10:52,001 INFO [train.py:1198] (0/4) Epoch 37, batch 3350, loss[loss=0.152, ctc_loss=0.09371, cr_loss=0.2914, over 16747.00 frames. ], tot_loss[loss=0.1936, ctc_loss=0.1251, cr_loss=0.3426, over 3366478.35 frames. ], batch size: 37, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:10:52,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0 2024-09-25 05:11:12,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=670212.6666666666, ans=0.025 2024-09-25 05:12:03,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=670352.6666666666, ans=0.0 2024-09-25 05:12:09,882 INFO [train.py:1198] (0/4) Epoch 37, batch 3400, loss[loss=0.1808, ctc_loss=0.1164, cr_loss=0.3217, over 17151.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1251, cr_loss=0.3417, over 3367677.58 frames. ], batch size: 45, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:12:25,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=670446.0, ans=0.0 2024-09-25 05:12:36,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=670446.0, ans=0.2 2024-09-25 05:13:28,361 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.294e+02 1.379e+02 1.549e+02 2.225e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-25 05:13:28,386 INFO [train.py:1198] (0/4) Epoch 37, batch 3450, loss[loss=0.1797, ctc_loss=0.1111, cr_loss=0.3428, over 17187.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1249, cr_loss=0.3407, over 3352180.90 frames. ], batch size: 41, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:13:54,606 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.22 vs. 
limit=22.5 2024-09-25 05:14:08,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670726.0, ans=0.0 2024-09-25 05:14:17,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=670772.6666666666, ans=0.2 2024-09-25 05:14:21,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=670772.6666666666, ans=0.125 2024-09-25 05:14:21,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=670772.6666666666, ans=22.5 2024-09-25 05:14:22,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=670772.6666666666, ans=0.0 2024-09-25 05:14:43,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=670819.3333333334, ans=0.025 2024-09-25 05:14:49,369 INFO [train.py:1198] (0/4) Epoch 37, batch 3500, loss[loss=0.1923, ctc_loss=0.1234, cr_loss=0.3444, over 17317.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1245, cr_loss=0.3407, over 3355089.89 frames. ], batch size: 51, lr: 3.20e-03, grad_scale: 16.0 2024-09-25 05:15:12,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=670912.6666666666, ans=0.125 2024-09-25 05:15:26,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=670959.3333333334, ans=0.95 2024-09-25 05:15:29,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=670959.3333333334, ans=0.2 2024-09-25 05:15:39,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=671006.0, ans=0.2 2024-09-25 05:15:47,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=671006.0, ans=0.2 2024-09-25 05:16:03,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=671052.6666666666, ans=0.0 2024-09-25 05:16:12,281 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.290e+02 1.375e+02 1.489e+02 2.034e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-25 05:16:12,305 INFO [train.py:1198] (0/4) Epoch 37, batch 3550, loss[loss=0.1657, ctc_loss=0.1067, cr_loss=0.2948, over 17020.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1247, cr_loss=0.3405, over 3362156.31 frames. ], batch size: 39, lr: 3.19e-03, grad_scale: 16.0 2024-09-25 05:16:16,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.71 vs. 
limit=15.0 2024-09-25 05:16:34,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=671146.0, ans=0.025 2024-09-25 05:16:40,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=671146.0, ans=0.125 2024-09-25 05:16:42,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=671192.6666666666, ans=0.035 2024-09-25 05:16:44,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2024-09-25 05:17:30,029 INFO [train.py:1198] (0/4) Epoch 37, batch 3600, loss[loss=0.1701, ctc_loss=0.1052, cr_loss=0.3246, over 17282.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1243, cr_loss=0.3403, over 3369476.90 frames. ], batch size: 42, lr: 3.19e-03, grad_scale: 32.0 2024-09-25 05:18:05,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=671426.0, ans=0.0 2024-09-25 05:18:10,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-09-25 05:18:44,869 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:18:50,878 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.287e+02 1.354e+02 1.480e+02 1.820e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-25 05:18:50,902 INFO [train.py:1198] (0/4) Epoch 37, batch 3650, loss[loss=0.2021, ctc_loss=0.1278, cr_loss=0.3714, over 17212.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1245, cr_loss=0.3412, over 3370345.87 frames. ], batch size: 47, lr: 3.19e-03, grad_scale: 32.0 2024-09-25 05:19:26,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=671659.3333333334, ans=0.09899494936611666 2024-09-25 05:19:26,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=671659.3333333334, ans=0.125 2024-09-25 05:19:47,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=671706.0, ans=0.0 2024-09-25 05:20:09,452 INFO [train.py:1198] (0/4) Epoch 37, batch 3700, loss[loss=0.2068, ctc_loss=0.1313, cr_loss=0.3776, over 17011.00 frames. ], tot_loss[loss=0.1937, ctc_loss=0.1251, cr_loss=0.3428, over 3369176.60 frames. ], batch size: 51, lr: 3.19e-03, grad_scale: 32.0 2024-09-25 05:20:23,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=671846.0, ans=0.0 2024-09-25 05:20:31,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=671846.0, ans=0.125 2024-09-25 05:20:39,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=671892.6666666666, ans=0.125 2024-09-25 05:21:05,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.69 vs. 
limit=12.0 2024-09-25 05:21:14,936 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-144000.pt 2024-09-25 05:21:21,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=671986.0, ans=0.125 2024-09-25 05:21:26,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=22.5 2024-09-25 05:21:29,092 INFO [train.py:1198] (0/4) Epoch 37, batch 3750, loss[loss=0.2188, ctc_loss=0.1431, cr_loss=0.3785, over 16912.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.125, cr_loss=0.342, over 3360984.96 frames. ], batch size: 58, lr: 3.19e-03, grad_scale: 16.0 2024-09-25 05:21:30,592 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.295e+02 1.401e+02 1.510e+02 2.261e+02, threshold=2.801e+02, percent-clipped=0.0 2024-09-25 05:21:34,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0 2024-09-25 05:21:38,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=672032.6666666666, ans=0.2 2024-09-25 05:22:01,261 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:22:05,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=672126.0, ans=0.125 2024-09-25 05:22:18,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=22.5 2024-09-25 05:22:19,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=672172.6666666666, ans=0.0 2024-09-25 05:22:26,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=672172.6666666666, ans=0.0 2024-09-25 05:22:30,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=672219.3333333334, ans=0.125 2024-09-25 05:22:47,833 INFO [train.py:1198] (0/4) Epoch 37, batch 3800, loss[loss=0.1889, ctc_loss=0.1194, cr_loss=0.3477, over 17329.00 frames. ], tot_loss[loss=0.1951, ctc_loss=0.1262, cr_loss=0.3445, over 3344016.89 frames. ], batch size: 51, lr: 3.19e-03, grad_scale: 16.0 2024-09-25 05:22:54,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=672266.0, ans=0.125 2024-09-25 05:22:56,531 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:24:03,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=672452.6666666666, ans=0.0 2024-09-25 05:24:07,329 INFO [train.py:1198] (0/4) Epoch 37, batch 3850, loss[loss=0.2636, ctc_loss=0.1769, cr_loss=0.4337, over 12123.00 frames. ], tot_loss[loss=0.1957, ctc_loss=0.1268, cr_loss=0.3447, over 3287734.53 frames. 
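The checkpoint.py line above writes a batch-indexed snapshot (checkpoint-144000.pt), distinct from the end-of-epoch epoch-37.pt save that follows shortly; the round batch index suggests a fixed save-every-N-batches policy. A sketch of that pattern; the function name, dict layout and interval are assumptions for illustration, not the repo's checkpoint.py:

import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train, exp_dir,
                          every_n=4000):
    """Write exp_dir/checkpoint-<batch_idx_train>.pt every `every_n` training
    batches (illustrative; the real interval is whatever the run configured)."""
    if batch_idx_train == 0 or batch_idx_train % every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )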
], batch size: 123, lr: 3.19e-03, grad_scale: 16.0 2024-09-25 05:24:08,856 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.077e+02 1.255e+02 1.356e+02 1.479e+02 3.474e+02, threshold=2.713e+02, percent-clipped=1.0 2024-09-25 05:24:27,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=672546.0, ans=0.2 2024-09-25 05:24:40,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=672592.6666666666, ans=0.2 2024-09-25 05:25:16,934 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-37.pt 2024-09-25 05:26:04,403 INFO [train.py:1198] (0/4) Epoch 38, batch 0, loss[loss=0.1997, ctc_loss=0.1295, cr_loss=0.3508, over 17019.00 frames. ], tot_loss[loss=0.1997, ctc_loss=0.1295, cr_loss=0.3508, over 17019.00 frames. ], batch size: 44, lr: 3.15e-03, grad_scale: 32.0 2024-09-25 05:26:04,404 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 05:26:20,208 INFO [train.py:1230] (0/4) Epoch 38, validation: loss=0.03515, ctc_loss=0.03515, cr_loss=9.44e-15, over 944034.00 frames. 2024-09-25 05:26:20,209 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 05:26:26,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=672714.0, ans=0.015 2024-09-25 05:26:27,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=672714.0, ans=0.04949747468305833 2024-09-25 05:26:30,444 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=672714.0, ans=0.1 2024-09-25 05:26:40,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=672760.6666666666, ans=0.0 2024-09-25 05:27:27,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=672900.6666666666, ans=0.025 2024-09-25 05:27:40,674 INFO [train.py:1198] (0/4) Epoch 38, batch 50, loss[loss=0.1972, ctc_loss=0.1291, cr_loss=0.3402, over 17276.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1246, cr_loss=0.3423, over 753305.29 frames. ], batch size: 49, lr: 3.15e-03, grad_scale: 16.0 2024-09-25 05:27:41,473 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.27 vs. 
limit=15.0 2024-09-25 05:27:42,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=672947.3333333334, ans=0.0 2024-09-25 05:27:44,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=672947.3333333334, ans=0.1 2024-09-25 05:27:50,498 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.374e+02 1.537e+02 1.726e+02 2.147e+02, threshold=3.075e+02, percent-clipped=0.0 2024-09-25 05:28:09,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=672994.0, ans=0.0 2024-09-25 05:28:25,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=673040.6666666666, ans=0.125 2024-09-25 05:28:31,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=673087.3333333334, ans=0.0 2024-09-25 05:28:39,913 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:28:44,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=673087.3333333334, ans=0.0 2024-09-25 05:29:03,641 INFO [train.py:1198] (0/4) Epoch 38, batch 100, loss[loss=0.1736, ctc_loss=0.1111, cr_loss=0.3123, over 17121.00 frames. ], tot_loss[loss=0.1947, ctc_loss=0.1258, cr_loss=0.3445, over 1322037.20 frames. ], batch size: 40, lr: 3.15e-03, grad_scale: 8.0 2024-09-25 05:29:07,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=15.0 2024-09-25 05:29:18,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=673227.3333333334, ans=0.125 2024-09-25 05:29:43,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=673274.0, ans=0.05 2024-09-25 05:29:52,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=673274.0, ans=0.125 2024-09-25 05:30:21,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=673367.3333333334, ans=0.5 2024-09-25 05:30:31,020 INFO [train.py:1198] (0/4) Epoch 38, batch 150, loss[loss=0.1883, ctc_loss=0.1224, cr_loss=0.3295, over 16990.00 frames. ], tot_loss[loss=0.1942, ctc_loss=0.1252, cr_loss=0.3447, over 1775019.94 frames. 
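Across the start of Epoch 38 the grad_scale field in the loss lines steps down 32.0 → 16.0 → 8.0, the signature of dynamic fp16 loss scaling: the scaler halves the scale whenever a step produces inf/NaN gradients and grows it back after a run of clean steps. A minimal sketch of the standard PyTorch pattern (toy model, CUDA device assumed; the behavior shown follows library defaults, not this run's settings):

import torch

model = torch.nn.Linear(8, 8).cuda()            # toy stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()            # dynamic loss scaling

for _ in range(4):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():             # fp16 forward pass
        loss = model(torch.randn(2, 8, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()               # backward on the scaled loss
    scaler.step(optimizer)                      # skipped on inf/NaN grads
    scaler.update()                             # halve on overflow, else grow
    print("grad_scale:", scaler.get_scale())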
], batch size: 53, lr: 3.15e-03, grad_scale: 8.0 2024-09-25 05:30:42,271 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.262e+02 1.329e+02 1.427e+02 1.998e+02, threshold=2.657e+02, percent-clipped=0.0 2024-09-25 05:30:45,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=673460.6666666666, ans=0.0 2024-09-25 05:31:16,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=673507.3333333334, ans=0.0 2024-09-25 05:31:40,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=673600.6666666666, ans=0.125 2024-09-25 05:31:51,206 INFO [train.py:1198] (0/4) Epoch 38, batch 200, loss[loss=0.1559, ctc_loss=0.09875, cr_loss=0.2857, over 17264.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1239, cr_loss=0.3417, over 2138089.29 frames. ], batch size: 42, lr: 3.15e-03, grad_scale: 8.0 2024-09-25 05:32:30,055 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:32:41,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=673787.3333333334, ans=0.0 2024-09-25 05:33:13,725 INFO [train.py:1198] (0/4) Epoch 38, batch 250, loss[loss=0.2177, ctc_loss=0.1401, cr_loss=0.3878, over 17032.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1243, cr_loss=0.3433, over 2410131.43 frames. ], batch size: 52, lr: 3.15e-03, grad_scale: 8.0 2024-09-25 05:33:24,710 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.262e+02 1.325e+02 1.385e+02 3.377e+02, threshold=2.651e+02, percent-clipped=1.0 2024-09-25 05:33:36,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=673927.3333333334, ans=0.125 2024-09-25 05:33:55,655 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0 2024-09-25 05:33:56,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=673974.0, ans=0.0 2024-09-25 05:34:35,577 INFO [train.py:1198] (0/4) Epoch 38, batch 300, loss[loss=0.1831, ctc_loss=0.1187, cr_loss=0.3217, over 16728.00 frames. ], tot_loss[loss=0.1945, ctc_loss=0.1256, cr_loss=0.3445, over 2607940.17 frames. ], batch size: 61, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:34:40,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=674114.0, ans=0.1 2024-09-25 05:35:08,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=674160.6666666666, ans=0.0 2024-09-25 05:35:20,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=674207.3333333334, ans=0.0 2024-09-25 05:35:28,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.13 vs. limit=10.0 2024-09-25 05:35:31,882 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.79 vs. 
limit=15.0 2024-09-25 05:35:34,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=674254.0, ans=0.025 2024-09-25 05:35:37,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=674254.0, ans=0.125 2024-09-25 05:35:38,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=674254.0, ans=0.0 2024-09-25 05:35:39,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=674254.0, ans=0.0 2024-09-25 05:35:51,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=674300.6666666666, ans=0.2 2024-09-25 05:35:52,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0 2024-09-25 05:36:01,136 INFO [train.py:1198] (0/4) Epoch 38, batch 350, loss[loss=0.2227, ctc_loss=0.1441, cr_loss=0.3931, over 16997.00 frames. ], tot_loss[loss=0.194, ctc_loss=0.1254, cr_loss=0.3434, over 2766280.20 frames. ], batch size: 56, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:36:12,222 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.073e+02 1.293e+02 1.371e+02 1.501e+02 2.181e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-25 05:36:15,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=674394.0, ans=0.125 2024-09-25 05:36:31,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=674440.6666666666, ans=0.025 2024-09-25 05:36:51,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=674487.3333333334, ans=0.125 2024-09-25 05:36:55,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=674487.3333333334, ans=0.025 2024-09-25 05:37:19,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=674580.6666666666, ans=0.0 2024-09-25 05:37:20,499 INFO [train.py:1198] (0/4) Epoch 38, batch 400, loss[loss=0.184, ctc_loss=0.1143, cr_loss=0.3483, over 17181.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1245, cr_loss=0.3421, over 2900214.38 frames. ], batch size: 45, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:37:22,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=674580.6666666666, ans=0.125 2024-09-25 05:37:43,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=674627.3333333334, ans=0.2 2024-09-25 05:37:47,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=674627.3333333334, ans=0.95 2024-09-25 05:37:56,122 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.31 vs. 
limit=15.0 2024-09-25 05:38:13,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=674720.6666666666, ans=0.125 2024-09-25 05:38:18,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=674720.6666666666, ans=0.125 2024-09-25 05:38:34,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=674767.3333333334, ans=0.0 2024-09-25 05:38:42,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. limit=10.0 2024-09-25 05:38:42,902 INFO [train.py:1198] (0/4) Epoch 38, batch 450, loss[loss=0.1831, ctc_loss=0.1169, cr_loss=0.3309, over 17223.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.124, cr_loss=0.3408, over 3010392.71 frames. ], batch size: 47, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:38:55,495 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.292e+02 1.363e+02 1.447e+02 2.119e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-25 05:39:11,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=674860.6666666666, ans=0.5 2024-09-25 05:39:18,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=674907.3333333334, ans=0.125 2024-09-25 05:39:35,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=674954.0, ans=10.0 2024-09-25 05:39:40,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=674954.0, ans=0.0 2024-09-25 05:39:54,974 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2024-09-25 05:39:55,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=22.5 2024-09-25 05:40:11,292 INFO [train.py:1198] (0/4) Epoch 38, batch 500, loss[loss=0.156, ctc_loss=0.09566, cr_loss=0.3018, over 17293.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1241, cr_loss=0.3417, over 3092757.15 frames. ], batch size: 42, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:40:27,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=675094.0, ans=0.025 2024-09-25 05:40:48,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=675140.6666666666, ans=0.0 2024-09-25 05:40:56,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=675140.6666666666, ans=0.125 2024-09-25 05:40:59,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=675187.3333333334, ans=0.025 2024-09-25 05:41:19,784 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:41:21,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. 
limit=15.0 2024-09-25 05:41:22,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=675234.0, ans=0.125 2024-09-25 05:41:27,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=675234.0, ans=0.2 2024-09-25 05:41:30,505 INFO [train.py:1198] (0/4) Epoch 38, batch 550, loss[loss=0.1764, ctc_loss=0.1112, cr_loss=0.326, over 17243.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1247, cr_loss=0.3424, over 3144164.70 frames. ], batch size: 44, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:41:30,996 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 05:41:43,253 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.284e+02 1.364e+02 1.434e+02 1.794e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 05:41:45,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=675327.3333333334, ans=0.0 2024-09-25 05:41:48,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=675327.3333333334, ans=0.0 2024-09-25 05:42:47,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0 2024-09-25 05:42:50,162 INFO [train.py:1198] (0/4) Epoch 38, batch 600, loss[loss=0.1681, ctc_loss=0.106, cr_loss=0.3108, over 17299.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1252, cr_loss=0.3437, over 3192088.85 frames. ], batch size: 42, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:43:18,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=675560.6666666666, ans=0.1 2024-09-25 05:43:30,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675607.3333333334, ans=0.1 2024-09-25 05:43:31,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=675607.3333333334, ans=0.125 2024-09-25 05:43:53,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=675654.0, ans=0.0 2024-09-25 05:44:13,110 INFO [train.py:1198] (0/4) Epoch 38, batch 650, loss[loss=0.1813, ctc_loss=0.1164, cr_loss=0.3244, over 17066.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1252, cr_loss=0.3435, over 3234841.06 frames. ], batch size: 46, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:44:28,505 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.283e+02 1.368e+02 1.490e+02 2.037e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 05:44:50,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-09-25 05:45:09,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-09-25 05:45:09,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=26.48 vs. 
limit=22.5 2024-09-25 05:45:12,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=675887.3333333334, ans=0.5 2024-09-25 05:45:23,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=675934.0, ans=0.125 2024-09-25 05:45:39,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=675980.6666666666, ans=0.2 2024-09-25 05:45:40,419 INFO [train.py:1198] (0/4) Epoch 38, batch 700, loss[loss=0.2202, ctc_loss=0.1423, cr_loss=0.3897, over 17035.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1246, cr_loss=0.3426, over 3259360.23 frames. ], batch size: 56, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:45:55,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=676027.3333333334, ans=0.125 2024-09-25 05:46:04,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676027.3333333334, ans=0.1 2024-09-25 05:46:06,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=676027.3333333334, ans=0.0 2024-09-25 05:46:07,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=676027.3333333334, ans=0.125 2024-09-25 05:46:22,984 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-09-25 05:46:33,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=676120.6666666666, ans=0.125 2024-09-25 05:47:00,009 INFO [train.py:1198] (0/4) Epoch 38, batch 750, loss[loss=0.1972, ctc_loss=0.1265, cr_loss=0.3538, over 17208.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1242, cr_loss=0.3421, over 3286144.29 frames. ], batch size: 55, lr: 3.14e-03, grad_scale: 8.0 2024-09-25 05:47:12,450 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.277e+02 1.363e+02 1.416e+02 2.105e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 05:47:36,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=676307.3333333334, ans=0.125 2024-09-25 05:47:43,429 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. limit=10.0 2024-09-25 05:48:17,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=676400.6666666666, ans=0.2 2024-09-25 05:48:21,654 INFO [train.py:1198] (0/4) Epoch 38, batch 800, loss[loss=0.2024, ctc_loss=0.1283, cr_loss=0.371, over 17306.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1241, cr_loss=0.342, over 3304884.84 frames. ], batch size: 51, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:49:15,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.98 vs. 
limit=15.0 2024-09-25 05:49:23,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=676587.3333333334, ans=0.0 2024-09-25 05:49:28,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0 2024-09-25 05:49:32,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=676634.0, ans=0.0 2024-09-25 05:49:49,052 INFO [train.py:1198] (0/4) Epoch 38, batch 850, loss[loss=0.1955, ctc_loss=0.127, cr_loss=0.3427, over 16145.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1246, cr_loss=0.3432, over 3322657.59 frames. ], batch size: 74, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:49:52,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=676680.6666666666, ans=0.0 2024-09-25 05:49:54,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=676680.6666666666, ans=0.125 2024-09-25 05:50:01,638 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.050e+02 1.281e+02 1.360e+02 1.434e+02 2.186e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-25 05:50:02,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2024-09-25 05:50:19,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=676774.0, ans=0.125 2024-09-25 05:50:34,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=676774.0, ans=0.0 2024-09-25 05:50:37,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=12.0 2024-09-25 05:50:38,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=676820.6666666666, ans=0.125 2024-09-25 05:50:45,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=676820.6666666666, ans=0.125 2024-09-25 05:51:08,595 INFO [train.py:1198] (0/4) Epoch 38, batch 900, loss[loss=0.2072, ctc_loss=0.1376, cr_loss=0.3477, over 16179.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1251, cr_loss=0.3437, over 3335296.92 frames. ], batch size: 74, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:51:12,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=676914.0, ans=0.0 2024-09-25 05:51:52,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.53 vs. limit=12.0 2024-09-25 05:52:18,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=677100.6666666666, ans=0.125 2024-09-25 05:52:29,103 INFO [train.py:1198] (0/4) Epoch 38, batch 950, loss[loss=0.2318, ctc_loss=0.151, cr_loss=0.4041, over 17013.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1242, cr_loss=0.3418, over 3340949.20 frames. 
], batch size: 53, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:52:39,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=22.5 2024-09-25 05:52:41,920 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.327e+02 1.410e+02 1.537e+02 3.330e+02, threshold=2.819e+02, percent-clipped=2.0 2024-09-25 05:53:52,326 INFO [train.py:1198] (0/4) Epoch 38, batch 1000, loss[loss=0.164, ctc_loss=0.104, cr_loss=0.3003, over 17297.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1239, cr_loss=0.3415, over 3344312.93 frames. ], batch size: 46, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:54:25,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=677474.0, ans=0.09899494936611666 2024-09-25 05:55:07,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=677567.3333333334, ans=0.035 2024-09-25 05:55:16,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=677567.3333333334, ans=0.125 2024-09-25 05:55:19,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=677614.0, ans=0.015 2024-09-25 05:55:20,560 INFO [train.py:1198] (0/4) Epoch 38, batch 1050, loss[loss=0.1512, ctc_loss=0.09567, cr_loss=0.2778, over 16973.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1234, cr_loss=0.3404, over 3351817.47 frames. ], batch size: 42, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:55:33,504 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.267e+02 1.356e+02 1.451e+02 1.928e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-25 05:55:55,276 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2024-09-25 05:56:40,345 INFO [train.py:1198] (0/4) Epoch 38, batch 1100, loss[loss=0.1908, ctc_loss=0.1193, cr_loss=0.3573, over 17010.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3404, over 3356543.92 frames. ], batch size: 39, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:56:42,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=22.5 2024-09-25 05:57:07,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=677894.0, ans=0.0 2024-09-25 05:57:22,930 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0 2024-09-25 05:57:24,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.48 vs. limit=6.0 2024-09-25 05:57:31,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=677987.3333333334, ans=0.125 2024-09-25 05:57:52,480 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=678034.0, ans=0.125 2024-09-25 05:58:02,652 INFO [train.py:1198] (0/4) Epoch 38, batch 1150, loss[loss=0.1914, ctc_loss=0.124, cr_loss=0.3369, over 17116.00 frames. 
], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.341, over 3364597.86 frames. ], batch size: 49, lr: 3.14e-03, grad_scale: 16.0 2024-09-25 05:58:07,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=678080.6666666666, ans=0.125 2024-09-25 05:58:14,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=678080.6666666666, ans=0.125 2024-09-25 05:58:15,131 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.254e+02 1.322e+02 1.438e+02 2.414e+02, threshold=2.644e+02, percent-clipped=0.0 2024-09-25 05:58:39,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=678174.0, ans=0.025 2024-09-25 05:59:14,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=678267.3333333334, ans=0.0 2024-09-25 05:59:25,216 INFO [train.py:1198] (0/4) Epoch 38, batch 1200, loss[loss=0.1705, ctc_loss=0.1081, cr_loss=0.3121, over 17116.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1235, cr_loss=0.3401, over 3367574.89 frames. ], batch size: 40, lr: 3.13e-03, grad_scale: 16.0 2024-09-25 06:00:02,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=678407.3333333334, ans=0.025 2024-09-25 06:00:21,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=678454.0, ans=0.125 2024-09-25 06:00:37,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=678500.6666666666, ans=0.125 2024-09-25 06:00:50,407 INFO [train.py:1198] (0/4) Epoch 38, batch 1250, loss[loss=0.151, ctc_loss=0.09444, cr_loss=0.283, over 16947.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1222, cr_loss=0.3371, over 3369931.57 frames. 
], batch size: 42, lr: 3.13e-03, grad_scale: 16.0 2024-09-25 06:00:57,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=678547.3333333334, ans=0.2 2024-09-25 06:01:04,690 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.074e+02 1.283e+02 1.378e+02 1.489e+02 1.932e+02, threshold=2.757e+02, percent-clipped=0.0 2024-09-25 06:01:08,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=678594.0, ans=0.035 2024-09-25 06:01:27,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=678640.6666666666, ans=0.07 2024-09-25 06:01:48,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=678687.3333333334, ans=0.125 2024-09-25 06:01:54,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678734.0, ans=0.1 2024-09-25 06:02:06,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=678734.0, ans=0.1 2024-09-25 06:02:08,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=678734.0, ans=0.2 2024-09-25 06:02:11,152 INFO [train.py:1198] (0/4) Epoch 38, batch 1300, loss[loss=0.2351, ctc_loss=0.1517, cr_loss=0.4173, over 17012.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1224, cr_loss=0.3379, over 3371444.28 frames. ], batch size: 56, lr: 3.13e-03, grad_scale: 16.0 2024-09-25 06:02:55,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=678874.0, ans=0.2 2024-09-25 06:03:16,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=678967.3333333334, ans=0.2 2024-09-25 06:03:33,486 INFO [train.py:1198] (0/4) Epoch 38, batch 1350, loss[loss=0.2294, ctc_loss=0.1539, cr_loss=0.3779, over 11920.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1238, cr_loss=0.3402, over 3346284.26 frames. ], batch size: 123, lr: 3.13e-03, grad_scale: 16.0 2024-09-25 06:03:47,755 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.262e+02 1.346e+02 1.430e+02 2.071e+02, threshold=2.693e+02, percent-clipped=0.0 2024-09-25 06:04:05,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679107.3333333334, ans=0.1 2024-09-25 06:04:14,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=679107.3333333334, ans=0.125 2024-09-25 06:04:33,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=679154.0, ans=0.1 2024-09-25 06:05:01,179 INFO [train.py:1198] (0/4) Epoch 38, batch 1400, loss[loss=0.1701, ctc_loss=0.1082, cr_loss=0.3095, over 17281.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1248, cr_loss=0.3416, over 3342666.20 frames. 
], batch size: 42, lr: 3.13e-03, grad_scale: 16.0 2024-09-25 06:05:19,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=679294.0, ans=0.0 2024-09-25 06:05:25,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=679294.0, ans=0.2 2024-09-25 06:05:42,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=679340.6666666666, ans=6.0 2024-09-25 06:05:44,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=12.0 2024-09-25 06:05:47,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=679387.3333333334, ans=0.1 2024-09-25 06:05:55,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=679387.3333333334, ans=0.0 2024-09-25 06:06:00,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=679387.3333333334, ans=0.125 2024-09-25 06:06:06,895 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:06:11,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=679434.0, ans=0.125 2024-09-25 06:06:13,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679434.0, ans=0.1 2024-09-25 06:06:21,082 INFO [train.py:1198] (0/4) Epoch 38, batch 1450, loss[loss=0.1728, ctc_loss=0.1118, cr_loss=0.3048, over 16984.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1236, cr_loss=0.3395, over 3348462.29 frames. ], batch size: 39, lr: 3.13e-03, grad_scale: 16.0 2024-09-25 06:06:35,619 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.086e+02 1.250e+02 1.326e+02 1.407e+02 2.354e+02, threshold=2.651e+02, percent-clipped=0.0 2024-09-25 06:07:17,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=679620.6666666666, ans=0.125 2024-09-25 06:07:18,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=679620.6666666666, ans=0.0 2024-09-25 06:07:41,101 INFO [train.py:1198] (0/4) Epoch 38, batch 1500, loss[loss=0.1877, ctc_loss=0.1184, cr_loss=0.3464, over 17015.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1245, cr_loss=0.3412, over 3338397.75 frames. ], batch size: 44, lr: 3.13e-03, grad_scale: 16.0 2024-09-25 06:07:58,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=679760.6666666666, ans=0.125 2024-09-25 06:08:09,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=679760.6666666666, ans=0.2 2024-09-25 06:08:15,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=679807.3333333334, ans=0.125 2024-09-25 06:08:16,460 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. 
limit=15.0 2024-09-25 06:08:19,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2024-09-25 06:08:53,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2024-09-25 06:08:54,538 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:09:06,158 INFO [train.py:1198] (0/4) Epoch 38, batch 1550, loss[loss=0.1641, ctc_loss=0.1031, cr_loss=0.3048, over 17180.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1248, cr_loss=0.3424, over 3346472.87 frames. ], batch size: 41, lr: 3.13e-03, grad_scale: 16.0 2024-09-25 06:09:09,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=679947.3333333334, ans=0.2 2024-09-25 06:09:12,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=679947.3333333334, ans=0.125 2024-09-25 06:09:20,509 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.263e+02 1.343e+02 1.440e+02 2.044e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-25 06:09:20,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=679994.0, ans=0.1 2024-09-25 06:09:31,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679994.0, ans=0.1 2024-09-25 06:09:31,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=679994.0, ans=0.125 2024-09-25 06:09:55,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=12.0 2024-09-25 06:09:58,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680087.3333333334, ans=0.1 2024-09-25 06:10:31,468 INFO [train.py:1198] (0/4) Epoch 38, batch 1600, loss[loss=0.1907, ctc_loss=0.1264, cr_loss=0.3213, over 17144.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.125, cr_loss=0.3426, over 3354959.55 frames. ], batch size: 48, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:10:58,360 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-09-25 06:10:59,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.30 vs. 
limit=15.0 2024-09-25 06:11:14,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=680274.0, ans=0.125 2024-09-25 06:11:16,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=680274.0, ans=0.025 2024-09-25 06:11:18,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680320.6666666666, ans=0.1 2024-09-25 06:11:27,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=680320.6666666666, ans=0.125 2024-09-25 06:11:29,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=680320.6666666666, ans=0.025 2024-09-25 06:11:42,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. limit=10.0 2024-09-25 06:11:51,476 INFO [train.py:1198] (0/4) Epoch 38, batch 1650, loss[loss=0.1645, ctc_loss=0.1033, cr_loss=0.3062, over 17045.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1243, cr_loss=0.3416, over 3359824.35 frames. ], batch size: 39, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:12:05,738 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.273e+02 1.346e+02 1.505e+02 2.146e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-25 06:12:25,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=680507.3333333334, ans=0.0 2024-09-25 06:12:37,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=680554.0, ans=0.0 2024-09-25 06:13:05,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=680600.6666666666, ans=0.05 2024-09-25 06:13:07,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=680600.6666666666, ans=0.125 2024-09-25 06:13:13,539 INFO [train.py:1198] (0/4) Epoch 38, batch 1700, loss[loss=0.1734, ctc_loss=0.1113, cr_loss=0.3104, over 17280.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1241, cr_loss=0.341, over 3365388.78 frames. ], batch size: 46, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:13:28,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=680694.0, ans=0.125 2024-09-25 06:13:41,107 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2024-09-25 06:14:07,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=680787.3333333334, ans=0.125 2024-09-25 06:14:17,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=680787.3333333334, ans=0.2 2024-09-25 06:14:30,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=680834.0, ans=0.0 2024-09-25 06:14:38,371 INFO [train.py:1198] (0/4) Epoch 38, batch 1750, loss[loss=0.1311, ctc_loss=0.0815, cr_loss=0.2481, over 17125.00 frames. 
], tot_loss[loss=0.1927, ctc_loss=0.1244, cr_loss=0.3416, over 3362680.98 frames. ], batch size: 40, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:14:42,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.71 vs. limit=15.0 2024-09-25 06:14:55,388 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.278e+02 1.373e+02 1.469e+02 4.120e+02, threshold=2.745e+02, percent-clipped=1.0 2024-09-25 06:15:13,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=680974.0, ans=0.07 2024-09-25 06:15:30,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=681020.6666666666, ans=0.0 2024-09-25 06:15:46,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=681067.3333333334, ans=0.125 2024-09-25 06:16:01,066 INFO [train.py:1198] (0/4) Epoch 38, batch 1800, loss[loss=0.1874, ctc_loss=0.1201, cr_loss=0.3364, over 17298.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.125, cr_loss=0.3423, over 3355718.82 frames. ], batch size: 46, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:16:24,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=681160.6666666666, ans=0.125 2024-09-25 06:16:35,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=681207.3333333334, ans=15.0 2024-09-25 06:17:02,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=681254.0, ans=0.95 2024-09-25 06:17:21,940 INFO [train.py:1198] (0/4) Epoch 38, batch 1850, loss[loss=0.1588, ctc_loss=0.1002, cr_loss=0.2932, over 17057.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1245, cr_loss=0.342, over 3369750.06 frames. ], batch size: 39, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:17:36,437 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.079e+02 1.263e+02 1.331e+02 1.457e+02 2.352e+02, threshold=2.661e+02, percent-clipped=0.0 2024-09-25 06:17:44,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681394.0, ans=0.1 2024-09-25 06:17:55,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=681440.6666666666, ans=0.0 2024-09-25 06:17:57,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.43 vs. 
limit=22.5 2024-09-25 06:18:03,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=681440.6666666666, ans=0.0 2024-09-25 06:18:04,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=681440.6666666666, ans=0.0 2024-09-25 06:18:04,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681440.6666666666, ans=0.1 2024-09-25 06:18:33,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=681534.0, ans=0.125 2024-09-25 06:18:43,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=681580.6666666666, ans=0.07 2024-09-25 06:18:44,348 INFO [train.py:1198] (0/4) Epoch 38, batch 1900, loss[loss=0.1812, ctc_loss=0.118, cr_loss=0.3158, over 16731.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1248, cr_loss=0.3427, over 3374369.98 frames. ], batch size: 61, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:19:04,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=681627.3333333334, ans=0.1 2024-09-25 06:19:22,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681674.0, ans=0.1 2024-09-25 06:19:42,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=681720.6666666666, ans=0.125 2024-09-25 06:20:04,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=681767.3333333334, ans=0.125 2024-09-25 06:20:07,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=681767.3333333334, ans=0.05 2024-09-25 06:20:10,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=681814.0, ans=0.2 2024-09-25 06:20:11,786 INFO [train.py:1198] (0/4) Epoch 38, batch 1950, loss[loss=0.1618, ctc_loss=0.1011, cr_loss=0.3038, over 17044.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1245, cr_loss=0.3416, over 3372639.27 frames. 
], batch size: 39, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:20:15,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=681814.0, ans=0.025 2024-09-25 06:20:18,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=681814.0, ans=0.0 2024-09-25 06:20:27,501 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.069e+02 1.276e+02 1.376e+02 1.498e+02 2.117e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 06:20:42,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681907.3333333334, ans=0.1 2024-09-25 06:20:51,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=681907.3333333334, ans=0.125 2024-09-25 06:21:15,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=682000.6666666666, ans=0.04949747468305833 2024-09-25 06:21:17,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=682000.6666666666, ans=0.125 2024-09-25 06:21:31,235 INFO [train.py:1198] (0/4) Epoch 38, batch 2000, loss[loss=0.1709, ctc_loss=0.1081, cr_loss=0.3138, over 17093.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1245, cr_loss=0.3413, over 3372064.24 frames. ], batch size: 43, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:22:22,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=682187.3333333334, ans=0.125 2024-09-25 06:22:45,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=682234.0, ans=0.125 2024-09-25 06:22:51,389 INFO [train.py:1198] (0/4) Epoch 38, batch 2050, loss[loss=0.1767, ctc_loss=0.1113, cr_loss=0.3268, over 17111.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1247, cr_loss=0.3426, over 3369464.62 frames. ], batch size: 43, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:23:09,956 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.278e+02 1.341e+02 1.480e+02 2.182e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-25 06:23:31,889 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=22.5 2024-09-25 06:24:04,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=682467.3333333334, ans=0.2 2024-09-25 06:24:16,376 INFO [train.py:1198] (0/4) Epoch 38, batch 2100, loss[loss=0.1719, ctc_loss=0.1099, cr_loss=0.31, over 16371.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.124, cr_loss=0.341, over 3368867.28 frames. ], batch size: 36, lr: 3.13e-03, grad_scale: 32.0 2024-09-25 06:24:21,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=682514.0, ans=0.2 2024-09-25 06:24:43,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682560.6666666666, ans=0.1 2024-09-25 06:25:41,097 INFO [train.py:1198] (0/4) Epoch 38, batch 2150, loss[loss=0.2217, ctc_loss=0.1443, cr_loss=0.3874, over 17179.00 frames. 
], tot_loss[loss=0.1916, ctc_loss=0.1235, cr_loss=0.3401, over 3367435.62 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:25:46,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=682747.3333333334, ans=0.0 2024-09-25 06:25:51,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=682747.3333333334, ans=0.125 2024-09-25 06:25:56,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=6.0 2024-09-25 06:25:59,160 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.107e+02 1.297e+02 1.376e+02 1.525e+02 2.502e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-25 06:26:02,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=682794.0, ans=0.125 2024-09-25 06:26:10,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=682794.0, ans=0.0 2024-09-25 06:26:35,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=682887.3333333334, ans=0.07 2024-09-25 06:26:47,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=682934.0, ans=0.125 2024-09-25 06:26:55,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=15.0 2024-09-25 06:27:01,626 INFO [train.py:1198] (0/4) Epoch 38, batch 2200, loss[loss=0.1531, ctc_loss=0.0965, cr_loss=0.2829, over 16715.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1241, cr_loss=0.3413, over 3375832.98 frames. ], batch size: 37, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:27:39,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=22.5 2024-09-25 06:28:02,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=683120.6666666666, ans=0.1 2024-09-25 06:28:04,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=683120.6666666666, ans=0.125 2024-09-25 06:28:15,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=683167.3333333334, ans=0.0 2024-09-25 06:28:24,720 INFO [train.py:1198] (0/4) Epoch 38, batch 2250, loss[loss=0.2081, ctc_loss=0.1346, cr_loss=0.3675, over 17221.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1248, cr_loss=0.3421, over 3369571.18 frames. ], batch size: 55, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:28:26,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=683214.0, ans=0.125 2024-09-25 06:28:31,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.32 vs. 
limit=15.0 2024-09-25 06:28:42,347 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.283e+02 1.354e+02 1.470e+02 2.386e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-25 06:28:46,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=683260.6666666666, ans=15.0 2024-09-25 06:28:46,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2024-09-25 06:28:47,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=683260.6666666666, ans=0.025 2024-09-25 06:29:08,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=683307.3333333334, ans=0.09899494936611666 2024-09-25 06:29:32,452 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:29:49,856 INFO [train.py:1198] (0/4) Epoch 38, batch 2300, loss[loss=0.2406, ctc_loss=0.16, cr_loss=0.4029, over 16020.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1254, cr_loss=0.3437, over 3359477.41 frames. ], batch size: 74, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:30:13,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=12.0 2024-09-25 06:30:25,005 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2024-09-25 06:30:32,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=683540.6666666666, ans=0.07 2024-09-25 06:30:48,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=683587.3333333334, ans=0.2 2024-09-25 06:30:49,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=683587.3333333334, ans=0.125 2024-09-25 06:31:00,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=683634.0, ans=15.0 2024-09-25 06:31:10,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=683680.6666666666, ans=0.2 2024-09-25 06:31:12,144 INFO [train.py:1198] (0/4) Epoch 38, batch 2350, loss[loss=0.1898, ctc_loss=0.1204, cr_loss=0.3467, over 16363.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1254, cr_loss=0.3438, over 3355432.19 frames. 
], batch size: 36, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:31:20,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=683680.6666666666, ans=0.125 2024-09-25 06:31:26,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=683727.3333333334, ans=0.0 2024-09-25 06:31:29,751 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.295e+02 1.360e+02 1.434e+02 2.304e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 06:32:16,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=683867.3333333334, ans=0.125 2024-09-25 06:32:30,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=683914.0, ans=0.125 2024-09-25 06:32:31,860 INFO [train.py:1198] (0/4) Epoch 38, batch 2400, loss[loss=0.1531, ctc_loss=0.09498, cr_loss=0.2905, over 16353.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1245, cr_loss=0.3421, over 3364622.44 frames. ], batch size: 36, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:32:39,465 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=15.0 2024-09-25 06:32:40,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=683914.0, ans=0.2 2024-09-25 06:33:54,251 INFO [train.py:1198] (0/4) Epoch 38, batch 2450, loss[loss=0.2147, ctc_loss=0.1391, cr_loss=0.3778, over 16687.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1245, cr_loss=0.3423, over 3357686.13 frames. ], batch size: 61, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:33:54,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=684147.3333333334, ans=0.0 2024-09-25 06:34:14,633 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.026e+02 1.306e+02 1.378e+02 1.472e+02 2.938e+02, threshold=2.756e+02, percent-clipped=1.0 2024-09-25 06:34:18,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=684194.0, ans=0.125 2024-09-25 06:34:23,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=684194.0, ans=0.2 2024-09-25 06:34:24,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=684194.0, ans=0.0 2024-09-25 06:34:57,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=684287.3333333334, ans=0.125 2024-09-25 06:35:02,199 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=12.0 2024-09-25 06:35:22,480 INFO [train.py:1198] (0/4) Epoch 38, batch 2500, loss[loss=0.2052, ctc_loss=0.132, cr_loss=0.366, over 17236.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1248, cr_loss=0.3427, over 3363175.11 frames. 
], batch size: 50, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:35:34,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=684380.6666666666, ans=0.125 2024-09-25 06:35:46,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=684427.3333333334, ans=0.0 2024-09-25 06:35:51,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.65 vs. limit=22.5 2024-09-25 06:35:57,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=684474.0, ans=0.125 2024-09-25 06:35:59,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=684474.0, ans=0.0 2024-09-25 06:36:00,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=684474.0, ans=0.2 2024-09-25 06:36:21,729 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=684520.6666666666, ans=0.05 2024-09-25 06:36:31,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=684567.3333333334, ans=0.0 2024-09-25 06:36:42,201 INFO [train.py:1198] (0/4) Epoch 38, batch 2550, loss[loss=0.1728, ctc_loss=0.1109, cr_loss=0.3098, over 17079.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1241, cr_loss=0.341, over 3364400.12 frames. ], batch size: 46, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:37:00,104 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.274e+02 1.355e+02 1.436e+02 2.221e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-25 06:37:13,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=684707.3333333334, ans=0.07 2024-09-25 06:38:05,959 INFO [train.py:1198] (0/4) Epoch 38, batch 2600, loss[loss=0.1743, ctc_loss=0.1124, cr_loss=0.3092, over 17252.00 frames. ], tot_loss[loss=0.1937, ctc_loss=0.1251, cr_loss=0.3429, over 3358788.87 frames. ], batch size: 42, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:38:11,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=684847.3333333334, ans=0.0 2024-09-25 06:38:17,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=684847.3333333334, ans=0.0 2024-09-25 06:38:25,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=684894.0, ans=0.125 2024-09-25 06:38:28,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=684894.0, ans=0.025 2024-09-25 06:38:35,640 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.78 vs. 
limit=10.0 2024-09-25 06:39:03,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=684987.3333333334, ans=0.125 2024-09-25 06:39:06,770 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=684987.3333333334, ans=0.0 2024-09-25 06:39:28,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=685034.0, ans=0.125 2024-09-25 06:39:31,218 INFO [train.py:1198] (0/4) Epoch 38, batch 2650, loss[loss=0.2414, ctc_loss=0.1574, cr_loss=0.4199, over 16994.00 frames. ], tot_loss[loss=0.1941, ctc_loss=0.1255, cr_loss=0.3432, over 3368174.31 frames. ], batch size: 56, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:39:37,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=685080.6666666666, ans=0.0 2024-09-25 06:39:45,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=685127.3333333334, ans=0.0 2024-09-25 06:39:48,718 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.311e+02 1.384e+02 1.483e+02 1.840e+02, threshold=2.769e+02, percent-clipped=0.0 2024-09-25 06:40:18,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=685174.0, ans=0.0 2024-09-25 06:40:22,123 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.92 vs. limit=12.0 2024-09-25 06:40:44,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2024-09-25 06:40:50,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=685267.3333333334, ans=0.0 2024-09-25 06:40:53,184 INFO [train.py:1198] (0/4) Epoch 38, batch 2700, loss[loss=0.1809, ctc_loss=0.1176, cr_loss=0.3168, over 17188.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1249, cr_loss=0.342, over 3368627.38 frames. ], batch size: 41, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:41:04,762 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=685314.0, ans=0.125 2024-09-25 06:41:49,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0 2024-09-25 06:41:54,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=685454.0, ans=0.125 2024-09-25 06:42:12,893 INFO [train.py:1198] (0/4) Epoch 38, batch 2750, loss[loss=0.1577, ctc_loss=0.1006, cr_loss=0.2857, over 17256.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.124, cr_loss=0.3401, over 3361591.11 frames. 
], batch size: 44, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:42:13,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=685547.3333333334, ans=0.2 2024-09-25 06:42:27,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=685594.0, ans=0.125 2024-09-25 06:42:32,105 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.260e+02 1.351e+02 1.429e+02 2.193e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-25 06:42:40,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=685594.0, ans=0.125 2024-09-25 06:43:13,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=685687.3333333334, ans=0.125 2024-09-25 06:43:24,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=685734.0, ans=0.0 2024-09-25 06:43:27,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=685734.0, ans=0.125 2024-09-25 06:43:32,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=685734.0, ans=0.025 2024-09-25 06:43:35,231 INFO [train.py:1198] (0/4) Epoch 38, batch 2800, loss[loss=0.1923, ctc_loss=0.1254, cr_loss=0.3344, over 16743.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1238, cr_loss=0.3401, over 3369252.17 frames. ], batch size: 61, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:44:06,956 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:44:19,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0 2024-09-25 06:44:22,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=12.0 2024-09-25 06:44:28,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=685920.6666666666, ans=0.125 2024-09-25 06:44:44,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=685967.3333333334, ans=0.125 2024-09-25 06:45:03,155 INFO [train.py:1198] (0/4) Epoch 38, batch 2850, loss[loss=0.1722, ctc_loss=0.1089, cr_loss=0.3164, over 17163.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1234, cr_loss=0.3393, over 3376960.10 frames. ], batch size: 45, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:45:08,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=686014.0, ans=0.0 2024-09-25 06:45:08,742 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=22.5 2024-09-25 06:45:17,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.01 vs. 
limit=15.0 2024-09-25 06:45:22,310 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.271e+02 1.362e+02 1.479e+02 2.279e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-25 06:45:40,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=686107.3333333334, ans=0.125 2024-09-25 06:46:02,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686154.0, ans=0.1 2024-09-25 06:46:02,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.99 vs. limit=10.0 2024-09-25 06:46:23,068 INFO [train.py:1198] (0/4) Epoch 38, batch 2900, loss[loss=0.1732, ctc_loss=0.1095, cr_loss=0.3186, over 17083.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1235, cr_loss=0.3398, over 3368636.75 frames. ], batch size: 46, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:46:34,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=686247.3333333334, ans=0.125 2024-09-25 06:46:50,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=686294.0, ans=10.0 2024-09-25 06:46:54,150 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.29 vs. limit=15.0 2024-09-25 06:47:00,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=686340.6666666666, ans=0.125 2024-09-25 06:47:19,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=686387.3333333334, ans=0.125 2024-09-25 06:47:34,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686434.0, ans=0.1 2024-09-25 06:47:34,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=686434.0, ans=0.2 2024-09-25 06:47:36,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=686434.0, ans=0.125 2024-09-25 06:47:42,603 INFO [train.py:1198] (0/4) Epoch 38, batch 2950, loss[loss=0.1904, ctc_loss=0.1209, cr_loss=0.3473, over 17267.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1234, cr_loss=0.3402, over 3371271.18 frames. 
], batch size: 44, lr: 3.12e-03, grad_scale: 32.0 2024-09-25 06:47:58,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=686480.6666666666, ans=0.125 2024-09-25 06:48:01,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=686527.3333333334, ans=0.0 2024-09-25 06:48:03,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686527.3333333334, ans=0.1 2024-09-25 06:48:04,473 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.305e+02 1.376e+02 1.477e+02 2.268e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 06:48:25,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686574.0, ans=0.1 2024-09-25 06:48:26,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=686574.0, ans=0.0 2024-09-25 06:48:26,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=686574.0, ans=0.0 2024-09-25 06:49:06,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=686714.0, ans=0.07 2024-09-25 06:49:07,364 INFO [train.py:1198] (0/4) Epoch 38, batch 3000, loss[loss=0.1802, ctc_loss=0.1135, cr_loss=0.3334, over 17075.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1231, cr_loss=0.3391, over 3359482.44 frames. ], batch size: 46, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:49:07,365 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 06:49:22,917 INFO [train.py:1230] (0/4) Epoch 38, validation: loss=0.03571, ctc_loss=0.03571, cr_loss=9.665e-15, over 944034.00 frames. 2024-09-25 06:49:22,917 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 06:49:57,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=686807.3333333334, ans=0.125 2024-09-25 06:50:19,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=686854.0, ans=0.025 2024-09-25 06:50:44,316 INFO [train.py:1198] (0/4) Epoch 38, batch 3050, loss[loss=0.2318, ctc_loss=0.1519, cr_loss=0.3995, over 16535.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1243, cr_loss=0.3423, over 3350078.31 frames. 
], batch size: 66, lr: 3.12e-03, grad_scale: 16.0 2024-09-25 06:50:52,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=686947.3333333334, ans=0.0 2024-09-25 06:50:53,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=686947.3333333334, ans=0.125 2024-09-25 06:51:01,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=686994.0, ans=0.0 2024-09-25 06:51:04,197 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.282e+02 1.358e+02 1.470e+02 1.835e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 06:51:48,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=687134.0, ans=0.0 2024-09-25 06:52:01,990 INFO [train.py:1198] (0/4) Epoch 38, batch 3100, loss[loss=0.2055, ctc_loss=0.1347, cr_loss=0.3541, over 17084.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1244, cr_loss=0.3423, over 3361628.37 frames. ], batch size: 49, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 06:52:04,743 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.42 vs. limit=15.0 2024-09-25 06:52:21,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=687227.3333333334, ans=0.0 2024-09-25 06:52:48,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2024-09-25 06:53:20,026 INFO [train.py:1198] (0/4) Epoch 38, batch 3150, loss[loss=0.1862, ctc_loss=0.1202, cr_loss=0.3302, over 17031.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1238, cr_loss=0.341, over 3373838.11 frames. ], batch size: 56, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 06:53:26,500 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:53:40,301 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.270e+02 1.356e+02 1.474e+02 1.773e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-25 06:53:45,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=687460.6666666666, ans=0.125 2024-09-25 06:54:37,810 INFO [train.py:1198] (0/4) Epoch 38, batch 3200, loss[loss=0.2005, ctc_loss=0.1295, cr_loss=0.3553, over 17228.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.124, cr_loss=0.3413, over 3358537.03 frames. ], batch size: 50, lr: 3.11e-03, grad_scale: 32.0 2024-09-25 06:54:38,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=687647.3333333334, ans=0.0 2024-09-25 06:54:52,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=687694.0, ans=0.125 2024-09-25 06:55:06,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.14 vs. 
limit=15.0 2024-09-25 06:55:13,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=687740.6666666666, ans=0.125 2024-09-25 06:55:36,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.44 vs. limit=15.0 2024-09-25 06:55:47,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=12.0 2024-09-25 06:55:55,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=22.5 2024-09-25 06:55:56,025 INFO [train.py:1198] (0/4) Epoch 38, batch 3250, loss[loss=0.1721, ctc_loss=0.1085, cr_loss=0.3182, over 17069.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1237, cr_loss=0.3406, over 3364804.35 frames. ], batch size: 46, lr: 3.11e-03, grad_scale: 32.0 2024-09-25 06:55:57,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=687880.6666666666, ans=0.0 2024-09-25 06:56:17,749 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.275e+02 1.365e+02 1.473e+02 2.154e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-25 06:56:34,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=687974.0, ans=22.5 2024-09-25 06:56:35,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=687974.0, ans=0.125 2024-09-25 06:56:57,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=688020.6666666666, ans=0.125 2024-09-25 06:57:05,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=688067.3333333334, ans=0.2 2024-09-25 06:57:11,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=688067.3333333334, ans=10.0 2024-09-25 06:57:16,215 INFO [train.py:1198] (0/4) Epoch 38, batch 3300, loss[loss=0.1862, ctc_loss=0.1205, cr_loss=0.3285, over 17021.00 frames. ], tot_loss[loss=0.1936, ctc_loss=0.1251, cr_loss=0.3426, over 3349499.16 frames. ], batch size: 44, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 06:57:16,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=688114.0, ans=10.0 2024-09-25 06:57:24,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=688114.0, ans=0.125 2024-09-25 06:57:30,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=688160.6666666666, ans=0.5 2024-09-25 06:57:32,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=688160.6666666666, ans=0.125 2024-09-25 06:57:40,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.37 vs. 
limit=22.5 2024-09-25 06:58:06,326 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 06:58:23,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=688300.6666666666, ans=0.125 2024-09-25 06:58:27,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=688300.6666666666, ans=0.125 2024-09-25 06:58:33,937 INFO [train.py:1198] (0/4) Epoch 38, batch 3350, loss[loss=0.1477, ctc_loss=0.09438, cr_loss=0.2667, over 17108.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1246, cr_loss=0.3419, over 3351398.86 frames. ], batch size: 40, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 06:58:55,605 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.280e+02 1.359e+02 1.507e+02 2.410e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-25 06:59:24,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=688487.3333333334, ans=0.0 2024-09-25 06:59:38,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=688534.0, ans=0.0 2024-09-25 06:59:56,004 INFO [train.py:1198] (0/4) Epoch 38, batch 3400, loss[loss=0.1507, ctc_loss=0.09392, cr_loss=0.2839, over 16288.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1241, cr_loss=0.3413, over 3357087.03 frames. ], batch size: 36, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:00:09,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2024-09-25 07:00:27,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=688674.0, ans=0.0 2024-09-25 07:00:37,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=688674.0, ans=0.125 2024-09-25 07:00:44,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=688720.6666666666, ans=0.125 2024-09-25 07:00:48,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=688720.6666666666, ans=0.125 2024-09-25 07:00:58,261 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.38 vs. limit=22.5 2024-09-25 07:01:16,601 INFO [train.py:1198] (0/4) Epoch 38, batch 3450, loss[loss=0.1776, ctc_loss=0.1132, cr_loss=0.322, over 17361.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1245, cr_loss=0.3424, over 3358502.97 frames. 
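
Note on the scaling.py:214 entries: each prints the current value (ans=...) of a ScheduledFloat, a hyperparameter defined as a piecewise-linear function of batch_count; by batch_count ~688k the various skip-rate schedules have decayed to 0.0, while quantities like bypass.scale_min sit at their floor of 0.2. A sketch of such a schedule is below; the class name and breakpoints are illustrative, not the training code's own.

    class PiecewiseSchedule:
        """Piecewise-linear schedule over batch_count, clamped at the
        endpoints. PiecewiseSchedule((0, 0.5), (4000, 0.0)) starts at
        0.5 and decays linearly to 0.0 by batch 4000."""
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    pos_emb_skip_rate = PiecewiseSchedule((0, 0.5), (4000, 0.0))
    print(pos_emb_skip_rate(688487.33))  # -> 0.0, as in the ans=0.0 lines
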
], batch size: 48, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:01:18,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=688814.0, ans=0.2 2024-09-25 07:01:40,121 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.283e+02 1.390e+02 1.486e+02 2.473e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-25 07:02:08,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=688954.0, ans=0.125 2024-09-25 07:02:35,141 INFO [train.py:1198] (0/4) Epoch 38, batch 3500, loss[loss=0.2185, ctc_loss=0.1439, cr_loss=0.373, over 16496.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1245, cr_loss=0.3418, over 3361121.93 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 8.0 2024-09-25 07:02:52,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=689094.0, ans=0.0 2024-09-25 07:03:19,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=689140.6666666666, ans=0.0 2024-09-25 07:03:39,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=689234.0, ans=0.1 2024-09-25 07:03:53,032 INFO [train.py:1198] (0/4) Epoch 38, batch 3550, loss[loss=0.1946, ctc_loss=0.1278, cr_loss=0.3343, over 17361.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1243, cr_loss=0.3414, over 3363655.83 frames. ], batch size: 48, lr: 3.11e-03, grad_scale: 8.0 2024-09-25 07:04:10,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=689327.3333333334, ans=0.125 2024-09-25 07:04:12,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=689327.3333333334, ans=0.0 2024-09-25 07:04:12,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=689327.3333333334, ans=0.1 2024-09-25 07:04:16,680 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.303e+02 1.383e+02 1.456e+02 2.390e+02, threshold=2.765e+02, percent-clipped=0.0 2024-09-25 07:04:18,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=689327.3333333334, ans=0.1 2024-09-25 07:04:23,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=689374.0, ans=0.025 2024-09-25 07:04:46,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=689420.6666666666, ans=0.125 2024-09-25 07:04:59,304 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2024-09-25 07:05:11,165 INFO [train.py:1198] (0/4) Epoch 38, batch 3600, loss[loss=0.2061, ctc_loss=0.1343, cr_loss=0.359, over 17099.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1241, cr_loss=0.3412, over 3367393.37 frames. 
], batch size: 49, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:05:13,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=689514.0, ans=0.125 2024-09-25 07:05:20,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=689514.0, ans=10.0 2024-09-25 07:05:23,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=689514.0, ans=0.1 2024-09-25 07:05:36,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=689560.6666666666, ans=0.125 2024-09-25 07:05:42,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=689607.3333333334, ans=0.0 2024-09-25 07:05:56,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=689654.0, ans=0.125 2024-09-25 07:06:03,904 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-25 07:06:05,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.54 vs. limit=10.0 2024-09-25 07:06:11,446 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2024-09-25 07:06:29,883 INFO [train.py:1198] (0/4) Epoch 38, batch 3650, loss[loss=0.2115, ctc_loss=0.1353, cr_loss=0.3808, over 16473.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3404, over 3373765.62 frames. ], batch size: 66, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:06:31,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=689747.3333333334, ans=0.2 2024-09-25 07:06:55,350 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.248e+02 1.330e+02 1.434e+02 2.019e+02, threshold=2.659e+02, percent-clipped=0.0 2024-09-25 07:06:58,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689794.0, ans=0.1 2024-09-25 07:07:14,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=689840.6666666666, ans=0.125 2024-09-25 07:07:37,146 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2024-09-25 07:07:39,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=689934.0, ans=0.125 2024-09-25 07:07:50,476 INFO [train.py:1198] (0/4) Epoch 38, batch 3700, loss[loss=0.2004, ctc_loss=0.1289, cr_loss=0.3571, over 17278.00 frames. ], tot_loss[loss=0.1938, ctc_loss=0.1251, cr_loss=0.3434, over 3375334.44 frames. 
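
Note on the scaling.py:1024 Whitening entries: each compares a dispersion metric of a layer's output covariance against a limit (e.g. metric=7.15 vs. limit=15.0 above); a value of 1.0 corresponds to a perfectly isotropic ("white") covariance, and a corrective penalty only kicks in when the metric exceeds the limit. The sketch below shows one way such a metric could be computed, assuming it measures the spread of covariance eigenvalues; it is not claimed to be the exact implementation.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns E[eig^2] / E[eig]^2 of
        the per-group channel covariance: 1.0 iff the covariance is a
        multiple of the identity, larger as it grows anisotropic."""
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n  # (groups, d, d)
        # trace(C)/d is the mean eigenvalue; trace(C @ C)/d the mean square.
        mean_eig = cov.diagonal(dim1=1, dim2=2).mean()
        mean_eig_sq = torch.matmul(cov, cov).diagonal(dim1=1, dim2=2).mean()
        return mean_eig_sq / (mean_eig ** 2)
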
], batch size: 46, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:08:13,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=690027.3333333334, ans=0.1 2024-09-25 07:08:30,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=690074.0, ans=0.2 2024-09-25 07:09:01,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=690167.3333333334, ans=0.125 2024-09-25 07:09:08,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=690167.3333333334, ans=0.125 2024-09-25 07:09:10,775 INFO [train.py:1198] (0/4) Epoch 38, batch 3750, loss[loss=0.1628, ctc_loss=0.1048, cr_loss=0.2897, over 16947.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1249, cr_loss=0.3426, over 3367792.74 frames. ], batch size: 42, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:09:13,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2024-09-25 07:09:23,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=690214.0, ans=0.125 2024-09-25 07:09:24,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.90 vs. limit=15.0 2024-09-25 07:09:34,561 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.279e+02 1.362e+02 1.449e+02 2.293e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 07:09:46,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=690307.3333333334, ans=0.125 2024-09-25 07:10:22,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. limit=6.0 2024-09-25 07:10:31,011 INFO [train.py:1198] (0/4) Epoch 38, batch 3800, loss[loss=0.1424, ctc_loss=0.08708, cr_loss=0.2767, over 16234.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1249, cr_loss=0.3419, over 3348846.33 frames. ], batch size: 36, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:10:34,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=690447.3333333334, ans=0.025 2024-09-25 07:10:49,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=690494.0, ans=0.0 2024-09-25 07:10:54,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=690494.0, ans=0.0 2024-09-25 07:10:54,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=690494.0, ans=0.125 2024-09-25 07:10:58,157 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. 
limit=15.0 2024-09-25 07:11:07,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=690540.6666666666, ans=0.0 2024-09-25 07:11:08,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=690540.6666666666, ans=0.09899494936611666 2024-09-25 07:11:18,578 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-09-25 07:11:30,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=690587.3333333334, ans=0.0 2024-09-25 07:11:44,312 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-148000.pt 2024-09-25 07:11:48,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=690634.0, ans=0.0 2024-09-25 07:11:52,450 INFO [train.py:1198] (0/4) Epoch 38, batch 3850, loss[loss=0.2376, ctc_loss=0.1562, cr_loss=0.4071, over 15083.00 frames. ], tot_loss[loss=0.197, ctc_loss=0.128, cr_loss=0.3452, over 3243144.54 frames. ], batch size: 89, lr: 3.11e-03, grad_scale: 16.0 2024-09-25 07:11:54,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=690680.6666666666, ans=0.125 2024-09-25 07:11:57,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690680.6666666666, ans=0.1 2024-09-25 07:12:03,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=690680.6666666666, ans=0.035 2024-09-25 07:12:05,462 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=12.0 2024-09-25 07:12:08,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=690727.3333333334, ans=0.125 2024-09-25 07:12:09,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=690727.3333333334, ans=0.125 2024-09-25 07:12:15,185 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.374e+02 1.486e+02 1.640e+02 2.118e+02, threshold=2.972e+02, percent-clipped=0.0 2024-09-25 07:12:38,533 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=15.0 2024-09-25 07:12:59,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=22.5 2024-09-25 07:13:02,178 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-38.pt 2024-09-25 07:13:50,140 INFO [train.py:1198] (0/4) Epoch 39, batch 0, loss[loss=0.1905, ctc_loss=0.1239, cr_loss=0.3329, over 17156.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1239, cr_loss=0.3329, over 17156.00 frames. 
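
Note on the checkpoint.py:75 entries above: the run writes both a rolling batch-indexed checkpoint (checkpoint-148000.pt) and an end-of-epoch checkpoint (epoch-38.pt) into the experiment directory. A minimal sketch of what such a save might bundle so that training can resume mid-epoch; the field names and signature are illustrative.

    import torch

    def save_checkpoint(path, model, optimizer, scheduler, scaler,
                        sampler_state, batch_idx_train, epoch):
        """Sketch: bundle model and training state into one .pt file."""
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "grad_scaler": scaler.state_dict(),
                "sampler": sampler_state,  # lets the dataloader resume
                "batch_idx_train": batch_idx_train,
                "epoch": epoch,
            },
            path,
        )

    # e.g. save_checkpoint("zipformer/exp-.../checkpoint-148000.pt", ...)
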
], batch size: 45, lr: 3.07e-03, grad_scale: 32.0 2024-09-25 07:13:50,141 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 07:13:58,152 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([5.1336, 3.9852, 4.8968, 4.6575], device='cuda:0') 2024-09-25 07:14:06,124 INFO [train.py:1230] (0/4) Epoch 39, validation: loss=0.03529, ctc_loss=0.03529, cr_loss=1.033e-14, over 944034.00 frames. 2024-09-25 07:14:06,124 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 07:14:25,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=690942.0, ans=0.0 2024-09-25 07:14:25,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=690942.0, ans=0.2 2024-09-25 07:14:30,523 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.27 vs. limit=10.0 2024-09-25 07:14:49,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=690988.6666666666, ans=0.2 2024-09-25 07:14:57,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=691035.3333333334, ans=0.09899494936611666 2024-09-25 07:15:03,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=691035.3333333334, ans=0.0 2024-09-25 07:15:27,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=691128.6666666666, ans=0.125 2024-09-25 07:15:28,981 INFO [train.py:1198] (0/4) Epoch 39, batch 50, loss[loss=0.1772, ctc_loss=0.1138, cr_loss=0.317, over 17063.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3396, over 757308.23 frames. ], batch size: 46, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:15:48,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=691175.3333333334, ans=0.025 2024-09-25 07:15:59,585 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.081e+02 1.279e+02 1.417e+02 1.630e+02 3.403e+02, threshold=2.834e+02, percent-clipped=1.0 2024-09-25 07:16:23,928 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=12.0 2024-09-25 07:16:34,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=691315.3333333334, ans=0.125 2024-09-25 07:16:44,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=691315.3333333334, ans=0.125 2024-09-25 07:16:52,228 INFO [train.py:1198] (0/4) Epoch 39, batch 100, loss[loss=0.1956, ctc_loss=0.1249, cr_loss=0.3539, over 17210.00 frames. ], tot_loss[loss=0.1964, ctc_loss=0.1271, cr_loss=0.3465, over 1326780.64 frames. 
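
Two details in the validation block above are worth noting. First, validation cr_loss collapses to ~1e-14 and loss equals ctc_loss exactly: this is consistent with no time-masking being applied at validation, so the two views that the consistency term compares are numerically identical and their symmetric KL is zero up to float error. Second, zipformer.py:1858 logs the entropy of a layer's attention weights as a health diagnostic (the tensor([5.1336, 3.9852, 4.8968, 4.6575]) line). A sketch of that entropy computation, with the tensor layout assumed for illustration:

    import torch

    def attn_weights_entropy(attn: torch.Tensor,
                             eps: float = 1e-20) -> torch.Tensor:
        """attn: (num_heads, batch, tgt_len, src_len), rows summing to 1.
        Returns the mean entropy per head: near 0 means attention has
        collapsed onto single positions, log(src_len) means uniform."""
        ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, B, tgt)
        return ent.mean(dim=(1, 2))                     # one value per head
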
], batch size: 50, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:16:58,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=691362.0, ans=10.0 2024-09-25 07:17:35,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=691455.3333333334, ans=0.125 2024-09-25 07:17:43,976 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:18:12,698 INFO [train.py:1198] (0/4) Epoch 39, batch 150, loss[loss=0.1518, ctc_loss=0.09293, cr_loss=0.2942, over 17094.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1245, cr_loss=0.3425, over 1778532.23 frames. ], batch size: 43, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:18:17,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=691595.3333333334, ans=0.0 2024-09-25 07:18:45,368 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.277e+02 1.390e+02 1.504e+02 2.454e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 07:19:04,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=691735.3333333334, ans=0.125 2024-09-25 07:19:40,579 INFO [train.py:1198] (0/4) Epoch 39, batch 200, loss[loss=0.2028, ctc_loss=0.1323, cr_loss=0.3522, over 17359.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3405, over 2133440.25 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:20:12,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=691922.0, ans=0.125 2024-09-25 07:20:22,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=691922.0, ans=0.125 2024-09-25 07:20:24,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=691922.0, ans=0.0 2024-09-25 07:20:30,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=691968.6666666666, ans=0.125 2024-09-25 07:20:35,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=691968.6666666666, ans=0.2 2024-09-25 07:21:00,250 INFO [train.py:1198] (0/4) Epoch 39, batch 250, loss[loss=0.188, ctc_loss=0.1204, cr_loss=0.3384, over 17030.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1247, cr_loss=0.3423, over 2396007.59 frames. 
], batch size: 44, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:21:00,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=692062.0, ans=0.125 2024-09-25 07:21:05,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=692062.0, ans=0.0 2024-09-25 07:21:33,649 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.285e+02 1.349e+02 1.463e+02 2.685e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-25 07:21:43,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=692155.3333333334, ans=0.125 2024-09-25 07:21:45,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=12.0 2024-09-25 07:21:55,147 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=22.5 2024-09-25 07:22:02,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=692202.0, ans=0.0 2024-09-25 07:22:08,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=692248.6666666666, ans=0.125 2024-09-25 07:22:13,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=692248.6666666666, ans=0.125 2024-09-25 07:22:16,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.58 vs. limit=10.0 2024-09-25 07:22:17,738 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-25 07:22:20,371 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2024-09-25 07:22:21,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=692295.3333333334, ans=0.125 2024-09-25 07:22:21,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=692295.3333333334, ans=0.04949747468305833 2024-09-25 07:22:22,921 INFO [train.py:1198] (0/4) Epoch 39, batch 300, loss[loss=0.2195, ctc_loss=0.1459, cr_loss=0.3678, over 16184.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1235, cr_loss=0.34, over 2612781.33 frames. ], batch size: 74, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:22:54,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.43 vs. 
limit=15.0 2024-09-25 07:22:59,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=692388.6666666666, ans=0.0 2024-09-25 07:23:22,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=692435.3333333334, ans=0.0 2024-09-25 07:23:39,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=692482.0, ans=0.0 2024-09-25 07:23:46,205 INFO [train.py:1198] (0/4) Epoch 39, batch 350, loss[loss=0.1702, ctc_loss=0.1082, cr_loss=0.3098, over 17300.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.123, cr_loss=0.3395, over 2781562.87 frames. ], batch size: 46, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:23:48,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=692528.6666666666, ans=0.125 2024-09-25 07:24:01,744 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2024-09-25 07:24:12,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=692575.3333333334, ans=0.125 2024-09-25 07:24:15,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2024-09-25 07:24:21,675 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.269e+02 1.341e+02 1.429e+02 1.987e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-25 07:24:25,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=692622.0, ans=0.125 2024-09-25 07:24:28,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=692622.0, ans=0.125 2024-09-25 07:24:36,835 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2024-09-25 07:24:45,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=692668.6666666666, ans=0.125 2024-09-25 07:25:10,802 INFO [train.py:1198] (0/4) Epoch 39, batch 400, loss[loss=0.2342, ctc_loss=0.1536, cr_loss=0.4032, over 16129.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.3417, over 2913657.66 frames. ], batch size: 74, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:25:25,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=692808.6666666666, ans=0.025 2024-09-25 07:25:28,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=692808.6666666666, ans=0.125 2024-09-25 07:26:32,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=692995.3333333334, ans=0.2 2024-09-25 07:26:33,629 INFO [train.py:1198] (0/4) Epoch 39, batch 450, loss[loss=0.2157, ctc_loss=0.142, cr_loss=0.3684, over 17207.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1237, cr_loss=0.3412, over 3013724.36 frames. 
], batch size: 55, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:26:37,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=692995.3333333334, ans=0.125 2024-09-25 07:26:57,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=693042.0, ans=0.0 2024-09-25 07:27:01,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.53 vs. limit=6.0 2024-09-25 07:27:03,899 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.286e+02 1.365e+02 1.440e+02 1.919e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-25 07:27:07,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=693088.6666666666, ans=0.125 2024-09-25 07:27:53,310 INFO [train.py:1198] (0/4) Epoch 39, batch 500, loss[loss=0.1775, ctc_loss=0.1133, cr_loss=0.3211, over 17169.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.123, cr_loss=0.3397, over 3102545.12 frames. ], batch size: 45, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:27:58,729 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:28:11,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=22.5 2024-09-25 07:28:13,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693275.3333333334, ans=0.1 2024-09-25 07:28:58,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=693368.6666666666, ans=0.125 2024-09-25 07:29:04,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=693415.3333333334, ans=0.0 2024-09-25 07:29:21,848 INFO [train.py:1198] (0/4) Epoch 39, batch 550, loss[loss=0.2006, ctc_loss=0.129, cr_loss=0.3575, over 17005.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3399, over 3152971.08 frames. 
], batch size: 52, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:29:25,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=693462.0, ans=0.125 2024-09-25 07:29:38,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=693508.6666666666, ans=0.0 2024-09-25 07:29:41,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693508.6666666666, ans=0.1 2024-09-25 07:29:49,472 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=693508.6666666666, ans=0.125 2024-09-25 07:29:52,389 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.262e+02 1.347e+02 1.440e+02 2.072e+02, threshold=2.694e+02, percent-clipped=0.0 2024-09-25 07:30:05,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=693555.3333333334, ans=0.5 2024-09-25 07:30:13,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=693602.0, ans=0.125 2024-09-25 07:30:36,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=693648.6666666666, ans=0.125 2024-09-25 07:30:42,149 INFO [train.py:1198] (0/4) Epoch 39, batch 600, loss[loss=0.2169, ctc_loss=0.1365, cr_loss=0.4021, over 17138.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3398, over 3205745.50 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:31:06,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=693742.0, ans=0.125 2024-09-25 07:31:10,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2024-09-25 07:31:14,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=693788.6666666666, ans=0.025 2024-09-25 07:31:15,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=22.5 2024-09-25 07:31:25,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=693788.6666666666, ans=0.2 2024-09-25 07:31:29,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2024-09-25 07:31:34,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=693835.3333333334, ans=0.0 2024-09-25 07:31:55,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=693882.0, ans=0.125 2024-09-25 07:32:04,879 INFO [train.py:1198] (0/4) Epoch 39, batch 650, loss[loss=0.1945, ctc_loss=0.1243, cr_loss=0.351, over 16781.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1229, cr_loss=0.3396, over 3242190.81 frames. 
], batch size: 61, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:32:05,214 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:32:35,162 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.253e+02 1.371e+02 1.476e+02 2.121e+02, threshold=2.742e+02, percent-clipped=0.0 2024-09-25 07:32:46,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694022.0, ans=0.1 2024-09-25 07:33:20,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0 2024-09-25 07:33:24,772 INFO [train.py:1198] (0/4) Epoch 39, batch 700, loss[loss=0.1892, ctc_loss=0.1185, cr_loss=0.3535, over 17033.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3405, over 3270632.40 frames. ], batch size: 51, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:33:42,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=694208.6666666666, ans=0.0 2024-09-25 07:34:29,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-25 07:34:49,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=694348.6666666666, ans=0.125 2024-09-25 07:34:51,254 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:34:51,766 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=15.0 2024-09-25 07:34:52,908 INFO [train.py:1198] (0/4) Epoch 39, batch 750, loss[loss=0.2067, ctc_loss=0.1332, cr_loss=0.3674, over 17353.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1238, cr_loss=0.3402, over 3291979.15 frames. ], batch size: 48, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:35:01,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=694395.3333333334, ans=0.125 2024-09-25 07:35:06,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=694395.3333333334, ans=0.2 2024-09-25 07:35:23,323 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.271e+02 1.368e+02 1.469e+02 1.814e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 07:35:28,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=694488.6666666666, ans=0.04949747468305833 2024-09-25 07:35:30,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. 
limit=6.0 2024-09-25 07:35:35,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=694488.6666666666, ans=0.125 2024-09-25 07:35:47,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=694535.3333333334, ans=0.125 2024-09-25 07:35:54,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=694535.3333333334, ans=0.05 2024-09-25 07:36:13,109 INFO [train.py:1198] (0/4) Epoch 39, batch 800, loss[loss=0.2083, ctc_loss=0.1358, cr_loss=0.3625, over 14965.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1232, cr_loss=0.3387, over 3307118.60 frames. ], batch size: 89, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:36:25,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=694628.6666666666, ans=0.125 2024-09-25 07:36:28,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=694628.6666666666, ans=0.0 2024-09-25 07:37:05,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694768.6666666666, ans=0.1 2024-09-25 07:37:30,289 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=22.5 2024-09-25 07:37:35,918 INFO [train.py:1198] (0/4) Epoch 39, batch 850, loss[loss=0.1616, ctc_loss=0.102, cr_loss=0.2983, over 17021.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1233, cr_loss=0.3389, over 3318876.57 frames. ], batch size: 44, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:38:02,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-09-25 07:38:06,361 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.258e+02 1.327e+02 1.456e+02 1.924e+02, threshold=2.653e+02, percent-clipped=0.0 2024-09-25 07:39:00,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=695095.3333333334, ans=0.125 2024-09-25 07:39:01,739 INFO [train.py:1198] (0/4) Epoch 39, batch 900, loss[loss=0.1579, ctc_loss=0.1005, cr_loss=0.2867, over 16963.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1237, cr_loss=0.3395, over 3323358.05 frames. ], batch size: 42, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:39:13,307 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0 2024-09-25 07:39:19,321 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.54 vs. limit=10.0 2024-09-25 07:39:22,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=695142.0, ans=0.0 2024-09-25 07:39:22,893 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.08 vs. 
limit=15.0 2024-09-25 07:39:33,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=695142.0, ans=0.125 2024-09-25 07:39:42,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=695188.6666666666, ans=0.0 2024-09-25 07:39:52,993 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=15.0 2024-09-25 07:39:59,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=695235.3333333334, ans=0.2 2024-09-25 07:40:04,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=695235.3333333334, ans=0.2 2024-09-25 07:40:04,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=695235.3333333334, ans=0.0 2024-09-25 07:40:15,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=695282.0, ans=0.07 2024-09-25 07:40:24,755 INFO [train.py:1198] (0/4) Epoch 39, batch 950, loss[loss=0.1942, ctc_loss=0.1251, cr_loss=0.3456, over 17008.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1238, cr_loss=0.3402, over 3328024.73 frames. ], batch size: 44, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:40:42,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695375.3333333334, ans=0.1 2024-09-25 07:40:55,205 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.302e+02 1.382e+02 1.480e+02 1.852e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-25 07:40:55,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=695422.0, ans=0.0 2024-09-25 07:41:47,517 INFO [train.py:1198] (0/4) Epoch 39, batch 1000, loss[loss=0.2039, ctc_loss=0.1314, cr_loss=0.3627, over 16993.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1241, cr_loss=0.3406, over 3339346.99 frames. ], batch size: 53, lr: 3.06e-03, grad_scale: 32.0 2024-09-25 07:41:54,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=695562.0, ans=0.125 2024-09-25 07:41:59,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=695562.0, ans=0.025 2024-09-25 07:42:10,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=695608.6666666666, ans=0.0 2024-09-25 07:42:12,394 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2024-09-25 07:42:30,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=695655.3333333334, ans=0.0 2024-09-25 07:43:00,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=695748.6666666666, ans=0.125 2024-09-25 07:43:07,518 INFO [train.py:1198] (0/4) Epoch 39, batch 1050, loss[loss=0.2308, ctc_loss=0.1509, cr_loss=0.3994, over 16512.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.125, cr_loss=0.3425, over 3349375.32 frames. 
], batch size: 66, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 07:43:10,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=695795.3333333334, ans=0.2 2024-09-25 07:43:23,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=695842.0, ans=0.0 2024-09-25 07:43:40,129 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.305e+02 1.388e+02 1.496e+02 1.693e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-25 07:43:43,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=695888.6666666666, ans=0.0 2024-09-25 07:43:58,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=695935.3333333334, ans=0.0 2024-09-25 07:44:16,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2024-09-25 07:44:22,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=695982.0, ans=0.125 2024-09-25 07:44:25,547 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=695982.0, ans=0.0 2024-09-25 07:44:26,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=695982.0, ans=0.0 2024-09-25 07:44:34,847 INFO [train.py:1198] (0/4) Epoch 39, batch 1100, loss[loss=0.1769, ctc_loss=0.1144, cr_loss=0.3126, over 17215.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1248, cr_loss=0.342, over 3346314.15 frames. ], batch size: 47, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:44:35,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=696028.6666666666, ans=0.0 2024-09-25 07:44:38,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=696028.6666666666, ans=0.125 2024-09-25 07:44:52,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=696075.3333333334, ans=0.2 2024-09-25 07:44:57,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696075.3333333334, ans=0.125 2024-09-25 07:45:02,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=696075.3333333334, ans=0.125 2024-09-25 07:45:20,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=22.5 2024-09-25 07:45:21,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=696168.6666666666, ans=0.2 2024-09-25 07:45:36,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=696168.6666666666, ans=0.025 2024-09-25 07:45:54,901 INFO [train.py:1198] (0/4) Epoch 39, batch 1150, loss[loss=0.1879, ctc_loss=0.1214, cr_loss=0.3326, over 16589.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1243, cr_loss=0.3405, over 3344160.59 frames. 
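
Note on tot_loss: each batch line reports both the current batch's loss and tot_loss, a frames-weighted running aggregate whose window is reflected in the "over N frames" count (~3.3M frames here). The exact aggregation scheme is not visible in the log; the sketch below shows one plausible frames-weighted decayed accumulator, purely for illustration.

    class RunningLoss:
        """Frames-weighted exponential average of per-batch losses
        (sketch). `frames` mirrors the 'over N frames' figure."""
        def __init__(self, decay: float = 0.99):
            self.decay = decay
            self.frames = 0.0
            self.loss_sum = 0.0

        def update(self, batch_loss: float, batch_frames: float):
            self.frames = self.decay * self.frames + batch_frames
            self.loss_sum = (self.decay * self.loss_sum
                             + batch_loss * batch_frames)

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)
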
], batch size: 66, lr: 3.05e-03, grad_scale: 8.0 2024-09-25 07:46:03,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696262.0, ans=0.1 2024-09-25 07:46:30,826 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.295e+02 1.353e+02 1.453e+02 1.736e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-25 07:46:39,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-09-25 07:46:54,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=696402.0, ans=0.0 2024-09-25 07:47:14,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=696448.6666666666, ans=0.0 2024-09-25 07:47:16,821 INFO [train.py:1198] (0/4) Epoch 39, batch 1200, loss[loss=0.1721, ctc_loss=0.1085, cr_loss=0.3183, over 17243.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1245, cr_loss=0.3417, over 3347137.32 frames. ], batch size: 44, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:47:17,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=696495.3333333334, ans=0.025 2024-09-25 07:47:47,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=696588.6666666666, ans=0.2 2024-09-25 07:47:49,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2024-09-25 07:48:04,001 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.43 vs. limit=10.0 2024-09-25 07:48:08,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=12.0 2024-09-25 07:48:13,435 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=22.5 2024-09-25 07:48:19,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=696682.0, ans=0.2 2024-09-25 07:48:33,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=696682.0, ans=0.2 2024-09-25 07:48:38,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696728.6666666666, ans=0.125 2024-09-25 07:48:39,498 INFO [train.py:1198] (0/4) Epoch 39, batch 1250, loss[loss=0.1942, ctc_loss=0.1242, cr_loss=0.3503, over 17017.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1246, cr_loss=0.3418, over 3360640.13 frames. 
], batch size: 51, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:48:53,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=696728.6666666666, ans=0.025 2024-09-25 07:49:17,453 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.275e+02 1.351e+02 1.497e+02 2.020e+02, threshold=2.702e+02, percent-clipped=0.0 2024-09-25 07:49:22,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=696822.0, ans=0.125 2024-09-25 07:49:28,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=696822.0, ans=0.0 2024-09-25 07:49:49,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=696915.3333333334, ans=0.0 2024-09-25 07:49:56,707 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=22.5 2024-09-25 07:50:00,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=696915.3333333334, ans=0.2 2024-09-25 07:50:03,595 INFO [train.py:1198] (0/4) Epoch 39, batch 1300, loss[loss=0.1623, ctc_loss=0.1036, cr_loss=0.2931, over 16952.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1239, cr_loss=0.3407, over 3366767.25 frames. ], batch size: 42, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:50:20,195 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0 2024-09-25 07:50:21,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=697008.6666666666, ans=0.125 2024-09-25 07:50:32,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=697008.6666666666, ans=0.125 2024-09-25 07:50:55,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5 2024-09-25 07:50:58,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=697102.0, ans=0.0 2024-09-25 07:51:04,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=697102.0, ans=0.125 2024-09-25 07:51:26,691 INFO [train.py:1198] (0/4) Epoch 39, batch 1350, loss[loss=0.2445, ctc_loss=0.16, cr_loss=0.4225, over 17068.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.3408, over 3375418.70 frames. 
], batch size: 52, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:52:00,579 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.264e+02 1.325e+02 1.430e+02 2.601e+02, threshold=2.650e+02, percent-clipped=0.0 2024-09-25 07:52:01,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=697288.6666666666, ans=0.125 2024-09-25 07:52:05,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=697288.6666666666, ans=0.0 2024-09-25 07:52:30,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=697382.0, ans=0.125 2024-09-25 07:52:34,747 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 07:52:47,383 INFO [train.py:1198] (0/4) Epoch 39, batch 1400, loss[loss=0.1874, ctc_loss=0.1223, cr_loss=0.3253, over 17297.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.3416, over 3371958.54 frames. ], batch size: 49, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:53:18,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2024-09-25 07:53:22,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.83 vs. limit=10.0 2024-09-25 07:54:15,112 INFO [train.py:1198] (0/4) Epoch 39, batch 1450, loss[loss=0.2103, ctc_loss=0.1369, cr_loss=0.3669, over 17009.00 frames. ], tot_loss[loss=0.1927, ctc_loss=0.1242, cr_loss=0.3423, over 3364294.75 frames. ], batch size: 51, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:54:26,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.56 vs. limit=6.0 2024-09-25 07:54:33,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0 2024-09-25 07:54:36,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=697708.6666666666, ans=0.0 2024-09-25 07:54:48,363 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.285e+02 1.356e+02 1.493e+02 2.927e+02, threshold=2.712e+02, percent-clipped=2.0 2024-09-25 07:54:55,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=697755.3333333334, ans=0.125 2024-09-25 07:54:59,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=697755.3333333334, ans=0.0 2024-09-25 07:55:26,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=697848.6666666666, ans=0.0 2024-09-25 07:55:34,691 INFO [train.py:1198] (0/4) Epoch 39, batch 1500, loss[loss=0.1878, ctc_loss=0.12, cr_loss=0.339, over 16950.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1244, cr_loss=0.3423, over 3363133.75 frames. 
], batch size: 42, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:56:14,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=697988.6666666666, ans=0.125 2024-09-25 07:56:15,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=697988.6666666666, ans=0.0 2024-09-25 07:56:57,359 INFO [train.py:1198] (0/4) Epoch 39, batch 1550, loss[loss=0.1998, ctc_loss=0.129, cr_loss=0.3538, over 16000.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.124, cr_loss=0.342, over 3359272.14 frames. ], batch size: 74, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 07:57:13,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=698175.3333333334, ans=0.0 2024-09-25 07:57:30,847 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.291e+02 1.368e+02 1.486e+02 2.492e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 07:58:04,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=698315.3333333334, ans=0.025 2024-09-25 07:58:09,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=698315.3333333334, ans=0.0 2024-09-25 07:58:17,138 INFO [train.py:1198] (0/4) Epoch 39, batch 1600, loss[loss=0.1556, ctc_loss=0.1015, cr_loss=0.2706, over 16938.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3412, over 3362282.96 frames. ], batch size: 42, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 07:59:03,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=698455.3333333334, ans=0.025 2024-09-25 07:59:44,318 INFO [train.py:1198] (0/4) Epoch 39, batch 1650, loss[loss=0.1706, ctc_loss=0.107, cr_loss=0.3178, over 17304.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1228, cr_loss=0.3393, over 3365766.35 frames. ], batch size: 46, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 08:00:17,963 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.058e+02 1.287e+02 1.365e+02 1.434e+02 2.064e+02, threshold=2.730e+02, percent-clipped=0.0 2024-09-25 08:00:19,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=698688.6666666666, ans=0.125 2024-09-25 08:00:19,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=698688.6666666666, ans=0.025 2024-09-25 08:00:26,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=12.0 2024-09-25 08:01:03,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=698828.6666666666, ans=0.2 2024-09-25 08:01:04,263 INFO [train.py:1198] (0/4) Epoch 39, batch 1700, loss[loss=0.1708, ctc_loss=0.1068, cr_loss=0.3198, over 16329.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1238, cr_loss=0.3416, over 3361827.10 frames. 
], batch size: 36, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 08:01:11,102 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:01:56,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=698968.6666666666, ans=0.2 2024-09-25 08:02:05,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=698968.6666666666, ans=0.07 2024-09-25 08:02:06,102 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.74 vs. limit=10.0 2024-09-25 08:02:26,321 INFO [train.py:1198] (0/4) Epoch 39, batch 1750, loss[loss=0.1708, ctc_loss=0.1095, cr_loss=0.3063, over 17280.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1235, cr_loss=0.3412, over 3366298.11 frames. ], batch size: 44, lr: 3.05e-03, grad_scale: 32.0 2024-09-25 08:02:26,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=699062.0, ans=0.125 2024-09-25 08:02:32,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=699062.0, ans=0.2 2024-09-25 08:02:45,945 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=22.5 2024-09-25 08:02:50,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-09-25 08:02:52,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2024-09-25 08:02:52,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.85 vs. limit=10.0 2024-09-25 08:03:01,043 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.281e+02 1.369e+02 1.457e+02 2.012e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-25 08:03:04,809 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2024-09-25 08:03:20,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=699202.0, ans=0.0 2024-09-25 08:03:43,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=699248.6666666666, ans=0.2 2024-09-25 08:03:53,694 INFO [train.py:1198] (0/4) Epoch 39, batch 1800, loss[loss=0.1902, ctc_loss=0.1228, cr_loss=0.3369, over 17150.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1239, cr_loss=0.3417, over 3356864.25 frames. ], batch size: 48, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 08:04:08,612 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. 
limit=6.0 2024-09-25 08:04:09,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=699342.0, ans=0.025 2024-09-25 08:04:49,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=699435.3333333334, ans=0.2 2024-09-25 08:05:12,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699528.6666666666, ans=0.1 2024-09-25 08:05:13,795 INFO [train.py:1198] (0/4) Epoch 39, batch 1850, loss[loss=0.1645, ctc_loss=0.1044, cr_loss=0.3003, over 17098.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1229, cr_loss=0.3404, over 3363284.44 frames. ], batch size: 40, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 08:05:48,763 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.097e+02 1.264e+02 1.367e+02 1.511e+02 2.303e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-25 08:06:06,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=699668.6666666666, ans=0.2 2024-09-25 08:06:06,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=699668.6666666666, ans=0.125 2024-09-25 08:06:25,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=699715.3333333334, ans=0.1 2024-09-25 08:06:27,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=699715.3333333334, ans=0.025 2024-09-25 08:06:36,644 INFO [train.py:1198] (0/4) Epoch 39, batch 1900, loss[loss=0.2017, ctc_loss=0.1314, cr_loss=0.3519, over 16986.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1232, cr_loss=0.341, over 3358974.83 frames. ], batch size: 53, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 08:06:51,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=699808.6666666666, ans=0.125 2024-09-25 08:06:51,828 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.93 vs. limit=22.5 2024-09-25 08:07:05,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699808.6666666666, ans=0.1 2024-09-25 08:07:07,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=699855.3333333334, ans=0.0 2024-09-25 08:07:10,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=699855.3333333334, ans=15.0 2024-09-25 08:07:14,224 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=12.0 2024-09-25 08:07:16,739 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:07:50,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=699948.6666666666, ans=0.125 2024-09-25 08:07:57,351 INFO [train.py:1198] (0/4) Epoch 39, batch 1950, loss[loss=0.2223, ctc_loss=0.1457, cr_loss=0.3828, over 15036.00 frames. 
], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3405, over 3350142.15 frames. ], batch size: 89, lr: 3.05e-03, grad_scale: 16.0 2024-09-25 08:07:59,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2024-09-25 08:08:04,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-09-25 08:08:36,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=700088.6666666666, ans=0.0 2024-09-25 08:08:37,873 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.289e+02 1.367e+02 1.538e+02 1.984e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-25 08:08:43,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=700088.6666666666, ans=0.0 2024-09-25 08:08:47,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=700088.6666666666, ans=0.125 2024-09-25 08:08:57,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2024-09-25 08:09:08,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=700182.0, ans=0.04949747468305833 2024-09-25 08:09:25,322 INFO [train.py:1198] (0/4) Epoch 39, batch 2000, loss[loss=0.2156, ctc_loss=0.1418, cr_loss=0.3689, over 17303.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1238, cr_loss=0.3415, over 3340361.17 frames. ], batch size: 51, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:10:02,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=700322.0, ans=0.2 2024-09-25 08:10:05,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=700322.0, ans=0.2 2024-09-25 08:10:17,205 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.41 vs. limit=10.0 2024-09-25 08:10:20,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=700368.6666666666, ans=0.125 2024-09-25 08:10:27,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700415.3333333334, ans=0.1 2024-09-25 08:10:34,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=700415.3333333334, ans=0.0 2024-09-25 08:10:39,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0 2024-09-25 08:10:45,498 INFO [train.py:1198] (0/4) Epoch 39, batch 2050, loss[loss=0.1472, ctc_loss=0.09246, cr_loss=0.2736, over 17195.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1234, cr_loss=0.34, over 3346384.16 frames. ], batch size: 41, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:11:22,829 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.63 vs. 
limit=10.0 2024-09-25 08:11:25,132 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.300e+02 1.377e+02 1.457e+02 2.836e+02, threshold=2.753e+02, percent-clipped=1.0 2024-09-25 08:11:27,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=700555.3333333334, ans=0.0 2024-09-25 08:11:35,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=700602.0, ans=0.07 2024-09-25 08:11:43,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=700602.0, ans=0.0 2024-09-25 08:11:43,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2024-09-25 08:11:55,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=700648.6666666666, ans=0.2 2024-09-25 08:12:02,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=700648.6666666666, ans=0.125 2024-09-25 08:12:08,230 INFO [train.py:1198] (0/4) Epoch 39, batch 2100, loss[loss=0.183, ctc_loss=0.117, cr_loss=0.3301, over 17090.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1233, cr_loss=0.34, over 3355961.78 frames. ], batch size: 43, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:12:21,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-25 08:12:29,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=700742.0, ans=0.125 2024-09-25 08:12:57,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700835.3333333334, ans=0.1 2024-09-25 08:12:57,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=700835.3333333334, ans=0.125 2024-09-25 08:13:14,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=700882.0, ans=0.125 2024-09-25 08:13:16,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=12.0 2024-09-25 08:13:30,841 INFO [train.py:1198] (0/4) Epoch 39, batch 2150, loss[loss=0.187, ctc_loss=0.1181, cr_loss=0.3445, over 16322.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1231, cr_loss=0.3391, over 3350904.84 frames. ], batch size: 36, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:13:47,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=700975.3333333334, ans=0.125 2024-09-25 08:14:01,126 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.33 vs. 
limit=10.0 2024-09-25 08:14:10,019 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.289e+02 1.363e+02 1.447e+02 2.047e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-25 08:14:10,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=701022.0, ans=0.125 2024-09-25 08:14:19,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=701068.6666666666, ans=0.125 2024-09-25 08:14:24,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=701068.6666666666, ans=0.125 2024-09-25 08:14:52,357 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2024-09-25 08:14:53,285 INFO [train.py:1198] (0/4) Epoch 39, batch 2200, loss[loss=0.194, ctc_loss=0.1264, cr_loss=0.3378, over 17322.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1233, cr_loss=0.3393, over 3355124.73 frames. ], batch size: 51, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:15:05,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.06 vs. limit=10.0 2024-09-25 08:15:26,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=701255.3333333334, ans=15.0 2024-09-25 08:15:42,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=701302.0, ans=12.0 2024-09-25 08:15:51,789 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0 2024-09-25 08:15:54,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=701302.0, ans=0.0 2024-09-25 08:16:16,048 INFO [train.py:1198] (0/4) Epoch 39, batch 2250, loss[loss=0.2065, ctc_loss=0.1333, cr_loss=0.3661, over 17244.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3403, over 3365290.91 frames. ], batch size: 50, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:16:18,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.48 vs. limit=15.0 2024-09-25 08:16:24,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=701395.3333333334, ans=0.2 2024-09-25 08:16:37,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=701442.0, ans=0.2 2024-09-25 08:16:46,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=701488.6666666666, ans=0.125 2024-09-25 08:16:52,930 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.277e+02 1.348e+02 1.481e+02 2.538e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-25 08:17:09,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. 
limit=15.0 2024-09-25 08:17:35,954 INFO [train.py:1198] (0/4) Epoch 39, batch 2300, loss[loss=0.164, ctc_loss=0.1031, cr_loss=0.3047, over 17088.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3413, over 3368694.85 frames. ], batch size: 40, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:17:54,016 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=701675.3333333334, ans=0.2 2024-09-25 08:18:03,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=701675.3333333334, ans=0.2 2024-09-25 08:19:03,167 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2024-09-25 08:19:04,326 INFO [train.py:1198] (0/4) Epoch 39, batch 2350, loss[loss=0.1947, ctc_loss=0.1274, cr_loss=0.3365, over 17311.00 frames. ], tot_loss[loss=0.1925, ctc_loss=0.1241, cr_loss=0.342, over 3379381.17 frames. ], batch size: 51, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:19:15,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701862.0, ans=0.1 2024-09-25 08:19:17,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=701862.0, ans=0.0 2024-09-25 08:19:23,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=701908.6666666666, ans=0.07 2024-09-25 08:19:32,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=701908.6666666666, ans=0.125 2024-09-25 08:19:36,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=701955.3333333334, ans=0.125 2024-09-25 08:19:40,707 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.317e+02 1.394e+02 1.473e+02 1.777e+02, threshold=2.787e+02, percent-clipped=0.0 2024-09-25 08:19:42,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=701955.3333333334, ans=0.0 2024-09-25 08:19:57,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=22.5 2024-09-25 08:20:01,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=702002.0, ans=0.125 2024-09-25 08:20:19,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=702048.6666666666, ans=0.125 2024-09-25 08:20:23,822 INFO [train.py:1198] (0/4) Epoch 39, batch 2400, loss[loss=0.1554, ctc_loss=0.09744, cr_loss=0.2897, over 17283.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1244, cr_loss=0.343, over 3375349.65 frames. 
], batch size: 46, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:20:38,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=702142.0, ans=0.2 2024-09-25 08:21:20,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=702235.3333333334, ans=0.1 2024-09-25 08:21:26,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2024-09-25 08:21:33,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=702282.0, ans=0.0 2024-09-25 08:21:40,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=8.0 2024-09-25 08:21:44,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=702328.6666666666, ans=0.0 2024-09-25 08:21:45,940 INFO [train.py:1198] (0/4) Epoch 39, batch 2450, loss[loss=0.2062, ctc_loss=0.1313, cr_loss=0.3748, over 17067.00 frames. ], tot_loss[loss=0.1934, ctc_loss=0.1248, cr_loss=0.3428, over 3345275.15 frames. ], batch size: 46, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:22:19,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=702422.0, ans=0.0 2024-09-25 08:22:24,052 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.006e+02 1.314e+02 1.404e+02 1.496e+02 1.831e+02, threshold=2.808e+02, percent-clipped=0.0 2024-09-25 08:22:24,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=702422.0, ans=0.125 2024-09-25 08:22:32,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702468.6666666666, ans=0.1 2024-09-25 08:22:34,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=702468.6666666666, ans=0.1 2024-09-25 08:22:38,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=702468.6666666666, ans=0.125 2024-09-25 08:22:40,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=702468.6666666666, ans=0.0 2024-09-25 08:22:46,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=702468.6666666666, ans=0.2 2024-09-25 08:22:50,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=12.0 2024-09-25 08:22:59,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=702515.3333333334, ans=0.0 2024-09-25 08:23:08,341 INFO [train.py:1198] (0/4) Epoch 39, batch 2500, loss[loss=0.1928, ctc_loss=0.1241, cr_loss=0.3436, over 17026.00 frames. ], tot_loss[loss=0.1939, ctc_loss=0.1252, cr_loss=0.3436, over 3341659.72 frames. 
], batch size: 56, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:23:11,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=702562.0, ans=0.125 2024-09-25 08:23:17,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=702562.0, ans=0.125 2024-09-25 08:24:12,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=702702.0, ans=0.0 2024-09-25 08:24:33,891 INFO [train.py:1198] (0/4) Epoch 39, batch 2550, loss[loss=0.198, ctc_loss=0.1258, cr_loss=0.3608, over 17002.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1246, cr_loss=0.3424, over 3339822.56 frames. ], batch size: 56, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:24:42,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=702795.3333333334, ans=0.125 2024-09-25 08:24:45,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702795.3333333334, ans=0.1 2024-09-25 08:24:56,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=702842.0, ans=0.125 2024-09-25 08:25:03,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0 2024-09-25 08:25:08,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=702888.6666666666, ans=12.0 2024-09-25 08:25:12,024 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.300e+02 1.397e+02 1.510e+02 1.872e+02, threshold=2.794e+02, percent-clipped=0.0 2024-09-25 08:25:18,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=702888.6666666666, ans=0.025 2024-09-25 08:25:36,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=702982.0, ans=0.125 2024-09-25 08:25:56,180 INFO [train.py:1198] (0/4) Epoch 39, batch 2600, loss[loss=0.2025, ctc_loss=0.1317, cr_loss=0.354, over 17317.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.124, cr_loss=0.3414, over 3336910.25 frames. ], batch size: 51, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:26:00,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=12.0 2024-09-25 08:26:33,466 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.80 vs. limit=10.0 2024-09-25 08:27:03,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=703215.3333333334, ans=0.1 2024-09-25 08:27:15,880 INFO [train.py:1198] (0/4) Epoch 39, batch 2650, loss[loss=0.1926, ctc_loss=0.1264, cr_loss=0.3309, over 15982.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.124, cr_loss=0.341, over 3338531.83 frames. 
], batch size: 74, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:27:27,211 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:27:53,523 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.290e+02 1.367e+02 1.451e+02 1.893e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-25 08:28:30,638 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=703448.6666666666, ans=0.1 2024-09-25 08:28:36,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=15.0 2024-09-25 08:28:42,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=703495.3333333334, ans=0.0 2024-09-25 08:28:42,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=703495.3333333334, ans=0.2 2024-09-25 08:28:43,414 INFO [train.py:1198] (0/4) Epoch 39, batch 2700, loss[loss=0.1717, ctc_loss=0.1063, cr_loss=0.3269, over 17292.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.1242, cr_loss=0.3413, over 3341221.12 frames. ], batch size: 42, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:28:51,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=703495.3333333334, ans=0.5 2024-09-25 08:29:09,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=703542.0, ans=0.125 2024-09-25 08:29:21,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=703588.6666666666, ans=0.125 2024-09-25 08:29:27,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=703588.6666666666, ans=0.1 2024-09-25 08:29:34,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.32 vs. limit=6.0 2024-09-25 08:29:39,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=703635.3333333334, ans=0.125 2024-09-25 08:30:03,226 INFO [train.py:1198] (0/4) Epoch 39, batch 2750, loss[loss=0.2022, ctc_loss=0.1298, cr_loss=0.3621, over 16912.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1246, cr_loss=0.3423, over 3341098.33 frames. 
], batch size: 58, lr: 3.04e-03, grad_scale: 16.0 2024-09-25 08:30:16,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=703728.6666666666, ans=0.125 2024-09-25 08:30:41,402 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.292e+02 1.404e+02 1.488e+02 2.567e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-25 08:30:52,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=703868.6666666666, ans=0.07 2024-09-25 08:30:57,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=703868.6666666666, ans=0.025 2024-09-25 08:31:02,434 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=22.5 2024-09-25 08:31:08,041 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=703915.3333333334, ans=0.125 2024-09-25 08:31:17,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=703915.3333333334, ans=0.125 2024-09-25 08:31:24,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703962.0, ans=0.125 2024-09-25 08:31:25,415 INFO [train.py:1198] (0/4) Epoch 39, batch 2800, loss[loss=0.2241, ctc_loss=0.1485, cr_loss=0.3779, over 16549.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1247, cr_loss=0.3421, over 3334405.55 frames. ], batch size: 66, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:31:28,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=703962.0, ans=0.07 2024-09-25 08:31:38,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=703962.0, ans=0.1 2024-09-25 08:32:14,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0 2024-09-25 08:32:45,470 INFO [train.py:1198] (0/4) Epoch 39, batch 2850, loss[loss=0.2109, ctc_loss=0.1359, cr_loss=0.3746, over 15075.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1247, cr_loss=0.3423, over 3336911.01 frames. ], batch size: 89, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:32:59,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=704195.3333333334, ans=0.0 2024-09-25 08:33:22,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=704242.0, ans=0.125 2024-09-25 08:33:28,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=704288.6666666666, ans=0.0 2024-09-25 08:33:31,755 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.313e+02 1.374e+02 1.493e+02 1.951e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-25 08:33:48,561 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.12 vs. 
limit=22.5 2024-09-25 08:33:50,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=704335.3333333334, ans=0.2 2024-09-25 08:34:13,711 INFO [train.py:1198] (0/4) Epoch 39, batch 2900, loss[loss=0.1794, ctc_loss=0.1103, cr_loss=0.3455, over 17089.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1246, cr_loss=0.3425, over 3334147.67 frames. ], batch size: 43, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:34:33,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=704475.3333333334, ans=0.125 2024-09-25 08:34:57,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704522.0, ans=0.1 2024-09-25 08:35:12,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=704568.6666666666, ans=0.2 2024-09-25 08:35:13,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=704568.6666666666, ans=0.2 2024-09-25 08:35:16,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=704615.3333333334, ans=0.0 2024-09-25 08:35:17,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.81 vs. limit=10.0 2024-09-25 08:35:34,140 INFO [train.py:1198] (0/4) Epoch 39, batch 2950, loss[loss=0.1736, ctc_loss=0.1123, cr_loss=0.3066, over 17177.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1244, cr_loss=0.3419, over 3335476.05 frames. ], batch size: 45, lr: 3.04e-03, grad_scale: 32.0 2024-09-25 08:35:36,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.70 vs. limit=10.0 2024-09-25 08:36:00,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=704708.6666666666, ans=0.125 2024-09-25 08:36:15,121 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.295e+02 1.375e+02 1.468e+02 1.743e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-25 08:36:23,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=704802.0, ans=0.1 2024-09-25 08:36:55,830 INFO [train.py:1198] (0/4) Epoch 39, batch 3000, loss[loss=0.2307, ctc_loss=0.1522, cr_loss=0.3922, over 15063.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1236, cr_loss=0.3398, over 3335045.22 frames. ], batch size: 89, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:36:55,831 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 08:37:11,152 INFO [train.py:1230] (0/4) Epoch 39, validation: loss=0.03549, ctc_loss=0.03549, cr_loss=9.367e-15, over 944034.00 frames. 
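A note on the loss fields that recur in these summaries: the tot_loss values are consistent with loss = ctc_loss + 0.2 * cr_loss (0.2 presumably being the recipe's CR-loss weight), and the validation entry just above reports cr_loss=9.367e-15, i.e. numerically zero, which is what one would expect if the consistency-regularization term compares two augmented views that coincide when augmentation is off at evaluation time. A minimal arithmetic check, using the Epoch 39, batch 3000 values above; the variable names are ours, not train.py's:

```python
# Illustrative check only, not the recipe's code: the logged tot_loss
# appears to follow loss = ctc_loss + cr_loss_scale * cr_loss.
ctc_loss, cr_loss = 0.1236, 0.3398  # Epoch 39, batch 3000 tot_loss fields
cr_loss_scale = 0.2                 # inferred from the logged totals

loss = ctc_loss + cr_loss_scale * cr_loss
assert abs(loss - 0.1915) < 5e-4    # matches the logged loss=0.1915
```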
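The many ScheduledFloat lines track hyperparameters that vary with batch_count (skip rates, balancer probabilities, bypass scale floors), each line reporting the current value as ans. A rough stand-in for that behaviour, assuming a piecewise-linear schedule over (batch_count, value) breakpoints; the class and breakpoints below are hypothetical, not scaling.py's implementation:

```python
# Hypothetical stand-in for a batch-count-indexed schedule of the kind
# logged above (e.g. the conv_skip_rate values that have decayed to ans=0.0).
class PiecewiseSchedule:
    def __init__(self, *points):  # points are (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:  # linear interpolation within a segment
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]          # schedule stays flat after the last point

skip_rate = PiecewiseSchedule((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate(695842.0))         # 0.0 at this stage of training
```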
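The periodic optim.py WARNING lines summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) together with the active clipping threshold; with Clipping_scale=2.0 the threshold tracks roughly twice the logged median (2.0 × 1.388e+02 ≈ 2.775e+02 in the first warning of this stretch), and percent-clipped reports how often the norm exceeded it. A hedged reconstruction of that arithmetic, not optim.py's actual code:

```python
import statistics

# Illustrative: the logged threshold appears to be clipping_scale times
# the median of recent gradient norms.
clipping_scale = 2.0
recent_grad_norms = [112.9, 130.5, 138.8, 149.6, 169.3]  # logged quantiles

threshold = clipping_scale * statistics.median(recent_grad_norms)
print(threshold)  # 277.6 -- close to the logged threshold=2.775e+02
                  # (the logged quantiles are themselves rounded)
```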
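Finally, the grad_scale field in the batch summaries moves between 8.0 and 32.0 across this stretch (32 → 16 → 8, then back up), which is the signature of dynamic loss scaling under fp16 training: the scale is halved when a step produces inf/NaN gradients and grown back after a run of clean steps. A sketch with torch's GradScaler; the growth/backoff settings are assumptions, since this log does not show them:

```python
import torch

# Illustrative configuration of dynamic loss scaling of the kind suggested
# by the grad_scale values above; the numbers are assumed, not the recipe's.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # matches grad_scale at the top of this stretch
    backoff_factor=0.5,   # halve on an inf/NaN step (32 -> 16 -> 8)
    growth_factor=2.0,    # double again after growth_interval clean steps
    growth_interval=2000, # assumed value, not taken from this log
)
```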
2024-09-25 08:37:11,152 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 08:37:22,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=704895.3333333334, ans=0.09899494936611666 2024-09-25 08:37:37,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=704942.0, ans=0.0 2024-09-25 08:37:43,090 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=22.5 2024-09-25 08:37:47,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=704988.6666666666, ans=0.125 2024-09-25 08:38:03,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=12.0 2024-09-25 08:38:29,014 INFO [train.py:1198] (0/4) Epoch 39, batch 3050, loss[loss=0.2116, ctc_loss=0.1387, cr_loss=0.3647, over 17039.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1232, cr_loss=0.339, over 3337796.76 frames. ], batch size: 51, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:39:01,306 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.73 vs. limit=12.0 2024-09-25 08:39:08,844 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.272e+02 1.353e+02 1.453e+02 2.990e+02, threshold=2.707e+02, percent-clipped=1.0 2024-09-25 08:39:17,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=705222.0, ans=0.2 2024-09-25 08:39:22,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=705268.6666666666, ans=0.05 2024-09-25 08:39:34,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=705315.3333333334, ans=0.2 2024-09-25 08:39:45,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=705315.3333333334, ans=0.07 2024-09-25 08:39:54,026 INFO [train.py:1198] (0/4) Epoch 39, batch 3100, loss[loss=0.1826, ctc_loss=0.1171, cr_loss=0.3277, over 17170.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1235, cr_loss=0.3401, over 3337283.68 frames. ], batch size: 45, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:40:05,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=705362.0, ans=0.125 2024-09-25 08:40:09,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=705408.6666666666, ans=0.125 2024-09-25 08:40:13,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=705408.6666666666, ans=0.125 2024-09-25 08:40:17,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=705408.6666666666, ans=0.1 2024-09-25 08:40:27,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.32 vs. 
limit=15.0 2024-09-25 08:40:41,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=705502.0, ans=0.125 2024-09-25 08:40:46,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.74 vs. limit=22.5 2024-09-25 08:41:12,387 INFO [train.py:1198] (0/4) Epoch 39, batch 3150, loss[loss=0.2113, ctc_loss=0.1408, cr_loss=0.3525, over 15962.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1238, cr_loss=0.3407, over 3330451.32 frames. ], batch size: 74, lr: 3.03e-03, grad_scale: 16.0 2024-09-25 08:41:17,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=705595.3333333334, ans=0.1 2024-09-25 08:41:35,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=705642.0, ans=0.0 2024-09-25 08:41:36,819 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2024-09-25 08:41:50,876 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.273e+02 1.360e+02 1.459e+02 1.912e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-25 08:42:00,439 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:42:02,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=705735.3333333334, ans=0.125 2024-09-25 08:42:16,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=705782.0, ans=0.2 2024-09-25 08:42:22,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.86 vs. limit=15.0 2024-09-25 08:42:29,179 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2024-09-25 08:42:29,864 INFO [train.py:1198] (0/4) Epoch 39, batch 3200, loss[loss=0.156, ctc_loss=0.09774, cr_loss=0.2911, over 16257.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3404, over 3341794.22 frames. ], batch size: 36, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:42:35,328 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.14 vs. 
limit=6.0 2024-09-25 08:43:00,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705922.0, ans=0.1 2024-09-25 08:43:01,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=705922.0, ans=0.0 2024-09-25 08:43:08,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=705922.0, ans=0.025 2024-09-25 08:43:08,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=705922.0, ans=0.0 2024-09-25 08:43:20,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=705968.6666666666, ans=0.0 2024-09-25 08:43:48,017 INFO [train.py:1198] (0/4) Epoch 39, batch 3250, loss[loss=0.1864, ctc_loss=0.117, cr_loss=0.3471, over 17143.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1234, cr_loss=0.3402, over 3350028.65 frames. ], batch size: 45, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:43:55,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-09-25 08:43:57,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=706062.0, ans=0.1 2024-09-25 08:44:11,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=706108.6666666666, ans=0.2 2024-09-25 08:44:17,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=706155.3333333334, ans=0.025 2024-09-25 08:44:20,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=706155.3333333334, ans=0.0 2024-09-25 08:44:26,959 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.281e+02 1.355e+02 1.471e+02 2.202e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-25 08:44:32,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2024-09-25 08:44:33,925 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2024-09-25 08:44:57,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706248.6666666666, ans=0.1 2024-09-25 08:44:57,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2024-09-25 08:45:05,829 INFO [train.py:1198] (0/4) Epoch 39, batch 3300, loss[loss=0.2019, ctc_loss=0.1332, cr_loss=0.3432, over 16589.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1238, cr_loss=0.3409, over 3348153.99 frames. 
], batch size: 66, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:45:23,461 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:45:39,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=706388.6666666666, ans=0.125 2024-09-25 08:45:45,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=706388.6666666666, ans=0.125 2024-09-25 08:45:49,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0 2024-09-25 08:46:04,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=706435.3333333334, ans=0.125 2024-09-25 08:46:12,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706482.0, ans=0.1 2024-09-25 08:46:15,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=706482.0, ans=0.0 2024-09-25 08:46:26,547 INFO [train.py:1198] (0/4) Epoch 39, batch 3350, loss[loss=0.2313, ctc_loss=0.1563, cr_loss=0.3751, over 12183.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1243, cr_loss=0.3418, over 3340960.73 frames. ], batch size: 123, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:46:34,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=706528.6666666666, ans=0.0 2024-09-25 08:46:37,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=706528.6666666666, ans=0.07 2024-09-25 08:47:05,602 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.259e+02 1.360e+02 1.476e+02 1.978e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 08:47:17,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2024-09-25 08:47:44,985 INFO [train.py:1198] (0/4) Epoch 39, batch 3400, loss[loss=0.2042, ctc_loss=0.1321, cr_loss=0.3607, over 16717.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1246, cr_loss=0.3423, over 3338063.99 frames. 
], batch size: 61, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:48:22,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=706855.3333333334, ans=0.125 2024-09-25 08:48:22,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=706855.3333333334, ans=0.125 2024-09-25 08:48:27,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=706855.3333333334, ans=0.09899494936611666 2024-09-25 08:48:27,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=706855.3333333334, ans=0.125 2024-09-25 08:48:30,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=706902.0, ans=0.0 2024-09-25 08:48:54,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=706948.6666666666, ans=0.125 2024-09-25 08:49:03,010 INFO [train.py:1198] (0/4) Epoch 39, batch 3450, loss[loss=0.2003, ctc_loss=0.13, cr_loss=0.3513, over 17030.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1246, cr_loss=0.3419, over 3344944.17 frames. ], batch size: 51, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:49:06,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.79 vs. limit=10.0 2024-09-25 08:49:17,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.49 vs. limit=6.0 2024-09-25 08:49:18,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=707042.0, ans=0.125 2024-09-25 08:49:48,211 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.321e+02 1.412e+02 1.498e+02 2.585e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-25 08:50:15,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.81 vs. limit=22.5 2024-09-25 08:50:23,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=707182.0, ans=0.1 2024-09-25 08:50:26,705 INFO [train.py:1198] (0/4) Epoch 39, batch 3500, loss[loss=0.1862, ctc_loss=0.1208, cr_loss=0.3272, over 17257.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1241, cr_loss=0.3409, over 3355656.63 frames. ], batch size: 44, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:50:37,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=707228.6666666666, ans=0.125 2024-09-25 08:50:40,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=707275.3333333334, ans=0.2 2024-09-25 08:51:04,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=707322.0, ans=0.2 2024-09-25 08:51:05,284 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.75 vs. 
limit=15.0 2024-09-25 08:51:22,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=15.0 2024-09-25 08:51:29,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=707415.3333333334, ans=0.2 2024-09-25 08:51:44,940 INFO [train.py:1198] (0/4) Epoch 39, batch 3550, loss[loss=0.1874, ctc_loss=0.1243, cr_loss=0.3151, over 17360.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1248, cr_loss=0.3418, over 3349155.14 frames. ], batch size: 48, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:51:57,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=707462.0, ans=0.2 2024-09-25 08:52:00,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=707508.6666666666, ans=0.025 2024-09-25 08:52:08,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=707508.6666666666, ans=0.1 2024-09-25 08:52:08,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=707508.6666666666, ans=0.0 2024-09-25 08:52:09,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-09-25 08:52:23,764 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.264e+02 1.345e+02 1.429e+02 1.997e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-25 08:52:29,489 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-25 08:52:39,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=707602.0, ans=0.0 2024-09-25 08:53:02,933 INFO [train.py:1198] (0/4) Epoch 39, batch 3600, loss[loss=0.2026, ctc_loss=0.1302, cr_loss=0.3621, over 16758.00 frames. ], tot_loss[loss=0.1936, ctc_loss=0.125, cr_loss=0.3427, over 3354759.84 frames. ], batch size: 61, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:53:12,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=707695.3333333334, ans=0.2 2024-09-25 08:53:14,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=707695.3333333334, ans=0.2 2024-09-25 08:53:20,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=707742.0, ans=0.0 2024-09-25 08:53:24,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. 
limit=6.0 2024-09-25 08:53:29,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=707742.0, ans=0.2 2024-09-25 08:53:35,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=707788.6666666666, ans=0.125 2024-09-25 08:53:49,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=707835.3333333334, ans=0.125 2024-09-25 08:53:55,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=707835.3333333334, ans=0.125 2024-09-25 08:54:20,463 INFO [train.py:1198] (0/4) Epoch 39, batch 3650, loss[loss=0.1657, ctc_loss=0.1041, cr_loss=0.3078, over 17282.00 frames. ], tot_loss[loss=0.1931, ctc_loss=0.1245, cr_loss=0.3427, over 3361849.28 frames. ], batch size: 42, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:54:41,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=707975.3333333334, ans=0.125 2024-09-25 08:54:53,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=708022.0, ans=0.035 2024-09-25 08:54:59,912 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.284e+02 1.368e+02 1.461e+02 2.127e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 08:55:25,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=708115.3333333334, ans=0.2 2024-09-25 08:55:28,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=708115.3333333334, ans=0.2 2024-09-25 08:55:30,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-09-25 08:55:31,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=708115.3333333334, ans=0.0 2024-09-25 08:55:40,514 INFO [train.py:1198] (0/4) Epoch 39, batch 3700, loss[loss=0.2078, ctc_loss=0.1359, cr_loss=0.3593, over 16530.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1248, cr_loss=0.3428, over 3361678.69 frames. ], batch size: 66, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:56:35,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=708302.0, ans=0.125 2024-09-25 08:56:48,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=708348.6666666666, ans=0.0 2024-09-25 08:56:48,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=708348.6666666666, ans=0.125 2024-09-25 08:56:59,731 INFO [train.py:1198] (0/4) Epoch 39, batch 3750, loss[loss=0.1776, ctc_loss=0.1135, cr_loss=0.3205, over 17031.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1248, cr_loss=0.3432, over 3354841.24 frames. 
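The periodic optim.py:487 WARNINGs summarize recent gradient norms. The five numbers read as (min, 25%, median, 75%, max), and in every such line in this section the printed threshold equals Clipping_scale times the median (e.g. 2.0 x 1.368e+02 = 2.736e+02 just above), with percent-clipped the share of recent batches whose norm exceeded it. A sketch of that bookkeeping, assuming a simple history buffer (illustrative, not the optimizer's actual code):

    import torch

    class GradNormClipper:
        """Track recent gradient norms; clip to clipping_scale * median."""
        def __init__(self, clipping_scale=2.0, history=200):
            self.clipping_scale = clipping_scale
            self.history = history
            self.norms = []
            self.num_clipped = 0

        def clip_(self, params):
            params = [p for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms = (self.norms + [norm])[-self.history:]
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median   # e.g. 2.0 * 1.368e+02
            if norm > threshold:
                self.num_clipped += 1                  # feeds "percent-clipped"
                for p in params:
                    p.grad.mul_(threshold / norm)
            return norm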
], batch size: 39, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:57:21,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=708442.0, ans=0.1 2024-09-25 08:57:38,680 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.270e+02 1.361e+02 1.496e+02 2.109e+02, threshold=2.722e+02, percent-clipped=0.0 2024-09-25 08:57:54,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=708535.3333333334, ans=0.025 2024-09-25 08:58:05,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=708582.0, ans=0.125 2024-09-25 08:58:18,292 INFO [train.py:1198] (0/4) Epoch 39, batch 3800, loss[loss=0.1619, ctc_loss=0.1028, cr_loss=0.2952, over 17011.00 frames. ], tot_loss[loss=0.1933, ctc_loss=0.1248, cr_loss=0.3426, over 3349208.85 frames. ], batch size: 39, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:58:31,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=708628.6666666666, ans=0.0 2024-09-25 08:58:38,216 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 08:59:10,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=708768.6666666666, ans=0.09899494936611666 2024-09-25 08:59:21,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=708815.3333333334, ans=0.125 2024-09-25 08:59:31,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=708815.3333333334, ans=0.025 2024-09-25 08:59:39,464 INFO [train.py:1198] (0/4) Epoch 39, batch 3850, loss[loss=0.1479, ctc_loss=0.09324, cr_loss=0.2731, over 17208.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1234, cr_loss=0.3394, over 3329808.75 frames. ], batch size: 41, lr: 3.03e-03, grad_scale: 32.0 2024-09-25 08:59:39,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=708862.0, ans=0.125 2024-09-25 08:59:48,196 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0 2024-09-25 08:59:57,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=708908.6666666666, ans=0.2 2024-09-25 09:00:12,691 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:00:18,553 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.306e+02 1.417e+02 1.619e+02 3.168e+02, threshold=2.835e+02, percent-clipped=2.0 2024-09-25 09:00:35,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=709002.0, ans=0.2 2024-09-25 09:00:50,469 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-39.pt 2024-09-25 09:01:40,862 INFO [train.py:1198] (0/4) Epoch 40, batch 0, loss[loss=0.2205, ctc_loss=0.144, cr_loss=0.3828, over 16999.00 frames. ], tot_loss[loss=0.2205, ctc_loss=0.144, cr_loss=0.3828, over 16999.00 frames. 
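Two checkpoint patterns appear here: epoch-39.pt written at the epoch boundary just above, and batch-indexed files like checkpoint-152000.pt written mid-epoch shortly after. A sketch of both, where save_every_n is a placeholder interval (152000 is a round multiple of common settings; the exact value is not recoverable from these lines alone):

    import torch

    def save_by_batch(model, optimizer, exp_dir, epoch, batch_idx_train,
                      save_every_n=4000):   # placeholder interval
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            torch.save({"model": model.state_dict(),
                        "optimizer": optimizer.state_dict(),
                        "epoch": epoch,
                        "batch_idx_train": batch_idx_train},
                       f"{exp_dir}/checkpoint-{batch_idx_train}.pt")

    def save_by_epoch(model, optimizer, exp_dir, epoch):
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "epoch": epoch},
                   f"{exp_dir}/epoch-{epoch}.pt")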
], batch size: 56, lr: 2.99e-03, grad_scale: 32.0 2024-09-25 09:01:40,863 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 09:01:56,601 INFO [train.py:1230] (0/4) Epoch 40, validation: loss=0.03491, ctc_loss=0.03491, cr_loss=1.007e-14, over 944034.00 frames. 2024-09-25 09:01:56,602 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 09:02:00,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=709076.6666666666, ans=0.1 2024-09-25 09:02:27,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=709170.0, ans=0.125 2024-09-25 09:02:31,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=709170.0, ans=0.125 2024-09-25 09:02:46,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=22.5 2024-09-25 09:02:47,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=709216.6666666666, ans=0.125 2024-09-25 09:02:50,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=709216.6666666666, ans=0.2 2024-09-25 09:03:09,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=709263.3333333334, ans=0.025 2024-09-25 09:03:10,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=709263.3333333334, ans=15.0 2024-09-25 09:03:15,889 INFO [train.py:1198] (0/4) Epoch 40, batch 50, loss[loss=0.1529, ctc_loss=0.09487, cr_loss=0.29, over 17278.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1208, cr_loss=0.3369, over 767784.30 frames. ], batch size: 42, lr: 2.99e-03, grad_scale: 32.0 2024-09-25 09:03:17,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=709310.0, ans=0.5 2024-09-25 09:03:22,679 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-152000.pt 2024-09-25 09:03:42,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=709356.6666666666, ans=0.0 2024-09-25 09:03:44,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=709356.6666666666, ans=0.125 2024-09-25 09:04:10,883 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.285e+02 1.412e+02 1.570e+02 2.190e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-25 09:04:38,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=709496.6666666666, ans=0.5 2024-09-25 09:04:39,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=709496.6666666666, ans=0.1 2024-09-25 09:04:43,258 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:04:44,463 INFO [train.py:1198] (0/4) Epoch 40, batch 100, loss[loss=0.2052, ctc_loss=0.1351, cr_loss=0.3508, over 16695.00 frames. 
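At batch 0 of each epoch the script computes a validation loss over a fixed dev set (944034.00 frames every time, so the same cuts are reused). Note that cr_loss is ~1e-14, i.e. numerically zero: presumably the consistency term needs the two masked training views, which are absent in eval mode. A sketch of the frame-normalized validation pass, with compute_loss a hypothetical helper:

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dev_loader, compute_loss):
        model.eval()
        tot_loss = tot_ctc = tot_frames = 0.0
        for batch in dev_loader:
            # compute_loss: hypothetical helper returning summed losses
            # and the frame count for one batch
            loss_sum, ctc_sum, num_frames = compute_loss(model, batch)
            tot_loss += float(loss_sum)
            tot_ctc += float(ctc_sum)
            tot_frames += num_frames
        model.train()
        # frame-normalized, as in "validation: loss=... over 944034.00 frames."
        return tot_loss / tot_frames, tot_ctc / tot_frames, tot_frames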
], tot_loss[loss=0.1892, ctc_loss=0.1216, cr_loss=0.338, over 1348293.47 frames. ], batch size: 61, lr: 2.99e-03, grad_scale: 32.0 2024-09-25 09:04:48,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0 2024-09-25 09:04:55,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=709543.3333333334, ans=0.0 2024-09-25 09:05:14,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=709590.0, ans=0.125 2024-09-25 09:05:16,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=709590.0, ans=15.0 2024-09-25 09:05:25,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=709636.6666666666, ans=0.0 2024-09-25 09:05:28,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=709636.6666666666, ans=0.0 2024-09-25 09:05:30,921 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2024-09-25 09:05:36,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=709683.3333333334, ans=0.2 2024-09-25 09:05:41,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=709683.3333333334, ans=0.125 2024-09-25 09:05:41,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=709683.3333333334, ans=0.025 2024-09-25 09:06:07,000 INFO [train.py:1198] (0/4) Epoch 40, batch 150, loss[loss=0.1953, ctc_loss=0.1263, cr_loss=0.3453, over 17145.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1212, cr_loss=0.3362, over 1795526.24 frames. ], batch size: 45, lr: 2.99e-03, grad_scale: 32.0 2024-09-25 09:06:08,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=709776.6666666666, ans=0.0 2024-09-25 09:06:50,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709870.0, ans=0.1 2024-09-25 09:06:56,159 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.270e+02 1.336e+02 1.420e+02 2.555e+02, threshold=2.672e+02, percent-clipped=0.0 2024-09-25 09:07:04,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=709916.6666666666, ans=0.2 2024-09-25 09:07:29,580 INFO [train.py:1198] (0/4) Epoch 40, batch 200, loss[loss=0.2098, ctc_loss=0.1349, cr_loss=0.3745, over 17310.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1218, cr_loss=0.3374, over 2145075.80 frames. ], batch size: 51, lr: 2.99e-03, grad_scale: 32.0 2024-09-25 09:07:49,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=710056.6666666666, ans=0.125 2024-09-25 09:08:17,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. 
limit=6.0 2024-09-25 09:08:32,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=710196.6666666666, ans=0.1 2024-09-25 09:08:33,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2024-09-25 09:08:48,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=710243.3333333334, ans=0.0 2024-09-25 09:08:50,034 INFO [train.py:1198] (0/4) Epoch 40, batch 250, loss[loss=0.1799, ctc_loss=0.1173, cr_loss=0.3129, over 17312.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1224, cr_loss=0.339, over 2420909.89 frames. ], batch size: 51, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:08:51,955 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:08:54,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2024-09-25 09:09:04,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=710243.3333333334, ans=0.025 2024-09-25 09:09:06,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=710243.3333333334, ans=0.125 2024-09-25 09:09:09,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=710290.0, ans=0.0 2024-09-25 09:09:39,757 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.275e+02 1.359e+02 1.486e+02 2.398e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-25 09:10:16,460 INFO [train.py:1198] (0/4) Epoch 40, batch 300, loss[loss=0.2244, ctc_loss=0.1474, cr_loss=0.3852, over 14873.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.122, cr_loss=0.3391, over 2628268.35 frames. ], batch size: 89, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:10:29,483 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:10:35,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=710523.3333333334, ans=0.1 2024-09-25 09:10:43,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=710523.3333333334, ans=0.0 2024-09-25 09:10:59,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=710570.0, ans=0.125 2024-09-25 09:11:36,230 INFO [train.py:1198] (0/4) Epoch 40, batch 350, loss[loss=0.2267, ctc_loss=0.1507, cr_loss=0.3799, over 15017.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1226, cr_loss=0.3401, over 2798125.93 frames. 
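The loss fields throughout this section satisfy loss = ctc_loss + 0.2 x cr_loss: at epoch 40, batch 250 above, 0.1173 + 0.2 x 0.3129 = 0.1799, and at batch 300, 0.1474 + 0.2 x 0.3852 = 0.2244. A sketch of consistency-regularized CTC under that reading: run two differently masked views of each batch, apply CTC to both, and add a symmetric KL between their frame posteriors. Names and the exact divergence are assumptions, not lifted from the training code:

    import torch.nn.functional as F

    def cr_ctc_loss(log_probs_a, log_probs_b, targets, input_lens, target_lens,
                    cr_scale=0.2):
        # log_probs_*: (T, N, C) log-softmax outputs for two differently
        # masked views of the same batch; targets: (N, S)
        ctc = 0.5 * (
            F.ctc_loss(log_probs_a, targets, input_lens, target_lens,
                       reduction="sum")
            + F.ctc_loss(log_probs_b, targets, input_lens, target_lens,
                         reduction="sum")
        )
        # symmetric KL between the two views' per-frame posteriors
        cr = 0.5 * (
            F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="sum")
            + F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="sum")
        )
        return ctc + cr_scale * cr, ctc, cr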
], batch size: 89, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:11:36,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=710710.0, ans=0.0 2024-09-25 09:11:38,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=710710.0, ans=0.025 2024-09-25 09:12:03,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=710756.6666666666, ans=0.125 2024-09-25 09:12:08,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.84 vs. limit=10.0 2024-09-25 09:12:27,036 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.301e+02 1.389e+02 1.518e+02 2.541e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-25 09:12:51,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=710896.6666666666, ans=0.125 2024-09-25 09:12:52,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710896.6666666666, ans=0.1 2024-09-25 09:12:59,095 INFO [train.py:1198] (0/4) Epoch 40, batch 400, loss[loss=0.2074, ctc_loss=0.1356, cr_loss=0.3589, over 16373.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.3411, over 2910238.55 frames. ], batch size: 66, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:13:17,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=15.0 2024-09-25 09:13:20,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=710990.0, ans=0.125 2024-09-25 09:13:23,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=710990.0, ans=0.2 2024-09-25 09:13:36,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=711036.6666666666, ans=0.025 2024-09-25 09:13:39,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=711036.6666666666, ans=15.0 2024-09-25 09:14:06,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=711130.0, ans=0.2 2024-09-25 09:14:08,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=711130.0, ans=0.0 2024-09-25 09:14:20,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=711176.6666666666, ans=0.02 2024-09-25 09:14:22,074 INFO [train.py:1198] (0/4) Epoch 40, batch 450, loss[loss=0.1952, ctc_loss=0.1274, cr_loss=0.3389, over 17017.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1229, cr_loss=0.3398, over 3013677.80 frames. ], batch size: 51, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:14:35,693 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.97 vs. 
limit=15.0 2024-09-25 09:14:38,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=711223.3333333334, ans=0.125 2024-09-25 09:14:58,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=711270.0, ans=0.125 2024-09-25 09:15:08,669 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.23 vs. limit=15.0 2024-09-25 09:15:12,493 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.283e+02 1.337e+02 1.424e+02 2.250e+02, threshold=2.674e+02, percent-clipped=0.0 2024-09-25 09:15:22,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=711316.6666666666, ans=0.0 2024-09-25 09:15:30,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=711363.3333333334, ans=0.2 2024-09-25 09:15:35,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0 2024-09-25 09:15:40,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=711363.3333333334, ans=0.0 2024-09-25 09:15:44,621 INFO [train.py:1198] (0/4) Epoch 40, batch 500, loss[loss=0.1702, ctc_loss=0.1078, cr_loss=0.3117, over 17256.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1223, cr_loss=0.3386, over 3091625.01 frames. ], batch size: 44, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:16:51,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=711596.6666666666, ans=0.025 2024-09-25 09:17:07,180 INFO [train.py:1198] (0/4) Epoch 40, batch 550, loss[loss=0.1837, ctc_loss=0.1202, cr_loss=0.3177, over 17341.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1228, cr_loss=0.3397, over 3155256.64 frames. ], batch size: 48, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:17:12,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2024-09-25 09:17:57,147 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.279e+02 1.358e+02 1.490e+02 2.059e+02, threshold=2.716e+02, percent-clipped=0.0 2024-09-25 09:18:25,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0 2024-09-25 09:18:28,029 INFO [train.py:1198] (0/4) Epoch 40, batch 600, loss[loss=0.2172, ctc_loss=0.1419, cr_loss=0.3762, over 17031.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1227, cr_loss=0.3395, over 3202474.01 frames. 
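The scaling.py:1024 Whitening lines compare a per-module statistic against a limit. A metric of the shape mean(diag(C @ C)) / mean(diag(C))**2 is 1.0 when the (possibly per-group) feature covariance C is a multiple of the identity and grows with eigenvalue spread, which matches metrics that sit above 1 and a penalty that presumably engages only past the limit. A sketch under that assumption, not a copy of scaling.py:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels)
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
        mean_diag_of_sq = torch.matmul(cov, cov).diagonal(dim1=1, dim2=2).mean()
        return (mean_diag_of_sq / (mean_diag ** 2 + 1e-20)).item()

    x = torch.randn(4000, 256)     # near-isotropic features
    print(whitening_metric(x))     # ~1 + 256/4000: close to the ideal 1.0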
], batch size: 52, lr: 2.98e-03, grad_scale: 16.0 2024-09-25 09:18:36,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=711876.6666666666, ans=0.125 2024-09-25 09:19:34,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=712016.6666666666, ans=0.125 2024-09-25 09:19:52,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=712063.3333333334, ans=0.2 2024-09-25 09:19:55,901 INFO [train.py:1198] (0/4) Epoch 40, batch 650, loss[loss=0.182, ctc_loss=0.1171, cr_loss=0.3247, over 17007.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1232, cr_loss=0.3403, over 3226580.51 frames. ], batch size: 51, lr: 2.98e-03, grad_scale: 16.0 2024-09-25 09:20:03,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=712110.0, ans=0.125 2024-09-25 09:20:11,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-09-25 09:20:43,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=712250.0, ans=0.2 2024-09-25 09:20:45,846 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.262e+02 1.364e+02 1.466e+02 1.763e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 09:20:49,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=712250.0, ans=0.125 2024-09-25 09:20:58,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=712296.6666666666, ans=0.0 2024-09-25 09:21:01,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0 2024-09-25 09:21:13,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2024-09-25 09:21:16,167 INFO [train.py:1198] (0/4) Epoch 40, batch 700, loss[loss=0.1685, ctc_loss=0.1079, cr_loss=0.303, over 17258.00 frames. ], tot_loss[loss=0.1932, ctc_loss=0.1246, cr_loss=0.343, over 3261522.69 frames. ], batch size: 44, lr: 2.98e-03, grad_scale: 16.0 2024-09-25 09:21:37,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=712390.0, ans=0.125 2024-09-25 09:21:47,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712390.0, ans=0.1 2024-09-25 09:21:49,351 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:21:50,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712436.6666666666, ans=0.125 2024-09-25 09:22:38,080 INFO [train.py:1198] (0/4) Epoch 40, batch 750, loss[loss=0.1836, ctc_loss=0.1164, cr_loss=0.3363, over 17030.00 frames. ], tot_loss[loss=0.1935, ctc_loss=0.1248, cr_loss=0.3438, over 3280886.54 frames. 
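grad_scale halved from 32.0 to 16.0 at batch 600 above and is back at 32.0 by batch 800 below, which is how AMP loss scaling behaves: halve on overflow, grow again after a run of clean steps. A standard torch.cuda.amp sketch (model, optimizer and loss_fn are hypothetical):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # skips the update if grads overflowed
        scaler.update()                 # halve on overflow, slowly grow otherwise
        return loss.detach(), scaler.get_scale()   # the logged grad_scale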
], batch size: 51, lr: 2.98e-03, grad_scale: 16.0 2024-09-25 09:22:38,797 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2024-09-25 09:22:54,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=712623.3333333334, ans=0.025 2024-09-25 09:22:57,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=712623.3333333334, ans=0.0 2024-09-25 09:22:58,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712623.3333333334, ans=0.1 2024-09-25 09:23:00,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=712623.3333333334, ans=0.125 2024-09-25 09:23:14,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=712670.0, ans=0.0 2024-09-25 09:23:22,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=712670.0, ans=0.0 2024-09-25 09:23:27,402 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.294e+02 1.361e+02 1.444e+02 2.754e+02, threshold=2.722e+02, percent-clipped=1.0 2024-09-25 09:23:48,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=712763.3333333334, ans=0.2 2024-09-25 09:24:03,468 INFO [train.py:1198] (0/4) Epoch 40, batch 800, loss[loss=0.1906, ctc_loss=0.1247, cr_loss=0.3295, over 12136.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.124, cr_loss=0.3417, over 3302351.02 frames. ], batch size: 123, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:24:03,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=712810.0, ans=0.125 2024-09-25 09:24:43,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=712903.3333333334, ans=0.2 2024-09-25 09:24:52,428 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0 2024-09-25 09:24:58,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=712950.0, ans=0.05 2024-09-25 09:25:03,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=712950.0, ans=0.5 2024-09-25 09:25:19,541 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2024-09-25 09:25:22,509 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=22.5 2024-09-25 09:25:26,602 INFO [train.py:1198] (0/4) Epoch 40, batch 850, loss[loss=0.165, ctc_loss=0.1043, cr_loss=0.3034, over 17305.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3413, over 3308829.82 frames. 
], batch size: 46, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:25:35,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=713043.3333333334, ans=0.0 2024-09-25 09:26:16,238 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.263e+02 1.352e+02 1.405e+02 2.259e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-25 09:26:28,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2024-09-25 09:26:30,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=713230.0, ans=0.0 2024-09-25 09:26:36,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=713230.0, ans=0.125 2024-09-25 09:26:47,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=713276.6666666666, ans=0.0 2024-09-25 09:26:49,141 INFO [train.py:1198] (0/4) Epoch 40, batch 900, loss[loss=0.2192, ctc_loss=0.152, cr_loss=0.3357, over 11879.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1242, cr_loss=0.3417, over 3317707.37 frames. ], batch size: 123, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:26:52,816 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:26:54,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=713276.6666666666, ans=0.09899494936611666 2024-09-25 09:27:02,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=713276.6666666666, ans=0.0 2024-09-25 09:27:21,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=713370.0, ans=0.09899494936611666 2024-09-25 09:27:50,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=713416.6666666666, ans=10.0 2024-09-25 09:28:03,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=713463.3333333334, ans=0.0 2024-09-25 09:28:09,221 INFO [train.py:1198] (0/4) Epoch 40, batch 950, loss[loss=0.2151, ctc_loss=0.1386, cr_loss=0.3824, over 16788.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1239, cr_loss=0.3407, over 3319627.70 frames. ], batch size: 61, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:28:14,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=713510.0, ans=10.0 2024-09-25 09:28:19,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=713510.0, ans=0.5 2024-09-25 09:28:20,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=713510.0, ans=0.0 2024-09-25 09:28:24,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=713556.6666666666, ans=0.07 2024-09-25 09:28:53,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.71 vs. 
limit=22.5 2024-09-25 09:29:04,046 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.318e+02 1.382e+02 1.485e+02 2.096e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-25 09:29:29,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=713696.6666666666, ans=0.07 2024-09-25 09:29:30,223 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2024-09-25 09:29:37,543 INFO [train.py:1198] (0/4) Epoch 40, batch 1000, loss[loss=0.2139, ctc_loss=0.1399, cr_loss=0.3704, over 17201.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1228, cr_loss=0.339, over 3340439.33 frames. ], batch size: 55, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:29:39,803 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2024-09-25 09:29:42,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=713743.3333333334, ans=0.125 2024-09-25 09:30:03,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=713790.0, ans=0.125 2024-09-25 09:30:04,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=713790.0, ans=0.0 2024-09-25 09:30:11,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713836.6666666666, ans=0.1 2024-09-25 09:30:14,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=713836.6666666666, ans=0.0 2024-09-25 09:30:31,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2024-09-25 09:30:33,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=713883.3333333334, ans=0.125 2024-09-25 09:30:57,926 INFO [train.py:1198] (0/4) Epoch 40, batch 1050, loss[loss=0.1687, ctc_loss=0.1078, cr_loss=0.3045, over 17214.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1221, cr_loss=0.3378, over 3355054.60 frames. ], batch size: 41, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:30:59,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=713976.6666666666, ans=0.125 2024-09-25 09:31:01,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=713976.6666666666, ans=0.1 2024-09-25 09:31:29,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=714070.0, ans=0.125 2024-09-25 09:31:41,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=714070.0, ans=0.125 2024-09-25 09:31:41,387 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.28 vs. 
limit=15.0 2024-09-25 09:31:51,953 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.281e+02 1.362e+02 1.480e+02 1.897e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-25 09:32:12,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=714163.3333333334, ans=0.1 2024-09-25 09:32:20,605 INFO [train.py:1198] (0/4) Epoch 40, batch 1100, loss[loss=0.1787, ctc_loss=0.115, cr_loss=0.3183, over 17107.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1219, cr_loss=0.3376, over 3359939.64 frames. ], batch size: 40, lr: 2.98e-03, grad_scale: 16.0 2024-09-25 09:33:18,999 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0 2024-09-25 09:33:33,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=714396.6666666666, ans=0.025 2024-09-25 09:33:34,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=714396.6666666666, ans=0.1 2024-09-25 09:33:43,634 INFO [train.py:1198] (0/4) Epoch 40, batch 1150, loss[loss=0.1658, ctc_loss=0.1064, cr_loss=0.297, over 17145.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1228, cr_loss=0.3395, over 3360681.37 frames. ], batch size: 48, lr: 2.98e-03, grad_scale: 16.0 2024-09-25 09:33:59,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=714443.3333333334, ans=0.0 2024-09-25 09:34:40,142 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.266e+02 1.369e+02 1.564e+02 2.152e+02, threshold=2.737e+02, percent-clipped=0.0 2024-09-25 09:35:08,859 INFO [train.py:1198] (0/4) Epoch 40, batch 1200, loss[loss=0.2314, ctc_loss=0.1515, cr_loss=0.3995, over 16991.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1232, cr_loss=0.3409, over 3363177.03 frames. ], batch size: 53, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:35:30,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=714723.3333333334, ans=0.0 2024-09-25 09:35:46,137 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:35:54,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714770.0, ans=0.1 2024-09-25 09:36:13,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=714863.3333333334, ans=0.0 2024-09-25 09:36:29,057 INFO [train.py:1198] (0/4) Epoch 40, batch 1250, loss[loss=0.2182, ctc_loss=0.1429, cr_loss=0.3766, over 17365.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1233, cr_loss=0.3416, over 3363223.57 frames. ], batch size: 48, lr: 2.98e-03, grad_scale: 32.0 2024-09-25 09:36:37,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.65 vs. 
limit=15.0 2024-09-25 09:36:43,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=714910.0, ans=0.0 2024-09-25 09:36:51,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=714956.6666666666, ans=0.125 2024-09-25 09:36:53,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.29 vs. limit=12.0 2024-09-25 09:36:54,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714956.6666666666, ans=0.1 2024-09-25 09:37:02,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715003.3333333334, ans=0.125 2024-09-25 09:37:05,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=715003.3333333334, ans=0.125 2024-09-25 09:37:15,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=715003.3333333334, ans=0.2 2024-09-25 09:37:22,759 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.288e+02 1.344e+02 1.458e+02 2.122e+02, threshold=2.687e+02, percent-clipped=0.0 2024-09-25 09:37:51,163 INFO [train.py:1198] (0/4) Epoch 40, batch 1300, loss[loss=0.1738, ctc_loss=0.1093, cr_loss=0.3223, over 17064.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1237, cr_loss=0.3411, over 3346873.33 frames. ], batch size: 46, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:37:53,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2024-09-25 09:37:59,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=715143.3333333334, ans=0.125 2024-09-25 09:38:13,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=715190.0, ans=0.0 2024-09-25 09:38:23,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715236.6666666666, ans=0.125 2024-09-25 09:38:38,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715236.6666666666, ans=0.1 2024-09-25 09:38:49,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=715283.3333333334, ans=0.125 2024-09-25 09:39:07,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=715330.0, ans=0.0 2024-09-25 09:39:19,071 INFO [train.py:1198] (0/4) Epoch 40, batch 1350, loss[loss=0.1646, ctc_loss=0.1029, cr_loss=0.3083, over 16947.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.124, cr_loss=0.3414, over 3337528.33 frames. 
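The logged lr drifts down smoothly both within and across epochs (3.03e-03 through epoch 39, then 2.99e-03 down to 2.97e-03 across epoch 40). icefall's Eden scheduler produces curves of this shape; the sketch below shows the functional form with placeholder constants that have not been fitted to these printed values:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # decays as batch ** -0.5 and epoch ** -0.5 asymptotically, flat
        # near the start of training
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor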
], batch size: 42, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:39:37,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=715423.3333333334, ans=0.0 2024-09-25 09:39:49,992 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:40:10,085 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.269e+02 1.368e+02 1.487e+02 1.942e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 09:40:10,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=715516.6666666666, ans=0.1 2024-09-25 09:40:32,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=715563.3333333334, ans=0.025 2024-09-25 09:40:39,123 INFO [train.py:1198] (0/4) Epoch 40, batch 1400, loss[loss=0.1962, ctc_loss=0.1264, cr_loss=0.3489, over 17293.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.124, cr_loss=0.3415, over 3336115.35 frames. ], batch size: 51, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:40:39,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=715610.0, ans=0.125 2024-09-25 09:40:50,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=715610.0, ans=0.2 2024-09-25 09:41:08,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715656.6666666666, ans=0.125 2024-09-25 09:41:16,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=715703.3333333334, ans=0.125 2024-09-25 09:41:33,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=715750.0, ans=0.0 2024-09-25 09:41:33,589 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:41:35,936 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.95 vs. limit=22.5 2024-09-25 09:41:54,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715796.6666666666, ans=0.1 2024-09-25 09:42:02,343 INFO [train.py:1198] (0/4) Epoch 40, batch 1450, loss[loss=0.2203, ctc_loss=0.1441, cr_loss=0.3813, over 17009.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1236, cr_loss=0.3415, over 3332240.68 frames. ], batch size: 53, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:42:10,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=715843.3333333334, ans=0.0 2024-09-25 09:42:12,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=715843.3333333334, ans=0.2 2024-09-25 09:42:12,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.50 vs. limit=6.0 2024-09-25 09:42:27,014 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. 
limit=15.0 2024-09-25 09:42:29,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=715890.0, ans=15.0 2024-09-25 09:42:30,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=12.0 2024-09-25 09:42:34,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=715936.6666666666, ans=0.0 2024-09-25 09:42:45,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=715936.6666666666, ans=0.0 2024-09-25 09:42:47,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=715936.6666666666, ans=0.2 2024-09-25 09:42:48,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=715983.3333333334, ans=0.0 2024-09-25 09:42:50,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=715983.3333333334, ans=0.125 2024-09-25 09:42:53,420 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.248e+02 1.311e+02 1.404e+02 2.777e+02, threshold=2.622e+02, percent-clipped=1.0 2024-09-25 09:42:55,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=715983.3333333334, ans=0.125 2024-09-25 09:42:56,958 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=715983.3333333334, ans=0.125 2024-09-25 09:43:01,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=715983.3333333334, ans=0.125 2024-09-25 09:43:07,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=716030.0, ans=0.0 2024-09-25 09:43:21,949 INFO [train.py:1198] (0/4) Epoch 40, batch 1500, loss[loss=0.1845, ctc_loss=0.1165, cr_loss=0.34, over 17079.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1237, cr_loss=0.342, over 3327651.23 frames. ], batch size: 43, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:43:43,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.17 vs. limit=10.0 2024-09-25 09:43:56,582 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2024-09-25 09:44:02,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=716170.0, ans=0.025 2024-09-25 09:44:04,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=716170.0, ans=0.0 2024-09-25 09:44:49,470 INFO [train.py:1198] (0/4) Epoch 40, batch 1550, loss[loss=0.1951, ctc_loss=0.1253, cr_loss=0.3492, over 17298.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1236, cr_loss=0.3418, over 3339958.96 frames. 
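Many of the scheduled values above are *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate). Read here as probabilities of stochastically bypassing a sub-module during training, most have annealed to ans=0.0 by this point, with the bypass rates still near 0.07 to 0.099. A guess at the mechanism, not the project's actual code:

    import torch
    import torch.nn as nn

    class SkippableModule(nn.Module):
        """Wrap a sub-module so it is bypassed with probability skip_rate."""
        def __init__(self, module, skip_rate=0.07):
            super().__init__()
            self.module = module
            self.skip_rate = skip_rate   # scheduled in batch_count in the log

        def forward(self, x):
            if self.training and torch.rand(()) < self.skip_rate:
                return x                 # identity: only the residual path
            return x + self.module(x)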
], batch size: 46, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:45:31,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=716403.3333333334, ans=0.2 2024-09-25 09:45:34,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=716403.3333333334, ans=0.125 2024-09-25 09:45:42,231 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.286e+02 1.352e+02 1.488e+02 2.645e+02, threshold=2.703e+02, percent-clipped=1.0 2024-09-25 09:45:42,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=716450.0, ans=0.0 2024-09-25 09:45:45,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=716450.0, ans=0.0 2024-09-25 09:45:57,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=716496.6666666666, ans=0.09899494936611666 2024-09-25 09:46:09,315 INFO [train.py:1198] (0/4) Epoch 40, batch 1600, loss[loss=0.1702, ctc_loss=0.1071, cr_loss=0.3154, over 17259.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1234, cr_loss=0.342, over 3348358.66 frames. ], batch size: 44, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:46:28,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=716590.0, ans=0.125 2024-09-25 09:46:33,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=716590.0, ans=0.125 2024-09-25 09:47:32,135 INFO [train.py:1198] (0/4) Epoch 40, batch 1650, loss[loss=0.1601, ctc_loss=0.09973, cr_loss=0.3019, over 17102.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1235, cr_loss=0.3424, over 3356863.61 frames. ], batch size: 40, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:48:27,242 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.303e+02 1.369e+02 1.457e+02 1.993e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-25 09:48:47,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716963.3333333334, ans=0.1 2024-09-25 09:48:56,878 INFO [train.py:1198] (0/4) Epoch 40, batch 1700, loss[loss=0.1668, ctc_loss=0.1058, cr_loss=0.305, over 17032.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1229, cr_loss=0.3405, over 3356821.69 frames. ], batch size: 44, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:48:58,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717010.0, ans=0.1 2024-09-25 09:49:05,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=717010.0, ans=0.125 2024-09-25 09:49:08,166 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 09:49:27,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.34 vs. 
limit=10.0 2024-09-25 09:49:50,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717150.0, ans=0.1 2024-09-25 09:49:55,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=717150.0, ans=0.125 2024-09-25 09:50:18,903 INFO [train.py:1198] (0/4) Epoch 40, batch 1750, loss[loss=0.1748, ctc_loss=0.1123, cr_loss=0.3124, over 17317.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1233, cr_loss=0.3416, over 3356797.53 frames. ], batch size: 49, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:50:25,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=717243.3333333334, ans=0.025 2024-09-25 09:50:31,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=717243.3333333334, ans=0.0 2024-09-25 09:51:11,756 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.261e+02 1.332e+02 1.446e+02 2.217e+02, threshold=2.663e+02, percent-clipped=0.0 2024-09-25 09:51:14,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2024-09-25 09:51:15,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=717383.3333333334, ans=0.07 2024-09-25 09:51:21,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=717430.0, ans=0.125 2024-09-25 09:51:40,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=717476.6666666666, ans=0.0 2024-09-25 09:51:41,754 INFO [train.py:1198] (0/4) Epoch 40, batch 1800, loss[loss=0.1946, ctc_loss=0.129, cr_loss=0.3278, over 17222.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1237, cr_loss=0.3418, over 3347984.84 frames. ], batch size: 55, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:51:46,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=717476.6666666666, ans=0.0 2024-09-25 09:51:48,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.15 vs. limit=22.5 2024-09-25 09:51:50,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0 2024-09-25 09:51:59,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=717523.3333333334, ans=0.1 2024-09-25 09:52:03,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=717523.3333333334, ans=0.0 2024-09-25 09:52:30,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2024-09-25 09:52:48,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=15.0 2024-09-25 09:53:01,966 INFO [train.py:1198] (0/4) Epoch 40, batch 1850, loss[loss=0.1938, ctc_loss=0.1255, cr_loss=0.3419, over 17225.00 frames. 
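The balancer entries carry per-channel constraint parameters: a prob (how often the constraint is enforced, commonly 0.125 here), min_positive (minimum fraction of positive activations, e.g. 0.025 to 0.05), and min_abs/max_abs bounds on mean absolute value (up to 10.0). The sketch below fakes this as a penalty term; the gradient-manipulation details are invented for illustration:

    import torch

    def balancer_penalty(x, min_positive=0.05, max_abs=10.0):
        # x: (num_frames, num_channels)
        frac_pos = (x > 0.0).float().mean(dim=0)          # not differentiable
        needs_push = (frac_pos < min_positive).float()    # gates a mean nudge
        abs_excess = (x.abs().mean(dim=0) - max_abs).clamp(min=0.0)
        # raising x.mean on gated channels increases their positive fraction;
        # abs_excess shrinks channels whose magnitude exceeds max_abs
        return (abs_excess - needs_push * x.mean(dim=0)).sum()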
], tot_loss[loss=0.1927, ctc_loss=0.1242, cr_loss=0.3428, over 3346919.61 frames. ], batch size: 47, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:53:50,652 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=717803.3333333334, ans=0.125 2024-09-25 09:53:50,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=717803.3333333334, ans=0.0 2024-09-25 09:53:57,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=717850.0, ans=0.04949747468305833 2024-09-25 09:54:00,036 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.246e+02 1.343e+02 1.431e+02 1.891e+02, threshold=2.686e+02, percent-clipped=0.0 2024-09-25 09:54:04,233 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2024-09-25 09:54:29,718 INFO [train.py:1198] (0/4) Epoch 40, batch 1900, loss[loss=0.2102, ctc_loss=0.1369, cr_loss=0.3665, over 16921.00 frames. ], tot_loss[loss=0.193, ctc_loss=0.1244, cr_loss=0.3427, over 3360348.17 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:54:35,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=22.5 2024-09-25 09:54:55,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=717990.0, ans=0.125 2024-09-25 09:55:05,785 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2024-09-25 09:55:21,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=718083.3333333334, ans=0.125 2024-09-25 09:55:24,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=718083.3333333334, ans=0.0 2024-09-25 09:55:25,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=718083.3333333334, ans=0.125 2024-09-25 09:55:49,712 INFO [train.py:1198] (0/4) Epoch 40, batch 1950, loss[loss=0.151, ctc_loss=0.0932, cr_loss=0.2891, over 17250.00 frames. ], tot_loss[loss=0.1929, ctc_loss=0.1244, cr_loss=0.3426, over 3355738.02 frames. ], batch size: 42, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:56:02,873 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=718176.6666666666, ans=0.125 2024-09-25 09:56:08,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. 
limit=6.0 2024-09-25 09:56:19,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=718223.3333333334, ans=0.125 2024-09-25 09:56:26,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=718270.0, ans=0.1 2024-09-25 09:56:42,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=718316.6666666666, ans=0.125 2024-09-25 09:56:44,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0 2024-09-25 09:56:45,247 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.303e+02 1.365e+02 1.463e+02 2.159e+02, threshold=2.730e+02, percent-clipped=0.0 2024-09-25 09:57:12,482 INFO [train.py:1198] (0/4) Epoch 40, batch 2000, loss[loss=0.1955, ctc_loss=0.124, cr_loss=0.3576, over 17230.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1235, cr_loss=0.3412, over 3367459.22 frames. ], batch size: 50, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:57:24,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=718410.0, ans=0.125 2024-09-25 09:57:27,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=718456.6666666666, ans=0.025 2024-09-25 09:57:29,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.83 vs. limit=22.5 2024-09-25 09:57:30,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=718456.6666666666, ans=0.125 2024-09-25 09:57:39,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=718456.6666666666, ans=0.2 2024-09-25 09:57:45,248 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0 2024-09-25 09:57:59,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718550.0, ans=0.1 2024-09-25 09:58:27,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=718596.6666666666, ans=0.0 2024-09-25 09:58:32,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=718596.6666666666, ans=0.125 2024-09-25 09:58:37,850 INFO [train.py:1198] (0/4) Epoch 40, batch 2050, loss[loss=0.1781, ctc_loss=0.1127, cr_loss=0.3273, over 17015.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3405, over 3364411.18 frames. 
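In every tot_loss record here the printed total is the CTC term plus one fifth of the consistency term, e.g. 0.1233 + 0.2 x 0.3405 ~= 0.1914 in the batch 2050 record above, and the experiment directory saved to later in this log is named exp-cr-loss-scale-0.2-..., so loss = ctc_loss + 0.2 * cr_loss. Below is a hedged sketch of a CR-CTC style objective with that combination; the function name, the symmetric-KL form of the consistency term, and the reduction details are illustrative assumptions, not icefall's exact implementation.

```python
import torch
import torch.nn.functional as F

def cr_ctc_loss(log_probs_a: torch.Tensor,   # (T, N, C), one masked view
                log_probs_b: torch.Tensor,   # (T, N, C), another masked view
                targets: torch.Tensor,
                input_lengths: torch.Tensor,
                target_lengths: torch.Tensor,
                cr_loss_scale: float = 0.2):
    # CTC on each of the two differently augmented views, averaged.
    ctc_a = F.ctc_loss(log_probs_a, targets, input_lengths, target_lengths)
    ctc_b = F.ctc_loss(log_probs_b, targets, input_lengths, target_lengths)
    ctc = 0.5 * (ctc_a + ctc_b)
    # Consistency regularization: symmetric KL between the two posteriors.
    cr = 0.5 * (
        F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
        + F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    )
    # Reproduces the printed bookkeeping: loss = ctc_loss + 0.2 * cr_loss.
    return ctc + cr_loss_scale * cr, ctc, cr
```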
], batch size: 51, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 09:58:44,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=718643.3333333334, ans=0.125 2024-09-25 09:59:06,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=718690.0, ans=0.125 2024-09-25 09:59:11,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=718736.6666666666, ans=15.0 2024-09-25 09:59:33,096 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.276e+02 1.355e+02 1.490e+02 2.224e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-25 09:59:38,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=718783.3333333334, ans=0.0 2024-09-25 10:00:00,330 INFO [train.py:1198] (0/4) Epoch 40, batch 2100, loss[loss=0.2108, ctc_loss=0.1381, cr_loss=0.3635, over 16561.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1234, cr_loss=0.3413, over 3362294.55 frames. ], batch size: 66, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 10:00:04,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=718876.6666666666, ans=0.2 2024-09-25 10:00:13,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=718876.6666666666, ans=0.0 2024-09-25 10:00:37,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=718970.0, ans=0.0 2024-09-25 10:00:53,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=719016.6666666666, ans=0.025 2024-09-25 10:00:58,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=719016.6666666666, ans=0.1 2024-09-25 10:01:06,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=719063.3333333334, ans=0.2 2024-09-25 10:01:12,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=719063.3333333334, ans=0.2 2024-09-25 10:01:15,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=719063.3333333334, ans=0.0 2024-09-25 10:01:20,396 INFO [train.py:1198] (0/4) Epoch 40, batch 2150, loss[loss=0.2125, ctc_loss=0.1377, cr_loss=0.3738, over 16874.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1227, cr_loss=0.3403, over 3363209.74 frames. ], batch size: 58, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 10:01:44,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=719156.6666666666, ans=0.125 2024-09-25 10:01:50,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.03 vs. limit=15.0 2024-09-25 10:02:07,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.84 vs. 
limit=10.0 2024-09-25 10:02:16,125 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.284e+02 1.361e+02 1.483e+02 2.210e+02, threshold=2.722e+02, percent-clipped=0.0 2024-09-25 10:02:16,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=719250.0, ans=0.125 2024-09-25 10:02:27,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=719296.6666666666, ans=0.125 2024-09-25 10:02:28,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=22.5 2024-09-25 10:02:43,177 INFO [train.py:1198] (0/4) Epoch 40, batch 2200, loss[loss=0.1785, ctc_loss=0.1115, cr_loss=0.335, over 17215.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.123, cr_loss=0.3407, over 3368088.11 frames. ], batch size: 47, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 10:03:57,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=719530.0, ans=0.125 2024-09-25 10:04:08,275 INFO [train.py:1198] (0/4) Epoch 40, batch 2250, loss[loss=0.1675, ctc_loss=0.1057, cr_loss=0.3087, over 17159.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1232, cr_loss=0.3406, over 3375664.29 frames. ], batch size: 45, lr: 2.97e-03, grad_scale: 32.0 2024-09-25 10:04:27,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=719623.3333333334, ans=0.125 2024-09-25 10:04:45,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=719670.0, ans=0.125 2024-09-25 10:04:46,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=719670.0, ans=0.125 2024-09-25 10:04:51,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=719670.0, ans=0.1 2024-09-25 10:04:54,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=719670.0, ans=0.2 2024-09-25 10:05:04,030 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.272e+02 1.370e+02 1.440e+02 2.779e+02, threshold=2.741e+02, percent-clipped=1.0 2024-09-25 10:05:31,321 INFO [train.py:1198] (0/4) Epoch 40, batch 2300, loss[loss=0.2028, ctc_loss=0.1345, cr_loss=0.3416, over 17338.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1227, cr_loss=0.3394, over 3373174.70 frames. 
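The optim.py WARNING lines report the spread of recent gradient norms (the five values read as min, lower quartile, median, upper quartile, max), the active clipping threshold, and the percentage of recent batches that were clipped. In each of these records the threshold is almost exactly Clipping_scale times the median, e.g. 2.0 x 1.361e+02 ~= 2.722e+02 just above, so the clipping level adapts to recent norm statistics. Here is a sketch of median-based adaptive clipping under that assumption; the class name and window size are invented for illustration.

```python
import collections
import statistics
import torch

class AdaptiveGradClipper:
    """Clip to clipping_scale * median over a window of recent grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)
        self.clipped = collections.deque(maxlen=window)

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2
        ).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        clipped = norm > threshold
        self.clipped.append(clipped)
        if clipped:
            for p in params:
                p.grad.detach().mul_(threshold / norm)
        return norm

    def report(self):
        """Quartiles of recent norms plus percent clipped, in the spirit
        of the 'grad-norm quartiles ... percent-clipped=...' warnings."""
        xs = sorted(self.norms)
        n = len(xs) - 1
        quartiles = [xs[round(q * n)] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        percent_clipped = 100.0 * sum(self.clipped) / len(self.clipped)
        return quartiles, percent_clipped
```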
], batch size: 48, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:06:16,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=719903.3333333334, ans=0.1 2024-09-25 10:06:27,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=719950.0, ans=0.025 2024-09-25 10:06:52,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720043.3333333334, ans=0.1 2024-09-25 10:06:52,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=720043.3333333334, ans=0.125 2024-09-25 10:06:52,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=720043.3333333334, ans=0.0 2024-09-25 10:06:54,046 INFO [train.py:1198] (0/4) Epoch 40, batch 2350, loss[loss=0.1772, ctc_loss=0.1153, cr_loss=0.3095, over 17114.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1228, cr_loss=0.3399, over 3362080.21 frames. ], batch size: 40, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:07:14,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2024-09-25 10:07:46,592 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.272e+02 1.331e+02 1.456e+02 1.927e+02, threshold=2.662e+02, percent-clipped=0.0 2024-09-25 10:08:16,523 INFO [train.py:1198] (0/4) Epoch 40, batch 2400, loss[loss=0.2212, ctc_loss=0.1416, cr_loss=0.3981, over 17032.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1225, cr_loss=0.3397, over 3372177.53 frames. ], batch size: 52, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:08:25,345 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=12.0 2024-09-25 10:08:31,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=720323.3333333334, ans=0.125 2024-09-25 10:08:51,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720370.0, ans=0.1 2024-09-25 10:09:36,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=720463.3333333334, ans=0.2 2024-09-25 10:09:41,533 INFO [train.py:1198] (0/4) Epoch 40, batch 2450, loss[loss=0.1845, ctc_loss=0.1158, cr_loss=0.3435, over 17063.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1223, cr_loss=0.3398, over 3377449.85 frames. ], batch size: 46, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:09:41,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=720510.0, ans=0.1 2024-09-25 10:09:58,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. 
limit=22.5 2024-09-25 10:10:20,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720603.3333333334, ans=0.1 2024-09-25 10:10:34,963 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.276e+02 1.362e+02 1.464e+02 1.936e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-25 10:10:35,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.90 vs. limit=22.5 2024-09-25 10:10:41,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=720650.0, ans=0.07 2024-09-25 10:10:44,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=720696.6666666666, ans=0.0 2024-09-25 10:10:51,493 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.57 vs. limit=6.0 2024-09-25 10:11:00,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=720743.3333333334, ans=0.2 2024-09-25 10:11:00,822 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=720743.3333333334, ans=0.125 2024-09-25 10:11:02,048 INFO [train.py:1198] (0/4) Epoch 40, batch 2500, loss[loss=0.1729, ctc_loss=0.1098, cr_loss=0.3157, over 17243.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1223, cr_loss=0.3398, over 3374816.31 frames. ], batch size: 44, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:11:07,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=720743.3333333334, ans=0.125 2024-09-25 10:11:23,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=720790.0, ans=0.1 2024-09-25 10:11:51,480 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:12:01,417 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2024-09-25 10:12:24,941 INFO [train.py:1198] (0/4) Epoch 40, batch 2550, loss[loss=0.2124, ctc_loss=0.1388, cr_loss=0.3677, over 16586.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1231, cr_loss=0.3409, over 3359897.76 frames. ], batch size: 66, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:12:26,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=720976.6666666666, ans=0.125 2024-09-25 10:12:49,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=721023.3333333334, ans=0.0 2024-09-25 10:12:54,649 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. 
limit=15.0 2024-09-25 10:13:19,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=721116.6666666666, ans=0.0 2024-09-25 10:13:20,275 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.334e+02 1.455e+02 1.593e+02 2.101e+02, threshold=2.910e+02, percent-clipped=0.0 2024-09-25 10:13:50,328 INFO [train.py:1198] (0/4) Epoch 40, batch 2600, loss[loss=0.1584, ctc_loss=0.1006, cr_loss=0.2893, over 16768.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1227, cr_loss=0.3398, over 3359791.64 frames. ], batch size: 37, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:13:57,467 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=22.5 2024-09-25 10:14:27,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721303.3333333334, ans=0.1 2024-09-25 10:14:30,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=721303.3333333334, ans=0.2 2024-09-25 10:14:34,040 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2024-09-25 10:14:44,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=721350.0, ans=0.125 2024-09-25 10:14:52,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=721350.0, ans=0.0 2024-09-25 10:15:00,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721396.6666666666, ans=0.1 2024-09-25 10:15:02,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=721396.6666666666, ans=0.125 2024-09-25 10:15:07,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=721396.6666666666, ans=0.025 2024-09-25 10:15:13,272 INFO [train.py:1198] (0/4) Epoch 40, batch 2650, loss[loss=0.1812, ctc_loss=0.1171, cr_loss=0.3203, over 17246.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1223, cr_loss=0.3392, over 3364343.74 frames. 
], batch size: 44, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:15:26,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=721443.3333333334, ans=0.125 2024-09-25 10:15:29,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=721490.0, ans=0.0 2024-09-25 10:15:44,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=721536.6666666666, ans=0.125 2024-09-25 10:15:47,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=721536.6666666666, ans=0.0 2024-09-25 10:15:57,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=721536.6666666666, ans=0.125 2024-09-25 10:16:06,674 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.318e+02 1.393e+02 1.493e+02 1.834e+02, threshold=2.785e+02, percent-clipped=0.0 2024-09-25 10:16:24,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=721630.0, ans=0.125 2024-09-25 10:16:27,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=721630.0, ans=0.2 2024-09-25 10:16:36,435 INFO [train.py:1198] (0/4) Epoch 40, batch 2700, loss[loss=0.1751, ctc_loss=0.1129, cr_loss=0.311, over 16964.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1231, cr_loss=0.3403, over 3350725.12 frames. ], batch size: 42, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:16:40,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.45 vs. limit=10.0 2024-09-25 10:16:47,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=721676.6666666666, ans=0.0 2024-09-25 10:16:57,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=721723.3333333334, ans=0.0 2024-09-25 10:16:57,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=721723.3333333334, ans=0.2 2024-09-25 10:17:36,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=721816.6666666666, ans=0.0 2024-09-25 10:17:49,277 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-09-25 10:17:56,553 INFO [train.py:1198] (0/4) Epoch 40, batch 2750, loss[loss=0.1905, ctc_loss=0.126, cr_loss=0.3224, over 16727.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1238, cr_loss=0.3417, over 3354355.48 frames. ], batch size: 61, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:18:03,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=721910.0, ans=0.0 2024-09-25 10:18:28,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.46 vs. 
limit=15.0 2024-09-25 10:18:46,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=722003.3333333334, ans=0.125 2024-09-25 10:18:54,278 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.078e+02 1.293e+02 1.370e+02 1.533e+02 1.979e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-25 10:18:55,036 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=15.0 2024-09-25 10:19:24,034 INFO [train.py:1198] (0/4) Epoch 40, batch 2800, loss[loss=0.2038, ctc_loss=0.129, cr_loss=0.3739, over 16939.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1235, cr_loss=0.3411, over 3355645.89 frames. ], batch size: 42, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:19:29,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722143.3333333334, ans=0.1 2024-09-25 10:20:22,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=722283.3333333334, ans=0.125 2024-09-25 10:20:26,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722330.0, ans=0.1 2024-09-25 10:20:44,361 INFO [train.py:1198] (0/4) Epoch 40, batch 2850, loss[loss=0.1981, ctc_loss=0.125, cr_loss=0.3656, over 15876.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1239, cr_loss=0.3416, over 3357554.00 frames. ], batch size: 74, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:21:01,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=8.0 2024-09-25 10:21:13,960 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-25 10:21:31,297 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.46 vs. limit=15.0 2024-09-25 10:21:38,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=722516.6666666666, ans=0.125 2024-09-25 10:21:40,029 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.304e+02 1.350e+02 1.486e+02 2.860e+02, threshold=2.699e+02, percent-clipped=1.0 2024-09-25 10:22:07,587 INFO [train.py:1198] (0/4) Epoch 40, batch 2900, loss[loss=0.172, ctc_loss=0.1098, cr_loss=0.3108, over 17359.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3402, over 3354518.01 frames. ], batch size: 48, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:22:07,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=722610.0, ans=0.1 2024-09-25 10:22:07,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=722610.0, ans=0.025 2024-09-25 10:22:11,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=722610.0, ans=0.125 2024-09-25 10:23:32,770 INFO [train.py:1198] (0/4) Epoch 40, batch 2950, loss[loss=0.1894, ctc_loss=0.1224, cr_loss=0.335, over 17246.00 frames. ], tot_loss[loss=0.1921, ctc_loss=0.1238, cr_loss=0.3414, over 3364645.20 frames. 
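The Whitening lines track, per module, how far the output feature covariance is from isotropic ("white"); each metric is compared against a limit, and the module constrains activations when the metric drifts too high. One plausible form of such a metric is the mean squared eigenvalue of the covariance divided by its squared mean eigenvalue, which is exactly 1.0 for white features and grows as a few directions dominate; the sketch below implements that form, and the formula is an assumption about scaling.py, not a copy of it.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Whiteness of features: mean(eigval^2) / mean(eigval)^2 of the
    per-group covariance, computed without an eigendecomposition.
    Under this reading, the 'metric=4.46 vs. limit=15.0' line above
    would correspond to a mildly colored, well-behaved output."""
    x = x.reshape(-1, x.shape[-1])
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    c = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, c).transpose(0, 1)   # (G, N, c)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x                                # (G, c, c)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()            # mean eigenvalue
    mean_sq = (cov ** 2).sum() / (num_groups * c)              # mean eigval^2
    return mean_sq / (mean_diag ** 2 + 1e-20)

# whitening_metric(torch.randn(100_000, 384)) is ~1.0
# (finite-sample noise adds roughly c/N on top of the ideal 1.0).
```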
], batch size: 55, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:23:49,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=722890.0, ans=0.125 2024-09-25 10:24:07,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=722936.6666666666, ans=0.07 2024-09-25 10:24:14,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=722936.6666666666, ans=0.125 2024-09-25 10:24:19,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.38 vs. limit=10.0 2024-09-25 10:24:19,116 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.40 vs. limit=6.0 2024-09-25 10:24:20,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=722936.6666666666, ans=0.125 2024-09-25 10:24:27,665 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.302e+02 1.375e+02 1.472e+02 2.387e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-25 10:24:29,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=722983.3333333334, ans=0.0 2024-09-25 10:24:53,998 INFO [train.py:1198] (0/4) Epoch 40, batch 3000, loss[loss=0.2164, ctc_loss=0.1412, cr_loss=0.3762, over 17025.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1243, cr_loss=0.3425, over 3365638.58 frames. ], batch size: 53, lr: 2.96e-03, grad_scale: 32.0 2024-09-25 10:24:53,999 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 10:25:06,626 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4593, 4.5252, 5.2463, 5.0190], device='cuda:0') 2024-09-25 10:25:09,258 INFO [train.py:1230] (0/4) Epoch 40, validation: loss=0.03571, ctc_loss=0.03571, cr_loss=9.785e-15, over 944034.00 frames. 2024-09-25 10:25:09,259 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 10:25:31,479 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:25:32,988 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=723123.3333333334, ans=0.2 2024-09-25 10:26:06,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2024-09-25 10:26:07,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=723216.6666666666, ans=0.125 2024-09-25 10:26:27,244 INFO [train.py:1198] (0/4) Epoch 40, batch 3050, loss[loss=0.2358, ctc_loss=0.1558, cr_loss=0.4, over 15287.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.123, cr_loss=0.3399, over 3362968.23 frames. 
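At batch 3000 the loop pauses for a full validation pass: loss=0.03571 with cr_loss=9.785e-15, i.e. the consistency term is numerically zero on the dev data, which is what you would expect if no masking is applied at validation time so the two views coincide. Below is a sketch of such a periodic validation hook; valid_interval=3000 is inferred from the batches at which validation fires here, and compute_loss is a hypothetical helper standing in for the recipe's real loss machinery.

```python
import torch

def maybe_validate(model, valid_loader, batch_idx: int,
                   valid_interval: int = 3000):
    """Full pass over the dev set at batch 0, 3000, 6000, ... as in the
    'Computing validation loss' lines above.  compute_loss is a
    hypothetical helper returning (summed loss, frame count)."""
    if batch_idx % valid_interval != 0:
        return None
    was_training = model.training
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss_sum, num_frames = compute_loss(model, batch)  # hypothetical
            tot_loss += float(loss_sum)
            tot_frames += num_frames
    if was_training:
        model.train()
    mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.4g}, "
          f"over {tot_frames:.2f} frames; max mem {mem_mb}MB")
    return tot_loss / tot_frames
```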
], batch size: 89, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:26:40,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723310.0, ans=0.1 2024-09-25 10:27:02,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2024-09-25 10:27:06,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=723403.3333333334, ans=0.0 2024-09-25 10:27:17,668 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=723450.0, ans=0.125 2024-09-25 10:27:20,627 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.084e+02 1.265e+02 1.358e+02 1.442e+02 1.711e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 10:27:22,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=723450.0, ans=0.0 2024-09-25 10:27:44,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=723543.3333333334, ans=0.125 2024-09-25 10:27:45,692 INFO [train.py:1198] (0/4) Epoch 40, batch 3100, loss[loss=0.185, ctc_loss=0.1201, cr_loss=0.3243, over 16944.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1224, cr_loss=0.3385, over 3360167.49 frames. ], batch size: 42, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:27:52,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=723543.3333333334, ans=0.015 2024-09-25 10:27:53,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=723543.3333333334, ans=0.025 2024-09-25 10:28:22,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=723636.6666666666, ans=0.125 2024-09-25 10:28:43,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=723683.3333333334, ans=0.07 2024-09-25 10:28:51,319 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.12 vs. limit=10.0 2024-09-25 10:29:06,038 INFO [train.py:1198] (0/4) Epoch 40, batch 3150, loss[loss=0.1828, ctc_loss=0.1162, cr_loss=0.3329, over 16650.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1227, cr_loss=0.3392, over 3363256.29 frames. ], batch size: 37, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:29:25,399 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-09-25 10:29:59,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=723916.6666666666, ans=0.2 2024-09-25 10:30:00,975 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.278e+02 1.373e+02 1.497e+02 1.845e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-25 10:30:17,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=723963.3333333334, ans=0.125 2024-09-25 10:30:24,824 INFO [train.py:1198] (0/4) Epoch 40, batch 3200, loss[loss=0.1965, ctc_loss=0.1256, cr_loss=0.3543, over 17220.00 frames. 
], tot_loss[loss=0.1907, ctc_loss=0.1227, cr_loss=0.3398, over 3360751.16 frames. ], batch size: 47, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:30:31,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=724010.0, ans=0.125 2024-09-25 10:30:32,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=724010.0, ans=0.125 2024-09-25 10:30:42,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=724056.6666666666, ans=0.95 2024-09-25 10:30:53,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724056.6666666666, ans=0.1 2024-09-25 10:30:58,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=724103.3333333334, ans=0.07 2024-09-25 10:31:37,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=724196.6666666666, ans=0.125 2024-09-25 10:31:43,140 INFO [train.py:1198] (0/4) Epoch 40, batch 3250, loss[loss=0.1944, ctc_loss=0.1256, cr_loss=0.3439, over 16751.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1233, cr_loss=0.3405, over 3358974.92 frames. ], batch size: 61, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:32:02,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724290.0, ans=0.1 2024-09-25 10:32:10,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=22.5 2024-09-25 10:32:26,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=724336.6666666666, ans=0.0 2024-09-25 10:32:42,130 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.319e+02 1.415e+02 1.534e+02 3.607e+02, threshold=2.830e+02, percent-clipped=1.0 2024-09-25 10:33:05,516 INFO [train.py:1198] (0/4) Epoch 40, batch 3300, loss[loss=0.1817, ctc_loss=0.1163, cr_loss=0.3269, over 17018.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1235, cr_loss=0.3415, over 3365248.48 frames. ], batch size: 44, lr: 2.96e-03, grad_scale: 16.0 2024-09-25 10:33:52,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=724616.6666666666, ans=0.5 2024-09-25 10:34:07,398 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=22.5 2024-09-25 10:34:12,437 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.47 vs. limit=10.0 2024-09-25 10:34:17,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=724663.3333333334, ans=0.0 2024-09-25 10:34:25,461 INFO [train.py:1198] (0/4) Epoch 40, batch 3350, loss[loss=0.2228, ctc_loss=0.1447, cr_loss=0.3907, over 17207.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1234, cr_loss=0.3411, over 3362651.46 frames. 
], batch size: 55, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:34:59,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=724803.3333333334, ans=0.125 2024-09-25 10:35:19,909 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.291e+02 1.364e+02 1.477e+02 1.997e+02, threshold=2.727e+02, percent-clipped=0.0 2024-09-25 10:35:28,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=724896.6666666666, ans=0.125 2024-09-25 10:35:31,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=724896.6666666666, ans=0.0 2024-09-25 10:35:34,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=724896.6666666666, ans=0.125 2024-09-25 10:35:43,180 INFO [train.py:1198] (0/4) Epoch 40, batch 3400, loss[loss=0.217, ctc_loss=0.1386, cr_loss=0.3918, over 16611.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1233, cr_loss=0.3411, over 3355916.67 frames. ], batch size: 66, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:35:47,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-09-25 10:35:51,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=724943.3333333334, ans=0.0 2024-09-25 10:35:54,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724943.3333333334, ans=0.1 2024-09-25 10:36:19,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=725036.6666666666, ans=0.05 2024-09-25 10:36:47,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725130.0, ans=0.1 2024-09-25 10:37:01,220 INFO [train.py:1198] (0/4) Epoch 40, batch 3450, loss[loss=0.1567, ctc_loss=0.1012, cr_loss=0.2775, over 16210.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1238, cr_loss=0.3426, over 3349867.47 frames. ], batch size: 36, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:37:03,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=725176.6666666666, ans=0.0 2024-09-25 10:37:13,497 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. 
limit=12.0 2024-09-25 10:37:15,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725223.3333333334, ans=0.1 2024-09-25 10:37:30,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=725223.3333333334, ans=0.125 2024-09-25 10:37:56,653 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.302e+02 1.377e+02 1.496e+02 2.659e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 10:37:56,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=725316.6666666666, ans=0.125 2024-09-25 10:38:12,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=725363.3333333334, ans=0.125 2024-09-25 10:38:20,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=725410.0, ans=0.125 2024-09-25 10:38:22,254 INFO [train.py:1198] (0/4) Epoch 40, batch 3500, loss[loss=0.1522, ctc_loss=0.09794, cr_loss=0.2712, over 16948.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1231, cr_loss=0.3404, over 3351366.83 frames. ], batch size: 42, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:38:27,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=725410.0, ans=0.0 2024-09-25 10:38:37,293 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2024-09-25 10:38:50,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=725456.6666666666, ans=0.025 2024-09-25 10:39:10,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=725550.0, ans=0.025 2024-09-25 10:39:31,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=725596.6666666666, ans=0.125 2024-09-25 10:39:31,298 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5 2024-09-25 10:39:32,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=725596.6666666666, ans=0.125 2024-09-25 10:39:39,942 INFO [train.py:1198] (0/4) Epoch 40, batch 3550, loss[loss=0.2202, ctc_loss=0.1462, cr_loss=0.3699, over 15044.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1228, cr_loss=0.3396, over 3344409.53 frames. 
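The printed learning rate creeps down within the epoch (2.97e-03 at batch 1600, 2.96e-03 around batch 2300, 2.95e-03 here) and steps to 2.91e-03 once Epoch 41 begins below. That pattern is consistent with icefall's Eden schedule, which discounts the base LR by quarter-power factors in both the step count and the number of finished epochs. A sketch follows, with the constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) assumed rather than read from this log:

```python
def eden_lr(base_lr: float, step: int, finished_epochs: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Quarter-power decay in step and epoch (warmup factor omitted)."""
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = (
        (finished_epochs ** 2 + lr_epochs ** 2) / lr_epochs ** 2
    ) ** -0.25
    return base_lr * step_factor * epoch_factor

# With the assumed constants this reproduces the printed values closely:
#   eden_lr(0.045, 153500, 39) ~= 2.97e-03   (mid Epoch 40)
#   eden_lr(0.045, 156000, 40) ~= 2.91e-03   (start of Epoch 41,
#                                             cf. checkpoint-156000.pt)
```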
], batch size: 89, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:39:43,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=725643.3333333334, ans=0.125 2024-09-25 10:40:16,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=725736.6666666666, ans=0.125 2024-09-25 10:40:31,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=725783.3333333334, ans=0.125 2024-09-25 10:40:34,536 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.284e+02 1.346e+02 1.448e+02 2.075e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-25 10:40:41,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=725830.0, ans=0.0 2024-09-25 10:40:47,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=725830.0, ans=0.05 2024-09-25 10:40:58,149 INFO [train.py:1198] (0/4) Epoch 40, batch 3600, loss[loss=0.2242, ctc_loss=0.1434, cr_loss=0.4042, over 16910.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1225, cr_loss=0.3395, over 3344563.95 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 32.0 2024-09-25 10:40:58,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=725876.6666666666, ans=0.0 2024-09-25 10:41:03,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=725876.6666666666, ans=0.2 2024-09-25 10:41:09,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725876.6666666666, ans=0.1 2024-09-25 10:41:29,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=725970.0, ans=0.2 2024-09-25 10:41:31,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=725970.0, ans=0.04949747468305833 2024-09-25 10:42:04,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=726063.3333333334, ans=0.125 2024-09-25 10:42:20,145 INFO [train.py:1198] (0/4) Epoch 40, batch 3650, loss[loss=0.2291, ctc_loss=0.1467, cr_loss=0.4116, over 17003.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1222, cr_loss=0.3391, over 3356478.56 frames. ], batch size: 56, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:42:48,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=726156.6666666666, ans=0.125 2024-09-25 10:43:19,107 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.298e+02 1.359e+02 1.454e+02 1.962e+02, threshold=2.718e+02, percent-clipped=0.0 2024-09-25 10:43:33,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=726296.6666666666, ans=0.0 2024-09-25 10:43:36,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=726296.6666666666, ans=0.0 2024-09-25 10:43:41,337 INFO [train.py:1198] (0/4) Epoch 40, batch 3700, loss[loss=0.1683, ctc_loss=0.106, cr_loss=0.3116, over 17253.00 frames. 
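The grad_scale printed with each summary halves from 32.0 to 16.0 just after batch 3000 and is back at 32.0 by batch 3600 above. That is the signature of dynamic loss scaling in float16 training: halve the scale when a gradient overflows, then grow it back after a long enough run of finite gradients (torch.cuda.amp.GradScaler behaves this way). A toy version of the update rule, with illustrative constants:

```python
class ToyGradScaler:
    """Minimal dynamic loss-scale bookkeeping (not torch's GradScaler)."""

    def __init__(self, init_scale: float = 32.0, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 500):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            # Overflow: back off (e.g. 32.0 -> 16.0) and restart the count.
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                # Stable for a while: grow again (e.g. 16.0 -> 32.0).
                self.scale *= self.growth_factor
                self._good_steps = 0
```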
], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.338, over 3358610.54 frames. ], batch size: 44, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:43:46,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=726343.3333333334, ans=0.0 2024-09-25 10:43:53,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=726343.3333333334, ans=0.1 2024-09-25 10:44:19,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=726436.6666666666, ans=0.125 2024-09-25 10:44:38,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=726483.3333333334, ans=0.125 2024-09-25 10:44:41,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=726483.3333333334, ans=0.035 2024-09-25 10:44:41,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=726483.3333333334, ans=0.125 2024-09-25 10:44:44,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=726530.0, ans=0.1 2024-09-25 10:44:50,863 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=726530.0, ans=0.125 2024-09-25 10:44:59,749 INFO [train.py:1198] (0/4) Epoch 40, batch 3750, loss[loss=0.163, ctc_loss=0.1033, cr_loss=0.2986, over 16967.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1223, cr_loss=0.3386, over 3340235.52 frames. ], batch size: 42, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:45:32,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=726670.0, ans=0.0 2024-09-25 10:45:55,149 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.300e+02 1.354e+02 1.476e+02 2.861e+02, threshold=2.708e+02, percent-clipped=2.0 2024-09-25 10:46:16,526 INFO [train.py:1198] (0/4) Epoch 40, batch 3800, loss[loss=0.1592, ctc_loss=0.09908, cr_loss=0.3005, over 16977.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1223, cr_loss=0.3385, over 3329930.12 frames. 
], batch size: 42, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:46:29,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=726810.0, ans=0.025 2024-09-25 10:46:32,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=726856.6666666666, ans=0.125 2024-09-25 10:46:33,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=726856.6666666666, ans=0.0 2024-09-25 10:46:43,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=726856.6666666666, ans=0.0 2024-09-25 10:47:04,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=726950.0, ans=0.125 2024-09-25 10:47:23,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=726996.6666666666, ans=0.2 2024-09-25 10:47:34,474 INFO [train.py:1198] (0/4) Epoch 40, batch 3850, loss[loss=0.1888, ctc_loss=0.1218, cr_loss=0.3348, over 16956.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1227, cr_loss=0.3392, over 3311781.80 frames. ], batch size: 58, lr: 2.95e-03, grad_scale: 16.0 2024-09-25 10:47:51,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=727090.0, ans=0.04949747468305833 2024-09-25 10:47:58,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-09-25 10:48:07,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=727136.6666666666, ans=0.02 2024-09-25 10:48:18,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=727136.6666666666, ans=0.125 2024-09-25 10:48:24,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=727183.3333333334, ans=0.125 2024-09-25 10:48:30,296 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.319e+02 1.427e+02 1.593e+02 2.504e+02, threshold=2.853e+02, percent-clipped=0.0 2024-09-25 10:48:30,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727183.3333333334, ans=0.1 2024-09-25 10:48:35,060 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:48:39,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=727230.0, ans=0.125 2024-09-25 10:48:44,949 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-40.pt 2024-09-25 10:49:36,295 INFO [train.py:1198] (0/4) Epoch 41, batch 0, loss[loss=0.1919, ctc_loss=0.1216, cr_loss=0.3511, over 17306.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1216, cr_loss=0.3511, over 17306.00 frames. 
], batch size: 49, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 10:49:36,295 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 10:49:51,735 INFO [train.py:1230] (0/4) Epoch 41, validation: loss=0.03537, ctc_loss=0.03537, cr_loss=1.035e-14, over 944034.00 frames. 2024-09-25 10:49:51,735 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 10:50:00,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2024-09-25 10:50:08,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727304.6666666666, ans=0.1 2024-09-25 10:50:28,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=727351.3333333334, ans=0.0 2024-09-25 10:51:15,277 INFO [train.py:1198] (0/4) Epoch 41, batch 50, loss[loss=0.1972, ctc_loss=0.127, cr_loss=0.351, over 16994.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3374, over 754135.41 frames. ], batch size: 53, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 10:51:17,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=727491.3333333334, ans=0.125 2024-09-25 10:51:36,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=727538.0, ans=0.125 2024-09-25 10:51:36,602 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.48 vs. limit=6.0 2024-09-25 10:51:41,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=727538.0, ans=0.125 2024-09-25 10:51:52,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=727584.6666666666, ans=0.0 2024-09-25 10:51:54,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2024-09-25 10:52:19,248 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.297e+02 1.379e+02 1.480e+02 1.921e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-25 10:52:22,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=727678.0, ans=0.125 2024-09-25 10:52:22,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2024-09-25 10:52:32,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=727678.0, ans=0.04949747468305833 2024-09-25 10:52:34,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=727724.6666666666, ans=0.0 2024-09-25 10:52:35,284 INFO [train.py:1198] (0/4) Epoch 41, batch 100, loss[loss=0.2058, ctc_loss=0.1322, cr_loss=0.3684, over 17148.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1225, cr_loss=0.3402, over 1338463.62 frames. 
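Each summary prints the current batch's loss over its own frames next to tot_loss over an accumulated frame count; at the start of Epoch 41 above that count rebuilds from scratch (754135.41 frames at batch 50, 1338463.62 at batch 100), while deep into Epoch 40 it hovered around 3.36M frames, suggesting older batches eventually stop contributing. The simple cumulative version of such frame-weighted averaging is sketched below; the leveling-off would come from an additional decay of old statistics that this sketch omits.

```python
class FrameWeightedAverage:
    """tot_loss[...] style bookkeeping: average of per-batch losses
    weighted by how many frames each batch contributed."""

    def __init__(self):
        self.loss_sum = 0.0   # running sum of loss * frames
        self.frames = 0.0     # the "over N frames" count

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```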
], batch size: 48, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 10:52:38,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=727724.6666666666, ans=0.0 2024-09-25 10:52:40,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=727724.6666666666, ans=0.125 2024-09-25 10:52:40,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727724.6666666666, ans=0.1 2024-09-25 10:53:02,957 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0 2024-09-25 10:53:15,816 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5 2024-09-25 10:54:00,639 INFO [train.py:1198] (0/4) Epoch 41, batch 150, loss[loss=0.2255, ctc_loss=0.148, cr_loss=0.3873, over 15118.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1221, cr_loss=0.338, over 1776715.54 frames. ], batch size: 89, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 10:54:01,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=727958.0, ans=0.05 2024-09-25 10:54:15,962 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-156000.pt 2024-09-25 10:54:33,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=728004.6666666666, ans=0.0 2024-09-25 10:54:53,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=728098.0, ans=0.0 2024-09-25 10:54:59,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=728098.0, ans=0.0 2024-09-25 10:55:02,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=728098.0, ans=0.05 2024-09-25 10:55:10,425 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.287e+02 1.355e+02 1.476e+02 1.968e+02, threshold=2.710e+02, percent-clipped=0.0 2024-09-25 10:55:27,697 INFO [train.py:1198] (0/4) Epoch 41, batch 200, loss[loss=0.2238, ctc_loss=0.1468, cr_loss=0.3847, over 14949.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1229, cr_loss=0.3392, over 2120384.41 frames. ], batch size: 89, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 10:56:08,745 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.08 vs. limit=22.5 2024-09-25 10:56:26,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=728331.3333333334, ans=0.125 2024-09-25 10:56:40,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=728378.0, ans=0.1 2024-09-25 10:56:47,730 INFO [train.py:1198] (0/4) Epoch 41, batch 250, loss[loss=0.167, ctc_loss=0.1053, cr_loss=0.3089, over 17290.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1225, cr_loss=0.338, over 2393726.91 frames. 
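Two checkpoint flavours appear in this stretch of the log: epoch-40.pt written at the epoch boundary, and checkpoint-156000.pt written partway through, keyed to the global training-batch counter. Here is a sketch of that dual policy; save_every_n=4000 is an assumed setting (at least consistent with 156000 being a multiple of 4000), and the state-dict contents are illustrative.

```python
from pathlib import Path
import torch

def maybe_save(model, optimizer, exp_dir: Path, epoch: int,
               batch_idx_train: int, save_every_n: int = 4000,
               end_of_epoch: bool = False) -> None:
    """Write .../epoch-<E>.pt at epoch boundaries and
    .../checkpoint-<B>.pt every save_every_n global batches."""
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "epoch": epoch,
        "batch_idx_train": batch_idx_train,
    }
    if end_of_epoch:
        torch.save(state, exp_dir / f"epoch-{epoch}.pt")
    elif batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
        torch.save(state, exp_dir / f"checkpoint-{batch_idx_train}.pt")
```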
], batch size: 42, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 10:57:03,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=728471.3333333334, ans=0.125 2024-09-25 10:57:30,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=728518.0, ans=0.125 2024-09-25 10:57:43,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=728564.6666666666, ans=0.025 2024-09-25 10:57:53,024 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.260e+02 1.343e+02 1.433e+02 1.845e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-25 10:57:58,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=728611.3333333334, ans=0.125 2024-09-25 10:58:07,361 INFO [train.py:1198] (0/4) Epoch 41, batch 300, loss[loss=0.1726, ctc_loss=0.1117, cr_loss=0.3045, over 17264.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1224, cr_loss=0.3384, over 2607386.55 frames. ], batch size: 44, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 10:58:14,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0 2024-09-25 10:58:29,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=728704.6666666666, ans=0.2 2024-09-25 10:58:34,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=728704.6666666666, ans=0.0 2024-09-25 10:58:44,850 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.04 vs. limit=12.0 2024-09-25 10:58:48,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=728751.3333333334, ans=0.0 2024-09-25 10:59:03,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=728798.0, ans=0.2 2024-09-25 10:59:03,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=728798.0, ans=0.2 2024-09-25 10:59:11,513 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 10:59:36,174 INFO [train.py:1198] (0/4) Epoch 41, batch 350, loss[loss=0.169, ctc_loss=0.1081, cr_loss=0.3041, over 17086.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1223, cr_loss=0.3374, over 2772016.93 frames. ], batch size: 43, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 10:59:36,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728891.3333333334, ans=0.1 2024-09-25 11:00:19,891 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-09-25 11:00:24,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2024-09-25 11:00:33,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.78 vs. 
limit=22.5 2024-09-25 11:00:44,742 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.270e+02 1.343e+02 1.483e+02 2.666e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-25 11:00:59,198 INFO [train.py:1198] (0/4) Epoch 41, batch 400, loss[loss=0.1893, ctc_loss=0.122, cr_loss=0.3361, over 16745.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3392, over 2899515.71 frames. ], batch size: 61, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 11:01:13,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=729171.3333333334, ans=0.2 2024-09-25 11:01:28,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=729171.3333333334, ans=0.0 2024-09-25 11:01:31,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=729218.0, ans=0.0 2024-09-25 11:01:47,514 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=729264.6666666666, ans=0.125 2024-09-25 11:02:01,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=729311.3333333334, ans=0.0 2024-09-25 11:02:06,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=729311.3333333334, ans=0.125 2024-09-25 11:02:18,860 INFO [train.py:1198] (0/4) Epoch 41, batch 450, loss[loss=0.2247, ctc_loss=0.1462, cr_loss=0.393, over 14905.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1228, cr_loss=0.3399, over 2998461.63 frames. ], batch size: 89, lr: 2.91e-03, grad_scale: 32.0 2024-09-25 11:02:30,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=729358.0, ans=0.025 2024-09-25 11:03:01,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.45 vs. limit=22.5 2024-09-25 11:03:04,795 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2024-09-25 11:03:26,324 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.254e+02 1.327e+02 1.463e+02 1.935e+02, threshold=2.654e+02, percent-clipped=0.0 2024-09-25 11:03:26,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=729544.6666666666, ans=0.0 2024-09-25 11:03:41,631 INFO [train.py:1198] (0/4) Epoch 41, batch 500, loss[loss=0.2088, ctc_loss=0.1355, cr_loss=0.3666, over 17017.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1228, cr_loss=0.3399, over 3062739.65 frames. 
], batch size: 56, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 11:04:31,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=729684.6666666666, ans=0.2 2024-09-25 11:04:46,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=729731.3333333334, ans=0.07 2024-09-25 11:04:50,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=729778.0, ans=0.125 2024-09-25 11:04:54,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=729778.0, ans=0.125 2024-09-25 11:05:07,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=22.5 2024-09-25 11:05:09,615 INFO [train.py:1198] (0/4) Epoch 41, batch 550, loss[loss=0.2002, ctc_loss=0.1284, cr_loss=0.3589, over 17031.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3396, over 3125550.52 frames. ], batch size: 51, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 11:05:57,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2024-09-25 11:06:18,378 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.304e+02 1.377e+02 1.509e+02 2.527e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 11:06:29,792 INFO [train.py:1198] (0/4) Epoch 41, batch 600, loss[loss=0.1852, ctc_loss=0.1191, cr_loss=0.3306, over 17018.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.122, cr_loss=0.339, over 3180246.84 frames. ], batch size: 51, lr: 2.91e-03, grad_scale: 8.0 2024-09-25 11:06:30,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730058.0, ans=0.1 2024-09-25 11:06:38,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=730058.0, ans=0.0 2024-09-25 11:06:39,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=730058.0, ans=0.0 2024-09-25 11:06:52,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=730104.6666666666, ans=0.125 2024-09-25 11:06:58,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=730104.6666666666, ans=0.125 2024-09-25 11:07:11,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=730151.3333333334, ans=0.025 2024-09-25 11:07:49,973 INFO [train.py:1198] (0/4) Epoch 41, batch 650, loss[loss=0.169, ctc_loss=0.1077, cr_loss=0.3065, over 17165.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1217, cr_loss=0.3384, over 3216248.83 frames. ], batch size: 45, lr: 2.91e-03, grad_scale: 8.0 2024-09-25 11:08:27,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730384.6666666666, ans=0.1 2024-09-25 11:08:29,368 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. 
limit=15.0 2024-09-25 11:08:33,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=730384.6666666666, ans=0.07 2024-09-25 11:09:06,925 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.080e+02 1.289e+02 1.341e+02 1.440e+02 1.849e+02, threshold=2.683e+02, percent-clipped=0.0 2024-09-25 11:09:18,185 INFO [train.py:1198] (0/4) Epoch 41, batch 700, loss[loss=0.1571, ctc_loss=0.09834, cr_loss=0.294, over 17108.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1227, cr_loss=0.3399, over 3240679.22 frames. ], batch size: 40, lr: 2.91e-03, grad_scale: 8.0 2024-09-25 11:10:20,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=730664.6666666666, ans=0.0 2024-09-25 11:10:40,885 INFO [train.py:1198] (0/4) Epoch 41, batch 750, loss[loss=0.1708, ctc_loss=0.1098, cr_loss=0.3047, over 17113.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1222, cr_loss=0.3384, over 3266357.47 frames. ], batch size: 49, lr: 2.91e-03, grad_scale: 8.0 2024-09-25 11:10:47,538 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 11:10:47,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=730758.0, ans=0.125 2024-09-25 11:11:00,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2024-09-25 11:11:01,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=730804.6666666666, ans=0.125 2024-09-25 11:11:08,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=730804.6666666666, ans=0.125 2024-09-25 11:11:46,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730944.6666666666, ans=0.125 2024-09-25 11:11:49,460 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.291e+02 1.357e+02 1.455e+02 2.788e+02, threshold=2.714e+02, percent-clipped=1.0 2024-09-25 11:11:53,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=730944.6666666666, ans=0.125 2024-09-25 11:12:00,720 INFO [train.py:1198] (0/4) Epoch 41, batch 800, loss[loss=0.1618, ctc_loss=0.1018, cr_loss=0.3, over 17065.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1227, cr_loss=0.3399, over 3295647.48 frames. ], batch size: 39, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 11:12:03,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.79 vs. limit=10.0 2024-09-25 11:12:13,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=730991.3333333334, ans=0.0 2024-09-25 11:13:18,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=731178.0, ans=0.0 2024-09-25 11:13:21,001 INFO [train.py:1198] (0/4) Epoch 41, batch 850, loss[loss=0.2026, ctc_loss=0.1293, cr_loss=0.3664, over 17014.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1237, cr_loss=0.3414, over 3303362.59 frames. 
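The recurring scaling.py:214 records each report the current value (ans) of a ScheduledFloat, a float hyperparameter (dropout_p, skip_rate, balancer prob, and similar names above) evaluated at the current global batch_count. A plausible minimal reading, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real class may differ in detail:

    class ScheduledFloat:
        """A float that follows a piecewise-linear schedule in batch_count."""

        def __init__(self, *points):
            # points: (batch_count, value) breakpoints,
            # e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
            self.points = sorted(points)

        def value_at(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)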
], batch size: 51, lr: 2.91e-03, grad_scale: 16.0 2024-09-25 11:14:00,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=15.0 2024-09-25 11:14:34,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=731411.3333333334, ans=0.125 2024-09-25 11:14:35,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=731411.3333333334, ans=0.125 2024-09-25 11:14:37,169 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.274e+02 1.366e+02 1.473e+02 2.977e+02, threshold=2.732e+02, percent-clipped=1.0 2024-09-25 11:14:37,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=731411.3333333334, ans=0.125 2024-09-25 11:14:48,456 INFO [train.py:1198] (0/4) Epoch 41, batch 900, loss[loss=0.1855, ctc_loss=0.1168, cr_loss=0.3434, over 17232.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1235, cr_loss=0.3414, over 3326185.11 frames. ], batch size: 50, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:15:01,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=731458.0, ans=0.025 2024-09-25 11:15:23,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2024-09-25 11:15:28,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=731551.3333333334, ans=0.125 2024-09-25 11:15:35,532 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=12.0 2024-09-25 11:15:41,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=731598.0, ans=0.0 2024-09-25 11:15:41,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=731598.0, ans=0.125 2024-09-25 11:15:42,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=731598.0, ans=0.07 2024-09-25 11:15:47,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=731598.0, ans=0.1 2024-09-25 11:16:10,880 INFO [train.py:1198] (0/4) Epoch 41, batch 950, loss[loss=0.1847, ctc_loss=0.1197, cr_loss=0.325, over 16992.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1236, cr_loss=0.3415, over 3324172.91 frames. ], batch size: 51, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:16:22,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. limit=10.0 2024-09-25 11:16:27,918 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.67 vs. 
limit=15.0 2024-09-25 11:16:33,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731738.0, ans=0.1 2024-09-25 11:16:37,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=22.5 2024-09-25 11:17:18,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=731878.0, ans=15.0 2024-09-25 11:17:19,942 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.310e+02 1.391e+02 1.460e+02 2.998e+02, threshold=2.782e+02, percent-clipped=1.0 2024-09-25 11:17:25,642 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0 2024-09-25 11:17:31,155 INFO [train.py:1198] (0/4) Epoch 41, batch 1000, loss[loss=0.1704, ctc_loss=0.1081, cr_loss=0.3117, over 17274.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1227, cr_loss=0.3399, over 3339866.74 frames. ], batch size: 42, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:17:49,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731971.3333333334, ans=0.1 2024-09-25 11:17:49,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0 2024-09-25 11:18:14,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.35 vs. limit=10.0 2024-09-25 11:18:22,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=732064.6666666666, ans=0.125 2024-09-25 11:18:27,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=732064.6666666666, ans=0.025 2024-09-25 11:18:56,467 INFO [train.py:1198] (0/4) Epoch 41, batch 1050, loss[loss=0.2051, ctc_loss=0.1342, cr_loss=0.3546, over 15110.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1222, cr_loss=0.3391, over 3342139.20 frames. ], batch size: 89, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:19:10,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.75 vs. limit=10.0 2024-09-25 11:19:15,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=732204.6666666666, ans=0.125 2024-09-25 11:19:44,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=732251.3333333334, ans=0.025 2024-09-25 11:20:10,107 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.269e+02 1.332e+02 1.413e+02 1.822e+02, threshold=2.664e+02, percent-clipped=0.0 2024-09-25 11:20:21,422 INFO [train.py:1198] (0/4) Epoch 41, batch 1100, loss[loss=0.2088, ctc_loss=0.1359, cr_loss=0.3644, over 17309.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3395, over 3341737.15 frames. 
], batch size: 51, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:20:42,602 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=732438.0, ans=0.2 2024-09-25 11:20:58,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732484.6666666666, ans=0.1 2024-09-25 11:21:02,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.06 vs. limit=12.0 2024-09-25 11:21:07,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5 2024-09-25 11:21:10,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=732531.3333333334, ans=0.125 2024-09-25 11:21:19,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732531.3333333334, ans=0.1 2024-09-25 11:21:37,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=732578.0, ans=0.0 2024-09-25 11:21:41,757 INFO [train.py:1198] (0/4) Epoch 41, batch 1150, loss[loss=0.1452, ctc_loss=0.09139, cr_loss=0.269, over 17033.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1222, cr_loss=0.3384, over 3346754.82 frames. ], batch size: 39, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:21:51,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732624.6666666666, ans=0.1 2024-09-25 11:22:39,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=732764.6666666666, ans=0.125 2024-09-25 11:22:40,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.93 vs. limit=22.5 2024-09-25 11:22:41,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=732764.6666666666, ans=0.125 2024-09-25 11:22:50,423 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.312e+02 1.375e+02 1.453e+02 2.138e+02, threshold=2.750e+02, percent-clipped=0.0 2024-09-25 11:23:01,756 INFO [train.py:1198] (0/4) Epoch 41, batch 1200, loss[loss=0.166, ctc_loss=0.1066, cr_loss=0.2973, over 16663.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1226, cr_loss=0.3391, over 3355823.54 frames. ], batch size: 37, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:23:05,122 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 11:23:34,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.78 vs. 
limit=15.0 2024-09-25 11:23:39,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=732951.3333333334, ans=0.0 2024-09-25 11:23:44,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=732951.3333333334, ans=0.2 2024-09-25 11:24:10,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732998.0, ans=0.1 2024-09-25 11:24:29,677 INFO [train.py:1198] (0/4) Epoch 41, batch 1250, loss[loss=0.1677, ctc_loss=0.1064, cr_loss=0.3064, over 17255.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1227, cr_loss=0.339, over 3349254.70 frames. ], batch size: 44, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:25:36,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.58 vs. limit=22.5 2024-09-25 11:25:40,678 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.293e+02 1.355e+02 1.454e+02 2.054e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-25 11:25:41,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-09-25 11:25:49,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=733278.0, ans=0.0 2024-09-25 11:25:49,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=733278.0, ans=0.125 2024-09-25 11:25:52,136 INFO [train.py:1198] (0/4) Epoch 41, batch 1300, loss[loss=0.1565, ctc_loss=0.09866, cr_loss=0.289, over 16321.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1226, cr_loss=0.3388, over 3356470.15 frames. ], batch size: 36, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:25:58,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733324.6666666666, ans=0.1 2024-09-25 11:26:43,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=733464.6666666666, ans=0.5 2024-09-25 11:26:46,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=733464.6666666666, ans=0.2 2024-09-25 11:27:12,355 INFO [train.py:1198] (0/4) Epoch 41, batch 1350, loss[loss=0.2184, ctc_loss=0.143, cr_loss=0.3769, over 16090.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1232, cr_loss=0.3397, over 3345990.67 frames. ], batch size: 74, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:27:17,967 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2024-09-25 11:27:19,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733558.0, ans=0.1 2024-09-25 11:27:36,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=733604.6666666666, ans=0.0 2024-09-25 11:27:53,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. 
limit=15.0 2024-09-25 11:28:20,789 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.331e+02 1.402e+02 1.512e+02 1.864e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-25 11:28:32,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=733744.6666666666, ans=0.0 2024-09-25 11:28:37,187 INFO [train.py:1198] (0/4) Epoch 41, batch 1400, loss[loss=0.191, ctc_loss=0.1239, cr_loss=0.3356, over 17298.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1222, cr_loss=0.3382, over 3354040.20 frames. ], batch size: 49, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:29:12,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=733884.6666666666, ans=0.0 2024-09-25 11:29:14,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.69 vs. limit=10.0 2024-09-25 11:29:40,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733931.3333333334, ans=0.1 2024-09-25 11:29:46,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733978.0, ans=0.1 2024-09-25 11:30:01,595 INFO [train.py:1198] (0/4) Epoch 41, batch 1450, loss[loss=0.2189, ctc_loss=0.1445, cr_loss=0.3721, over 16034.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1228, cr_loss=0.3394, over 3358069.20 frames. ], batch size: 74, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:30:08,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=734024.6666666666, ans=0.125 2024-09-25 11:30:39,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=734118.0, ans=0.05 2024-09-25 11:30:43,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=734118.0, ans=0.0 2024-09-25 11:30:47,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=734164.6666666666, ans=0.125 2024-09-25 11:30:47,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=734164.6666666666, ans=0.0 2024-09-25 11:31:10,081 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.281e+02 1.374e+02 1.455e+02 2.816e+02, threshold=2.747e+02, percent-clipped=1.0 2024-09-25 11:31:21,398 INFO [train.py:1198] (0/4) Epoch 41, batch 1500, loss[loss=0.1555, ctc_loss=0.09807, cr_loss=0.2872, over 17107.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1234, cr_loss=0.3398, over 3353910.70 frames. 
], batch size: 40, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:31:41,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=734304.6666666666, ans=0.125 2024-09-25 11:31:53,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=734351.3333333334, ans=0.125 2024-09-25 11:31:56,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=734351.3333333334, ans=0.125 2024-09-25 11:32:22,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=734398.0, ans=0.125 2024-09-25 11:32:32,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=734444.6666666666, ans=0.125 2024-09-25 11:32:41,154 INFO [train.py:1198] (0/4) Epoch 41, batch 1550, loss[loss=0.2167, ctc_loss=0.1395, cr_loss=0.3858, over 17249.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1238, cr_loss=0.3405, over 3341590.30 frames. ], batch size: 55, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:32:41,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=734491.3333333334, ans=0.0 2024-09-25 11:32:43,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=734491.3333333334, ans=0.0 2024-09-25 11:32:59,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734538.0, ans=0.1 2024-09-25 11:33:05,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734538.0, ans=0.1 2024-09-25 11:33:21,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2024-09-25 11:33:35,213 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0 2024-09-25 11:33:41,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=734631.3333333334, ans=0.0 2024-09-25 11:33:42,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=734631.3333333334, ans=0.035 2024-09-25 11:33:55,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=734678.0, ans=0.125 2024-09-25 11:33:59,387 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.278e+02 1.349e+02 1.424e+02 1.781e+02, threshold=2.698e+02, percent-clipped=0.0 2024-09-25 11:33:59,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=734678.0, ans=0.125 2024-09-25 11:34:04,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=734678.0, ans=0.0 2024-09-25 11:34:09,067 INFO [train.py:1198] (0/4) Epoch 41, batch 1600, loss[loss=0.1803, ctc_loss=0.1173, cr_loss=0.3149, over 17298.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1233, cr_loss=0.3394, over 3341577.72 frames. 
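In every optim.py:487 WARNING above, the five numbers are quartiles (min, 25%, median, 75%, max) of recent gradient norms, and the threshold equals Clipping_scale times the median (for example 2.758e+02 = 2.0 * 1.379e+02); percent-clipped is the fraction of recent steps whose norm exceeded it. A sketch of that bookkeeping, assuming a sliding window of per-step norms; icefall's actual optimizer logic may differ:

    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, window=1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent gradient norms

        def clip_(self, params):
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:  # scale gradients down to the threshold
                for g in grads:
                    g.mul_(threshold / norm)
            return norm, threshold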
], batch size: 51, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:34:15,279 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0 2024-09-25 11:34:15,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=734724.6666666666, ans=0.0 2024-09-25 11:34:52,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734818.0, ans=0.1 2024-09-25 11:34:52,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734818.0, ans=0.1 2024-09-25 11:35:16,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=734911.3333333334, ans=0.125 2024-09-25 11:35:19,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=734911.3333333334, ans=10.0 2024-09-25 11:35:31,974 INFO [train.py:1198] (0/4) Epoch 41, batch 1650, loss[loss=0.2067, ctc_loss=0.1334, cr_loss=0.3664, over 17009.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1237, cr_loss=0.3406, over 3340475.03 frames. ], batch size: 53, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:35:44,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=734958.0, ans=0.2 2024-09-25 11:36:20,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=735098.0, ans=0.0 2024-09-25 11:36:42,043 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.308e+02 1.358e+02 1.444e+02 2.176e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 11:36:45,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=735144.6666666666, ans=0.125 2024-09-25 11:36:51,861 INFO [train.py:1198] (0/4) Epoch 41, batch 1700, loss[loss=0.1931, ctc_loss=0.1253, cr_loss=0.3389, over 17231.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1234, cr_loss=0.3407, over 3344018.73 frames. ], batch size: 55, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:37:22,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=735284.6666666666, ans=0.0 2024-09-25 11:38:12,454 INFO [train.py:1198] (0/4) Epoch 41, batch 1750, loss[loss=0.1871, ctc_loss=0.1189, cr_loss=0.3407, over 17271.00 frames. ], tot_loss[loss=0.1917, ctc_loss=0.1236, cr_loss=0.3408, over 3345739.43 frames. 
], batch size: 44, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:38:32,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=735471.3333333334, ans=0.2 2024-09-25 11:38:49,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735518.0, ans=0.1 2024-09-25 11:38:50,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=735518.0, ans=0.125 2024-09-25 11:39:09,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=735564.6666666666, ans=0.125 2024-09-25 11:39:30,212 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.294e+02 1.365e+02 1.443e+02 2.325e+02, threshold=2.730e+02, percent-clipped=0.0 2024-09-25 11:39:39,719 INFO [train.py:1198] (0/4) Epoch 41, batch 1800, loss[loss=0.2179, ctc_loss=0.1419, cr_loss=0.3801, over 17015.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1237, cr_loss=0.3408, over 3356134.88 frames. ], batch size: 51, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:39:43,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=735658.0, ans=0.0 2024-09-25 11:40:04,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=735704.6666666666, ans=0.125 2024-09-25 11:40:24,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=735751.3333333334, ans=0.0 2024-09-25 11:40:48,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=735844.6666666666, ans=0.2 2024-09-25 11:41:00,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2024-09-25 11:41:02,239 INFO [train.py:1198] (0/4) Epoch 41, batch 1850, loss[loss=0.246, ctc_loss=0.167, cr_loss=0.3953, over 11906.00 frames. ], tot_loss[loss=0.1928, ctc_loss=0.1243, cr_loss=0.3422, over 3346848.59 frames. ], batch size: 124, lr: 2.90e-03, grad_scale: 32.0 2024-09-25 11:41:02,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=735891.3333333334, ans=0.0 2024-09-25 11:41:37,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=735984.6666666666, ans=0.125 2024-09-25 11:41:48,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=736031.3333333334, ans=0.125 2024-09-25 11:41:59,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=736031.3333333334, ans=0.125 2024-09-25 11:42:06,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=736078.0, ans=0.125 2024-09-25 11:42:13,779 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.264e+02 1.364e+02 1.473e+02 1.819e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-25 11:42:21,755 INFO [train.py:1198] (0/4) Epoch 41, batch 1900, loss[loss=0.1668, ctc_loss=0.1028, cr_loss=0.3201, over 17120.00 frames. 
], tot_loss[loss=0.1912, ctc_loss=0.1233, cr_loss=0.3398, over 3351454.04 frames. ], batch size: 40, lr: 2.90e-03, grad_scale: 16.0 2024-09-25 11:43:02,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=736218.0, ans=0.125 2024-09-25 11:43:38,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=736311.3333333334, ans=12.0 2024-09-25 11:43:39,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=736311.3333333334, ans=0.2 2024-09-25 11:43:50,148 INFO [train.py:1198] (0/4) Epoch 41, batch 1950, loss[loss=0.1673, ctc_loss=0.107, cr_loss=0.3016, over 17211.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1237, cr_loss=0.3407, over 3344799.10 frames. ], batch size: 50, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:43:57,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2024-09-25 11:44:05,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=736404.6666666666, ans=0.125 2024-09-25 11:44:42,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=736498.0, ans=0.0 2024-09-25 11:44:52,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.72 vs. limit=15.0 2024-09-25 11:45:00,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=736544.6666666666, ans=0.125 2024-09-25 11:45:05,169 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.301e+02 1.403e+02 1.536e+02 5.268e+02, threshold=2.807e+02, percent-clipped=2.0 2024-09-25 11:45:13,265 INFO [train.py:1198] (0/4) Epoch 41, batch 2000, loss[loss=0.2024, ctc_loss=0.1302, cr_loss=0.3611, over 17110.00 frames. ], tot_loss[loss=0.1923, ctc_loss=0.1239, cr_loss=0.3417, over 3338874.69 frames. ], batch size: 49, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 11:45:20,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=736591.3333333334, ans=0.125 2024-09-25 11:45:39,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=736638.0, ans=0.2 2024-09-25 11:46:10,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=736731.3333333334, ans=0.1 2024-09-25 11:46:12,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0 2024-09-25 11:46:13,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=736731.3333333334, ans=0.0 2024-09-25 11:46:32,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=736824.6666666666, ans=0.0 2024-09-25 11:46:34,056 INFO [train.py:1198] (0/4) Epoch 41, batch 2050, loss[loss=0.2179, ctc_loss=0.1429, cr_loss=0.3748, over 17049.00 frames. ], tot_loss[loss=0.1924, ctc_loss=0.124, cr_loss=0.3422, over 3342301.38 frames. 
], batch size: 52, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:46:39,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2024-09-25 11:47:47,792 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.110e+02 1.289e+02 1.380e+02 1.485e+02 2.059e+02, threshold=2.761e+02, percent-clipped=0.0 2024-09-25 11:47:53,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=737058.0, ans=0.0 2024-09-25 11:47:54,264 INFO [train.py:1198] (0/4) Epoch 41, batch 2100, loss[loss=0.1975, ctc_loss=0.1275, cr_loss=0.3501, over 17084.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1236, cr_loss=0.3415, over 3342904.21 frames. ], batch size: 43, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:48:20,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=737104.6666666666, ans=0.0 2024-09-25 11:48:34,811 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 11:48:38,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.04 vs. limit=10.0 2024-09-25 11:49:21,663 INFO [train.py:1198] (0/4) Epoch 41, batch 2150, loss[loss=0.1849, ctc_loss=0.1204, cr_loss=0.3224, over 17204.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1233, cr_loss=0.341, over 3343362.44 frames. ], batch size: 47, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:49:23,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=737291.3333333334, ans=0.125 2024-09-25 11:49:28,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2024-09-25 11:49:50,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737338.0, ans=0.1 2024-09-25 11:50:10,153 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2024-09-25 11:50:14,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=737431.3333333334, ans=0.125 2024-09-25 11:50:16,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2024-09-25 11:50:25,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=737431.3333333334, ans=0.2 2024-09-25 11:50:26,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=737478.0, ans=0.125 2024-09-25 11:50:34,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=737478.0, ans=0.125 2024-09-25 11:50:37,767 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.274e+02 1.356e+02 1.503e+02 3.384e+02, threshold=2.711e+02, percent-clipped=1.0 2024-09-25 11:50:44,083 INFO [train.py:1198] (0/4) Epoch 41, batch 2200, loss[loss=0.1969, ctc_loss=0.131, cr_loss=0.3297, over 17363.00 frames. 
], tot_loss[loss=0.1921, ctc_loss=0.1238, cr_loss=0.3417, over 3338963.93 frames. ], batch size: 48, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:51:35,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737664.6666666666, ans=0.1 2024-09-25 11:52:04,269 INFO [train.py:1198] (0/4) Epoch 41, batch 2250, loss[loss=0.1946, ctc_loss=0.1263, cr_loss=0.3417, over 17353.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1228, cr_loss=0.3401, over 3350341.22 frames. ], batch size: 48, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:52:04,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=737758.0, ans=0.0 2024-09-25 11:52:08,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=12.0 2024-09-25 11:52:28,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=737804.6666666666, ans=0.0 2024-09-25 11:52:28,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2024-09-25 11:52:57,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=737898.0, ans=0.95 2024-09-25 11:53:13,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=737944.6666666666, ans=0.125 2024-09-25 11:53:22,628 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.265e+02 1.339e+02 1.413e+02 1.733e+02, threshold=2.679e+02, percent-clipped=0.0 2024-09-25 11:53:29,038 INFO [train.py:1198] (0/4) Epoch 41, batch 2300, loss[loss=0.1982, ctc_loss=0.1276, cr_loss=0.3532, over 16895.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1235, cr_loss=0.3419, over 3357543.02 frames. ], batch size: 58, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:53:52,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=738038.0, ans=0.2 2024-09-25 11:53:57,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=738038.0, ans=0.07 2024-09-25 11:54:02,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738084.6666666666, ans=0.0 2024-09-25 11:54:06,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=738084.6666666666, ans=0.1 2024-09-25 11:54:26,087 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=738131.3333333334, ans=0.025 2024-09-25 11:54:53,983 INFO [train.py:1198] (0/4) Epoch 41, batch 2350, loss[loss=0.1809, ctc_loss=0.117, cr_loss=0.3196, over 17244.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1237, cr_loss=0.3423, over 3365542.22 frames. 
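The scaling.py:1024 Whitening records fire when a layer's whitening diagnostic approaches its limit; the metric grows as the activation covariance within each group becomes less isotropic. The formula below is an assumed proxy rather than the actual scaling.py computation: it equals 1.0 for perfectly white activations and approaches num_channels when all variance collapses onto one direction, which matches the magnitudes (about 1.8 to 20.8) against limits of 6.0 to 22.5 seen in these records:

    import torch

    def whitening_metric(x):
        # x: (num_frames, num_channels) activations for one whitening group
        # (the records above also report num_groups; apply this per group).
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # covariance eigenvalues
        d = eigs.numel()
        # 1.0 when all eigenvalues are equal (white); up to d when rank-1.
        return float(d * (eigs ** 2).sum() / eigs.sum() ** 2)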
], batch size: 44, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:54:57,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738224.6666666666, ans=0.0 2024-09-25 11:55:18,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738271.3333333334, ans=0.0 2024-09-25 11:55:23,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=738271.3333333334, ans=0.0 2024-09-25 11:55:40,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=738364.6666666666, ans=0.0 2024-09-25 11:55:43,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=738364.6666666666, ans=0.0 2024-09-25 11:55:50,585 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2024-09-25 11:56:01,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=738411.3333333334, ans=0.125 2024-09-25 11:56:06,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.01 vs. limit=22.5 2024-09-25 11:56:07,456 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.297e+02 1.370e+02 1.455e+02 1.687e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-25 11:56:13,957 INFO [train.py:1198] (0/4) Epoch 41, batch 2400, loss[loss=0.2167, ctc_loss=0.1411, cr_loss=0.3779, over 17014.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1235, cr_loss=0.3415, over 3348925.11 frames. ], batch size: 53, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 11:56:20,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=738458.0, ans=0.0 2024-09-25 11:56:44,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=738551.3333333334, ans=0.125 2024-09-25 11:56:49,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=738551.3333333334, ans=0.0 2024-09-25 11:56:50,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=738551.3333333334, ans=0.125 2024-09-25 11:57:02,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=738598.0, ans=0.125 2024-09-25 11:57:33,334 INFO [train.py:1198] (0/4) Epoch 41, batch 2450, loss[loss=0.2136, ctc_loss=0.1387, cr_loss=0.3746, over 17020.00 frames. ], tot_loss[loss=0.1926, ctc_loss=0.1241, cr_loss=0.3425, over 3339861.27 frames. 
], batch size: 52, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:57:35,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=738691.3333333334, ans=0.2 2024-09-25 11:57:36,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=738691.3333333334, ans=0.0 2024-09-25 11:57:45,330 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.87 vs. limit=15.0 2024-09-25 11:57:47,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=738738.0, ans=0.0 2024-09-25 11:58:14,010 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-09-25 11:58:55,647 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.283e+02 1.406e+02 1.500e+02 1.911e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-25 11:58:57,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=738878.0, ans=0.125 2024-09-25 11:59:00,468 INFO [train.py:1198] (0/4) Epoch 41, batch 2500, loss[loss=0.1924, ctc_loss=0.1238, cr_loss=0.343, over 17148.00 frames. ], tot_loss[loss=0.192, ctc_loss=0.1236, cr_loss=0.3421, over 3346220.57 frames. ], batch size: 48, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 11:59:10,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=738924.6666666666, ans=0.2 2024-09-25 11:59:13,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=738924.6666666666, ans=0.5 2024-09-25 11:59:14,121 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2024-09-25 11:59:50,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.32 vs. limit=10.0 2024-09-25 12:00:15,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=739111.3333333334, ans=0.0 2024-09-25 12:00:23,049 INFO [train.py:1198] (0/4) Epoch 41, batch 2550, loss[loss=0.2081, ctc_loss=0.1363, cr_loss=0.359, over 16899.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1234, cr_loss=0.3412, over 3350422.53 frames. ], batch size: 58, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:00:29,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=739158.0, ans=0.125 2024-09-25 12:00:40,024 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2024-09-25 12:00:55,995 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.72 vs. 
limit=12.0 2024-09-25 12:00:58,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=739251.3333333334, ans=0.125 2024-09-25 12:01:26,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2024-09-25 12:01:38,426 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.126e+02 1.313e+02 1.392e+02 1.468e+02 1.882e+02, threshold=2.785e+02, percent-clipped=0.0 2024-09-25 12:01:43,157 INFO [train.py:1198] (0/4) Epoch 41, batch 2600, loss[loss=0.1831, ctc_loss=0.1166, cr_loss=0.3326, over 16896.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1231, cr_loss=0.341, over 3356428.38 frames. ], batch size: 58, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:02:10,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=739438.0, ans=0.2 2024-09-25 12:02:13,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=739484.6666666666, ans=0.0 2024-09-25 12:02:16,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=739484.6666666666, ans=0.0 2024-09-25 12:03:07,751 INFO [train.py:1198] (0/4) Epoch 41, batch 2650, loss[loss=0.1686, ctc_loss=0.1066, cr_loss=0.3101, over 16387.00 frames. ], tot_loss[loss=0.1922, ctc_loss=0.1238, cr_loss=0.3424, over 3357854.50 frames. ], batch size: 36, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:03:19,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=739624.6666666666, ans=0.125 2024-09-25 12:03:34,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=739671.3333333334, ans=10.0 2024-09-25 12:03:55,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=739718.0, ans=0.125 2024-09-25 12:04:25,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0 2024-09-25 12:04:26,078 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.313e+02 1.405e+02 1.499e+02 1.845e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-25 12:04:30,841 INFO [train.py:1198] (0/4) Epoch 41, batch 2700, loss[loss=0.1294, ctc_loss=0.08045, cr_loss=0.2446, over 17292.00 frames. ], tot_loss[loss=0.1914, ctc_loss=0.1231, cr_loss=0.3414, over 3368295.62 frames. ], batch size: 42, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:04:38,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=739858.0, ans=0.0 2024-09-25 12:05:02,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=739904.6666666666, ans=0.0 2024-09-25 12:05:40,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=740044.6666666666, ans=0.125 2024-09-25 12:05:53,461 INFO [train.py:1198] (0/4) Epoch 41, batch 2750, loss[loss=0.1855, ctc_loss=0.1201, cr_loss=0.3269, over 17225.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1215, cr_loss=0.3383, over 3371721.63 frames. 
], batch size: 55, lr: 2.89e-03, grad_scale: 16.0 2024-09-25 12:05:53,903 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=740091.3333333334, ans=0.125 2024-09-25 12:06:22,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=740138.0, ans=0.125 2024-09-25 12:06:48,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=740231.3333333334, ans=0.125 2024-09-25 12:06:48,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=740231.3333333334, ans=0.05 2024-09-25 12:06:51,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=740231.3333333334, ans=0.0 2024-09-25 12:06:53,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.33 vs. limit=6.0 2024-09-25 12:06:59,897 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=740278.0, ans=0.2 2024-09-25 12:07:09,100 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.274e+02 1.386e+02 1.486e+02 2.179e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 12:07:14,040 INFO [train.py:1198] (0/4) Epoch 41, batch 2800, loss[loss=0.2009, ctc_loss=0.1302, cr_loss=0.3532, over 17016.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.3383, over 3372119.72 frames. ], batch size: 56, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:07:16,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740324.6666666666, ans=0.1 2024-09-25 12:07:37,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=740371.3333333334, ans=0.5 2024-09-25 12:07:43,875 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=740371.3333333334, ans=0.0 2024-09-25 12:07:54,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=740418.0, ans=0.0 2024-09-25 12:08:01,537 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.84 vs. limit=15.0 2024-09-25 12:08:04,390 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2024-09-25 12:08:42,543 INFO [train.py:1198] (0/4) Epoch 41, batch 2850, loss[loss=0.2101, ctc_loss=0.1387, cr_loss=0.3573, over 17093.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1214, cr_loss=0.3376, over 3370425.45 frames. 
], batch size: 49, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:08:58,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=740604.6666666666, ans=0.2 2024-09-25 12:09:11,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=740604.6666666666, ans=0.125 2024-09-25 12:09:13,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0 2024-09-25 12:09:13,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.55 vs. limit=22.5 2024-09-25 12:09:20,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-09-25 12:09:38,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740698.0, ans=0.1 2024-09-25 12:09:47,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=7.95 vs. limit=22.5 2024-09-25 12:09:51,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=740744.6666666666, ans=0.125 2024-09-25 12:09:55,723 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740744.6666666666, ans=0.125 2024-09-25 12:10:00,302 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.293e+02 1.358e+02 1.450e+02 1.925e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 12:10:00,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=740744.6666666666, ans=0.125 2024-09-25 12:10:05,285 INFO [train.py:1198] (0/4) Epoch 41, batch 2900, loss[loss=0.1966, ctc_loss=0.1276, cr_loss=0.3445, over 16997.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1215, cr_loss=0.3379, over 3369408.26 frames. ], batch size: 53, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:11:01,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740931.3333333334, ans=0.1 2024-09-25 12:11:08,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=740978.0, ans=0.125 2024-09-25 12:11:17,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=740978.0, ans=0.125 2024-09-25 12:11:25,601 INFO [train.py:1198] (0/4) Epoch 41, batch 2950, loss[loss=0.1973, ctc_loss=0.1269, cr_loss=0.3519, over 16895.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1217, cr_loss=0.3388, over 3371071.48 frames. ], batch size: 58, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:11:31,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.47 vs. 
limit=10.0 2024-09-25 12:11:37,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=741024.6666666666, ans=0.1 2024-09-25 12:12:19,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=15.0 2024-09-25 12:12:26,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=741164.6666666666, ans=0.125 2024-09-25 12:12:31,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=741211.3333333334, ans=0.05 2024-09-25 12:12:34,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=741211.3333333334, ans=0.125 2024-09-25 12:12:39,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=741211.3333333334, ans=0.0 2024-09-25 12:12:39,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=741211.3333333334, ans=0.125 2024-09-25 12:12:40,613 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.313e+02 1.388e+02 1.485e+02 2.724e+02, threshold=2.776e+02, percent-clipped=1.0 2024-09-25 12:12:45,295 INFO [train.py:1198] (0/4) Epoch 41, batch 3000, loss[loss=0.2262, ctc_loss=0.1481, cr_loss=0.3906, over 15872.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1223, cr_loss=0.3398, over 3370514.52 frames. ], batch size: 74, lr: 2.89e-03, grad_scale: 32.0 2024-09-25 12:12:45,296 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 12:13:00,796 INFO [train.py:1230] (0/4) Epoch 41, validation: loss=0.03575, ctc_loss=0.03575, cr_loss=9.81e-15, over 944034.00 frames. 2024-09-25 12:13:00,797 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 12:13:04,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=741258.0, ans=0.0 2024-09-25 12:13:05,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=741258.0, ans=0.125 2024-09-25 12:13:10,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=741258.0, ans=0.125 2024-09-25 12:13:15,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=741304.6666666666, ans=0.0 2024-09-25 12:13:34,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=741351.3333333334, ans=0.125 2024-09-25 12:13:34,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=741351.3333333334, ans=0.0 2024-09-25 12:13:36,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=741351.3333333334, ans=0.025 2024-09-25 12:13:58,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.92 vs. 
limit=15.0 2024-09-25 12:14:02,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=741398.0, ans=0.125 2024-09-25 12:14:08,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=741444.6666666666, ans=0.125 2024-09-25 12:14:26,512 INFO [train.py:1198] (0/4) Epoch 41, batch 3050, loss[loss=0.2014, ctc_loss=0.1305, cr_loss=0.3547, over 16927.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1227, cr_loss=0.3407, over 3373886.59 frames. ], batch size: 58, lr: 2.88e-03, grad_scale: 32.0 2024-09-25 12:14:53,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=741538.0, ans=0.125 2024-09-25 12:14:56,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=741584.6666666666, ans=0.0 2024-09-25 12:15:01,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.16 vs. limit=8.0 2024-09-25 12:15:11,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=741631.3333333334, ans=0.0 2024-09-25 12:15:24,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=741631.3333333334, ans=0.0 2024-09-25 12:15:30,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=741678.0, ans=0.0 2024-09-25 12:15:39,715 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.278e+02 1.352e+02 1.472e+02 2.246e+02, threshold=2.705e+02, percent-clipped=0.0 2024-09-25 12:15:39,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=741678.0, ans=0.0 2024-09-25 12:15:43,811 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=12.0 2024-09-25 12:15:44,428 INFO [train.py:1198] (0/4) Epoch 41, batch 3100, loss[loss=0.2149, ctc_loss=0.1378, cr_loss=0.3856, over 17000.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3398, over 3381600.05 frames. ], batch size: 56, lr: 2.88e-03, grad_scale: 32.0 2024-09-25 12:16:00,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=741771.3333333334, ans=0.0 2024-09-25 12:16:39,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.68 vs. limit=10.0 2024-09-25 12:17:04,755 INFO [train.py:1198] (0/4) Epoch 41, batch 3150, loss[loss=0.2128, ctc_loss=0.1385, cr_loss=0.3713, over 17162.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.122, cr_loss=0.3399, over 3376099.04 frames. ], batch size: 45, lr: 2.88e-03, grad_scale: 32.0 2024-09-25 12:17:07,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.51 vs. 
limit=10.0 2024-09-25 12:17:08,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741958.0, ans=0.1 2024-09-25 12:17:41,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=742051.3333333334, ans=0.125 2024-09-25 12:17:49,435 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:17:54,198 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=742098.0, ans=0.0 2024-09-25 12:18:19,189 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.096e+02 1.276e+02 1.346e+02 1.473e+02 3.100e+02, threshold=2.691e+02, percent-clipped=1.0 2024-09-25 12:18:23,880 INFO [train.py:1198] (0/4) Epoch 41, batch 3200, loss[loss=0.2009, ctc_loss=0.1279, cr_loss=0.3648, over 17022.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1214, cr_loss=0.3379, over 3373092.05 frames. ], batch size: 51, lr: 2.88e-03, grad_scale: 32.0 2024-09-25 12:18:28,998 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:18:30,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=742191.3333333334, ans=0.125 2024-09-25 12:18:38,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=742238.0, ans=0.2 2024-09-25 12:18:41,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2024-09-25 12:19:00,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=742284.6666666666, ans=0.1 2024-09-25 12:19:03,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=742284.6666666666, ans=0.125 2024-09-25 12:19:09,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=742331.3333333334, ans=0.0 2024-09-25 12:19:11,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=742331.3333333334, ans=0.5 2024-09-25 12:19:42,198 INFO [train.py:1198] (0/4) Epoch 41, batch 3250, loss[loss=0.2491, ctc_loss=0.1717, cr_loss=0.3873, over 11829.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1215, cr_loss=0.338, over 3366999.20 frames. ], batch size: 123, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:20:22,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=22.5 2024-09-25 12:20:25,171 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.71 vs. 
limit=15.0 2024-09-25 12:20:29,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=742564.6666666666, ans=0.125 2024-09-25 12:20:35,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=742564.6666666666, ans=0.125 2024-09-25 12:20:58,869 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.297e+02 1.390e+02 1.461e+02 1.762e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 12:21:00,473 INFO [train.py:1198] (0/4) Epoch 41, batch 3300, loss[loss=0.1783, ctc_loss=0.1128, cr_loss=0.3276, over 17262.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1214, cr_loss=0.338, over 3363773.80 frames. ], batch size: 42, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:21:06,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=742658.0, ans=0.125 2024-09-25 12:21:08,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=742658.0, ans=0.2 2024-09-25 12:21:22,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742704.6666666666, ans=0.1 2024-09-25 12:21:32,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=742751.3333333334, ans=0.125 2024-09-25 12:21:57,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=742798.0, ans=0.125 2024-09-25 12:22:08,034 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:22:08,051 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=742844.6666666666, ans=0.0 2024-09-25 12:22:11,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=742844.6666666666, ans=0.125 2024-09-25 12:22:17,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=742891.3333333334, ans=0.125 2024-09-25 12:22:18,688 INFO [train.py:1198] (0/4) Epoch 41, batch 3350, loss[loss=0.2063, ctc_loss=0.1329, cr_loss=0.3665, over 17014.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1212, cr_loss=0.3374, over 3365782.36 frames. ], batch size: 53, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:22:23,733 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=742891.3333333334, ans=0.125 2024-09-25 12:22:31,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=742891.3333333334, ans=0.0 2024-09-25 12:22:39,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=742938.0, ans=0.1 2024-09-25 12:22:46,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=12.0 2024-09-25 12:22:49,343 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.74 vs. 
limit=10.0 2024-09-25 12:22:55,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2024-09-25 12:23:05,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=743031.3333333334, ans=0.0 2024-09-25 12:23:35,165 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.288e+02 1.390e+02 1.522e+02 3.340e+02, threshold=2.781e+02, percent-clipped=2.0 2024-09-25 12:23:36,784 INFO [train.py:1198] (0/4) Epoch 41, batch 3400, loss[loss=0.2168, ctc_loss=0.1426, cr_loss=0.3712, over 15015.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1222, cr_loss=0.3385, over 3344922.74 frames. ], batch size: 89, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:23:53,203 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-09-25 12:23:59,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=743171.3333333334, ans=0.0 2024-09-25 12:24:30,506 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2024-09-25 12:24:42,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=743311.3333333334, ans=0.125 2024-09-25 12:24:54,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=743311.3333333334, ans=0.035 2024-09-25 12:25:00,982 INFO [train.py:1198] (0/4) Epoch 41, batch 3450, loss[loss=0.1719, ctc_loss=0.1083, cr_loss=0.318, over 16253.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1229, cr_loss=0.3401, over 3344950.52 frames. ], batch size: 36, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:25:04,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=743358.0, ans=0.2 2024-09-25 12:25:08,961 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:25:15,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=743404.6666666666, ans=0.125 2024-09-25 12:25:19,859 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:25:26,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=743404.6666666666, ans=0.125 2024-09-25 12:25:40,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743451.3333333334, ans=0.1 2024-09-25 12:25:46,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=743498.0, ans=0.125 2024-09-25 12:26:17,207 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.301e+02 1.363e+02 1.481e+02 3.003e+02, threshold=2.726e+02, percent-clipped=1.0 2024-09-25 12:26:18,749 INFO [train.py:1198] (0/4) Epoch 41, batch 3500, loss[loss=0.1956, ctc_loss=0.1221, cr_loss=0.3671, over 17030.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1225, cr_loss=0.3397, over 3342361.12 frames. 
], batch size: 51, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:26:36,064 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2024-09-25 12:27:11,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=743731.3333333334, ans=0.125 2024-09-25 12:27:39,045 INFO [train.py:1198] (0/4) Epoch 41, batch 3550, loss[loss=0.202, ctc_loss=0.1321, cr_loss=0.3492, over 16548.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1221, cr_loss=0.3391, over 3352727.59 frames. ], batch size: 66, lr: 2.88e-03, grad_scale: 8.0 2024-09-25 12:27:46,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2024-09-25 12:27:56,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=743871.3333333334, ans=0.05 2024-09-25 12:27:57,243 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-25 12:28:05,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=743871.3333333334, ans=0.05 2024-09-25 12:28:12,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=743918.0, ans=0.125 2024-09-25 12:28:15,205 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743918.0, ans=0.1 2024-09-25 12:28:42,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=744011.3333333334, ans=0.125 2024-09-25 12:28:51,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=744011.3333333334, ans=0.05 2024-09-25 12:28:55,696 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.288e+02 1.370e+02 1.460e+02 2.719e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-25 12:28:57,317 INFO [train.py:1198] (0/4) Epoch 41, batch 3600, loss[loss=0.2027, ctc_loss=0.1339, cr_loss=0.344, over 16033.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1223, cr_loss=0.3393, over 3356656.62 frames. ], batch size: 74, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:29:56,109 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2024-09-25 12:29:56,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=744198.0, ans=0.0 2024-09-25 12:29:59,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2024-09-25 12:30:15,425 INFO [train.py:1198] (0/4) Epoch 41, batch 3650, loss[loss=0.1697, ctc_loss=0.1098, cr_loss=0.2996, over 17286.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1218, cr_loss=0.3386, over 3360769.92 frames. 
], batch size: 46, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:30:40,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=744338.0, ans=0.025 2024-09-25 12:30:51,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744384.6666666666, ans=0.1 2024-09-25 12:31:13,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0 2024-09-25 12:31:31,160 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:31:32,488 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.315e+02 1.407e+02 1.494e+02 1.743e+02, threshold=2.814e+02, percent-clipped=0.0 2024-09-25 12:31:34,121 INFO [train.py:1198] (0/4) Epoch 41, batch 3700, loss[loss=0.2084, ctc_loss=0.1373, cr_loss=0.3557, over 17028.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3394, over 3365426.51 frames. ], batch size: 52, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:31:45,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=744524.6666666666, ans=0.125 2024-09-25 12:31:50,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.75 vs. limit=22.5 2024-09-25 12:31:56,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=744571.3333333334, ans=0.0 2024-09-25 12:32:12,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744618.0, ans=0.1 2024-09-25 12:32:12,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=744618.0, ans=0.2 2024-09-25 12:32:17,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=744618.0, ans=0.025 2024-09-25 12:32:21,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744664.6666666666, ans=0.1 2024-09-25 12:32:26,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=744664.6666666666, ans=0.125 2024-09-25 12:32:53,081 INFO [train.py:1198] (0/4) Epoch 41, batch 3750, loss[loss=0.1406, ctc_loss=0.08459, cr_loss=0.28, over 16710.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1212, cr_loss=0.3374, over 3366100.57 frames. 
], batch size: 37, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:33:04,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=744758.0, ans=0.125 2024-09-25 12:33:27,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=744851.3333333334, ans=0.125 2024-09-25 12:33:27,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=744851.3333333334, ans=0.125 2024-09-25 12:34:11,725 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.288e+02 1.369e+02 1.453e+02 1.957e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-25 12:34:14,054 INFO [train.py:1198] (0/4) Epoch 41, batch 3800, loss[loss=0.2042, ctc_loss=0.131, cr_loss=0.3657, over 16995.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1219, cr_loss=0.3394, over 3350914.75 frames. ], batch size: 44, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:34:39,982 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.01 vs. limit=15.0 2024-09-25 12:34:42,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2024-09-25 12:34:51,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=745084.6666666666, ans=0.0 2024-09-25 12:35:05,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745131.3333333334, ans=0.1 2024-09-25 12:35:08,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=745131.3333333334, ans=0.125 2024-09-25 12:35:15,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=745178.0, ans=0.125 2024-09-25 12:35:16,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=745178.0, ans=0.125 2024-09-25 12:35:32,599 INFO [train.py:1198] (0/4) Epoch 41, batch 3850, loss[loss=0.181, ctc_loss=0.1159, cr_loss=0.3252, over 16384.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1223, cr_loss=0.3385, over 3321929.10 frames. ], batch size: 36, lr: 2.88e-03, grad_scale: 16.0 2024-09-25 12:35:32,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=745224.6666666666, ans=0.125 2024-09-25 12:35:39,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745224.6666666666, ans=0.1 2024-09-25 12:35:54,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=745271.3333333334, ans=0.0 2024-09-25 12:35:59,433 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:36:36,238 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.72 vs. 
limit=15.0 2024-09-25 12:36:42,527 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-41.pt 2024-09-25 12:37:33,473 INFO [train.py:1198] (0/4) Epoch 42, batch 0, loss[loss=0.1757, ctc_loss=0.1082, cr_loss=0.3372, over 17225.00 frames. ], tot_loss[loss=0.1757, ctc_loss=0.1082, cr_loss=0.3372, over 17225.00 frames. ], batch size: 50, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:37:33,474 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 12:37:48,887 INFO [train.py:1230] (0/4) Epoch 42, validation: loss=0.03453, ctc_loss=0.03453, cr_loss=1.019e-14, over 944034.00 frames. 2024-09-25 12:37:48,888 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 12:37:53,632 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.354e+02 1.491e+02 1.700e+02 3.066e+02, threshold=2.981e+02, percent-clipped=1.0 2024-09-25 12:38:06,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745486.0, ans=0.1 2024-09-25 12:38:16,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=745486.0, ans=0.125 2024-09-25 12:38:42,308 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5 2024-09-25 12:38:55,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=745626.0, ans=0.0 2024-09-25 12:39:04,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=745626.0, ans=0.0 2024-09-25 12:39:11,033 INFO [train.py:1198] (0/4) Epoch 42, batch 50, loss[loss=0.1589, ctc_loss=0.1015, cr_loss=0.287, over 17101.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1212, cr_loss=0.341, over 764778.99 frames. ], batch size: 43, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:39:23,998 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=22.5 2024-09-25 12:39:38,697 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2024-09-25 12:39:43,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=745719.3333333334, ans=0.125 2024-09-25 12:39:48,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=745766.0, ans=10.0 2024-09-25 12:40:15,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2024-09-25 12:40:37,093 INFO [train.py:1198] (0/4) Epoch 42, batch 100, loss[loss=0.2388, ctc_loss=0.1638, cr_loss=0.375, over 11838.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1226, cr_loss=0.3414, over 1325673.42 frames. 
], batch size: 125, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:40:41,769 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.296e+02 1.378e+02 1.504e+02 1.895e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 12:41:15,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=745999.3333333334, ans=0.0 2024-09-25 12:41:22,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-09-25 12:41:51,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=746092.6666666666, ans=0.035 2024-09-25 12:41:59,755 INFO [train.py:1198] (0/4) Epoch 42, batch 150, loss[loss=0.1393, ctc_loss=0.08932, cr_loss=0.2498, over 16789.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1224, cr_loss=0.3405, over 1769831.44 frames. ], batch size: 37, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:42:03,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2024-09-25 12:42:06,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=746139.3333333334, ans=0.2 2024-09-25 12:42:12,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=746139.3333333334, ans=0.125 2024-09-25 12:42:42,237 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-09-25 12:42:59,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746279.3333333334, ans=0.0 2024-09-25 12:42:59,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=746279.3333333334, ans=0.0 2024-09-25 12:43:01,452 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=12.0 2024-09-25 12:43:16,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=746326.0, ans=0.1 2024-09-25 12:43:19,882 INFO [train.py:1198] (0/4) Epoch 42, batch 200, loss[loss=0.1776, ctc_loss=0.1122, cr_loss=0.3273, over 17030.00 frames. ], tot_loss[loss=0.1919, ctc_loss=0.1234, cr_loss=0.3425, over 2120699.19 frames. 
], batch size: 44, lr: 2.84e-03, grad_scale: 16.0 2024-09-25 12:43:20,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=746372.6666666666, ans=0.0 2024-09-25 12:43:26,430 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.308e+02 1.400e+02 1.491e+02 1.853e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-25 12:43:50,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=746419.3333333334, ans=0.05 2024-09-25 12:43:55,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=746466.0, ans=0.1 2024-09-25 12:44:06,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=746466.0, ans=0.125 2024-09-25 12:44:19,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=746512.6666666666, ans=0.0 2024-09-25 12:44:25,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=746512.6666666666, ans=0.05 2024-09-25 12:44:39,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=746559.3333333334, ans=0.125 2024-09-25 12:44:43,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=746559.3333333334, ans=0.0 2024-09-25 12:44:45,844 INFO [train.py:1198] (0/4) Epoch 42, batch 250, loss[loss=0.2168, ctc_loss=0.1397, cr_loss=0.3858, over 16973.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.123, cr_loss=0.3414, over 2382622.33 frames. ], batch size: 53, lr: 2.84e-03, grad_scale: 16.0 2024-09-25 12:45:05,493 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-160000.pt 2024-09-25 12:45:24,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.62 vs. limit=6.0 2024-09-25 12:45:30,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-09-25 12:45:53,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=746792.6666666666, ans=0.0 2024-09-25 12:46:06,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=746792.6666666666, ans=0.5 2024-09-25 12:46:13,666 INFO [train.py:1198] (0/4) Epoch 42, batch 300, loss[loss=0.2347, ctc_loss=0.1551, cr_loss=0.3984, over 11841.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1224, cr_loss=0.3406, over 2598100.28 frames. 
], batch size: 123, lr: 2.84e-03, grad_scale: 16.0 2024-09-25 12:46:14,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=746839.3333333334, ans=0.05 2024-09-25 12:46:15,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=746839.3333333334, ans=0.125 2024-09-25 12:46:20,041 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.307e+02 1.390e+02 1.481e+02 2.059e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 12:47:21,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0 2024-09-25 12:47:30,209 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2024-09-25 12:47:33,914 INFO [train.py:1198] (0/4) Epoch 42, batch 350, loss[loss=0.2155, ctc_loss=0.1393, cr_loss=0.3811, over 14998.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1226, cr_loss=0.3406, over 2761342.03 frames. ], batch size: 89, lr: 2.84e-03, grad_scale: 16.0 2024-09-25 12:47:55,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=747119.3333333334, ans=0.035 2024-09-25 12:48:02,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=747119.3333333334, ans=0.025 2024-09-25 12:48:17,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747166.0, ans=0.1 2024-09-25 12:48:18,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747166.0, ans=0.125 2024-09-25 12:48:20,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=747212.6666666666, ans=0.1 2024-09-25 12:48:28,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=747212.6666666666, ans=0.125 2024-09-25 12:48:47,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747259.3333333334, ans=0.1 2024-09-25 12:48:53,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747259.3333333334, ans=0.1 2024-09-25 12:48:56,552 INFO [train.py:1198] (0/4) Epoch 42, batch 400, loss[loss=0.208, ctc_loss=0.1362, cr_loss=0.3591, over 17290.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1223, cr_loss=0.3403, over 2895280.32 frames. 
], batch size: 46, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:49:02,847 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.303e+02 1.377e+02 1.468e+02 2.064e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 12:49:11,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=747352.6666666666, ans=0.025 2024-09-25 12:49:58,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=747446.0, ans=0.0 2024-09-25 12:50:04,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747492.6666666666, ans=0.125 2024-09-25 12:50:21,630 INFO [train.py:1198] (0/4) Epoch 42, batch 450, loss[loss=0.188, ctc_loss=0.1211, cr_loss=0.3345, over 17299.00 frames. ], tot_loss[loss=0.1916, ctc_loss=0.1232, cr_loss=0.3421, over 2999173.91 frames. ], batch size: 49, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:50:23,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=747539.3333333334, ans=15.0 2024-09-25 12:50:28,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=747539.3333333334, ans=0.1 2024-09-25 12:51:14,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=747679.3333333334, ans=0.125 2024-09-25 12:51:17,758 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=747679.3333333334, ans=0.125 2024-09-25 12:51:20,864 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 12:51:44,413 INFO [train.py:1198] (0/4) Epoch 42, batch 500, loss[loss=0.1537, ctc_loss=0.09491, cr_loss=0.2939, over 17028.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1221, cr_loss=0.3406, over 3086572.12 frames. ], batch size: 39, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:51:50,867 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.301e+02 1.386e+02 1.488e+02 2.359e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-25 12:52:15,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=747866.0, ans=0.125 2024-09-25 12:52:18,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=747866.0, ans=0.125 2024-09-25 12:52:35,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=747912.6666666666, ans=0.125 2024-09-25 12:52:49,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.19 vs. limit=15.0 2024-09-25 12:53:04,114 INFO [train.py:1198] (0/4) Epoch 42, batch 550, loss[loss=0.1763, ctc_loss=0.1109, cr_loss=0.3267, over 16996.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1221, cr_loss=0.3399, over 3150214.34 frames. 
], batch size: 44, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:53:13,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=748006.0, ans=0.125 2024-09-25 12:53:45,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=748099.3333333334, ans=0.025 2024-09-25 12:53:56,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=748146.0, ans=0.125 2024-09-25 12:54:06,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748146.0, ans=0.125 2024-09-25 12:54:29,012 INFO [train.py:1198] (0/4) Epoch 42, batch 600, loss[loss=0.1699, ctc_loss=0.1062, cr_loss=0.3182, over 17269.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1229, cr_loss=0.341, over 3197278.78 frames. ], batch size: 42, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:54:35,341 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.284e+02 1.358e+02 1.453e+02 2.155e+02, threshold=2.716e+02, percent-clipped=0.0 2024-09-25 12:54:56,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748286.0, ans=0.1 2024-09-25 12:54:56,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=748286.0, ans=0.125 2024-09-25 12:55:04,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=748332.6666666666, ans=0.2 2024-09-25 12:55:09,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=748332.6666666666, ans=0.0 2024-09-25 12:55:39,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748426.0, ans=0.1 2024-09-25 12:55:51,815 INFO [train.py:1198] (0/4) Epoch 42, batch 650, loss[loss=0.2433, ctc_loss=0.1628, cr_loss=0.4025, over 11607.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1232, cr_loss=0.3414, over 3226092.06 frames. ], batch size: 123, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:55:56,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=748472.6666666666, ans=0.0 2024-09-25 12:56:03,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2024-09-25 12:56:15,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=748519.3333333334, ans=0.125 2024-09-25 12:56:25,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=748566.0, ans=0.125 2024-09-25 12:56:28,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748566.0, ans=0.125 2024-09-25 12:56:42,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=748612.6666666666, ans=0.025 2024-09-25 12:56:57,884 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.86 vs. 
limit=12.0 2024-09-25 12:57:05,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=748659.3333333334, ans=0.125 2024-09-25 12:57:07,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=748659.3333333334, ans=0.125 2024-09-25 12:57:14,805 INFO [train.py:1198] (0/4) Epoch 42, batch 700, loss[loss=0.1834, ctc_loss=0.1186, cr_loss=0.3239, over 17087.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.123, cr_loss=0.3406, over 3255546.59 frames. ], batch size: 49, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:57:21,246 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.310e+02 1.390e+02 1.481e+02 1.937e+02, threshold=2.780e+02, percent-clipped=0.0 2024-09-25 12:57:36,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=748752.6666666666, ans=0.0 2024-09-25 12:57:54,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-25 12:58:06,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=748846.0, ans=0.125 2024-09-25 12:58:07,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=748846.0, ans=0.0 2024-09-25 12:58:34,915 INFO [train.py:1198] (0/4) Epoch 42, batch 750, loss[loss=0.178, ctc_loss=0.1153, cr_loss=0.3137, over 16253.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1227, cr_loss=0.3399, over 3281155.43 frames. ], batch size: 36, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 12:58:54,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0 2024-09-25 12:59:02,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2024-09-25 12:59:03,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=22.5 2024-09-25 12:59:06,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748986.0, ans=0.1 2024-09-25 12:59:35,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=749079.3333333334, ans=0.125 2024-09-25 12:59:49,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=749126.0, ans=0.125 2024-09-25 12:59:52,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=749126.0, ans=0.125 2024-09-25 13:00:00,474 INFO [train.py:1198] (0/4) Epoch 42, batch 800, loss[loss=0.1559, ctc_loss=0.09599, cr_loss=0.2997, over 17269.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1224, cr_loss=0.3392, over 3279375.62 frames. 
], batch size: 42, lr: 2.84e-03, grad_scale: 32.0 2024-09-25 13:00:06,822 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.092e+02 1.303e+02 1.405e+02 1.532e+02 1.999e+02, threshold=2.810e+02, percent-clipped=0.0 2024-09-25 13:00:20,220 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-09-25 13:00:41,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=749266.0, ans=0.025 2024-09-25 13:01:15,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749359.3333333334, ans=0.1 2024-09-25 13:01:22,381 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0 2024-09-25 13:01:26,599 INFO [train.py:1198] (0/4) Epoch 42, batch 850, loss[loss=0.1846, ctc_loss=0.1155, cr_loss=0.3458, over 17151.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1229, cr_loss=0.3407, over 3303724.15 frames. ], batch size: 45, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:01:38,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=749406.0, ans=0.125 2024-09-25 13:01:51,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=749452.6666666666, ans=0.0 2024-09-25 13:01:53,599 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=15.0 2024-09-25 13:02:00,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=749499.3333333334, ans=0.125 2024-09-25 13:02:04,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2024-09-25 13:02:12,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=749499.3333333334, ans=0.025 2024-09-25 13:02:30,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=749592.6666666666, ans=0.125 2024-09-25 13:02:47,618 INFO [train.py:1198] (0/4) Epoch 42, batch 900, loss[loss=0.1825, ctc_loss=0.1158, cr_loss=0.3337, over 17262.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1229, cr_loss=0.3409, over 3311460.36 frames. 
], batch size: 44, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:02:53,984 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.278e+02 1.357e+02 1.447e+02 3.889e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-25 13:03:09,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=749686.0, ans=0.0 2024-09-25 13:03:15,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=749686.0, ans=0.125 2024-09-25 13:03:20,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=749732.6666666666, ans=0.2 2024-09-25 13:03:27,366 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=22.5 2024-09-25 13:03:49,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=749779.3333333334, ans=0.125 2024-09-25 13:03:49,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=749779.3333333334, ans=0.125 2024-09-25 13:03:51,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2024-09-25 13:03:52,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=749779.3333333334, ans=0.1 2024-09-25 13:04:07,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=15.0 2024-09-25 13:04:11,574 INFO [train.py:1198] (0/4) Epoch 42, batch 950, loss[loss=0.1855, ctc_loss=0.1204, cr_loss=0.3252, over 17355.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1229, cr_loss=0.3413, over 3325671.94 frames. ], batch size: 48, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:04:35,840 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-09-25 13:04:58,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=749966.0, ans=0.025 2024-09-25 13:05:04,657 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=750012.6666666666, ans=0.0 2024-09-25 13:05:28,711 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2024-09-25 13:05:37,407 INFO [train.py:1198] (0/4) Epoch 42, batch 1000, loss[loss=0.2058, ctc_loss=0.1322, cr_loss=0.3681, over 16926.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1219, cr_loss=0.3387, over 3336643.40 frames. ], batch size: 58, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:05:41,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.46 vs. 
limit=12.0 2024-09-25 13:05:43,608 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.041e+02 1.283e+02 1.358e+02 1.462e+02 1.840e+02, threshold=2.715e+02, percent-clipped=0.0 2024-09-25 13:05:59,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=750152.6666666666, ans=0.95 2024-09-25 13:06:31,915 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=12.0 2024-09-25 13:06:43,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=750292.6666666666, ans=0.125 2024-09-25 13:06:59,678 INFO [train.py:1198] (0/4) Epoch 42, batch 1050, loss[loss=0.1907, ctc_loss=0.1218, cr_loss=0.3445, over 17021.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1229, cr_loss=0.3399, over 3327193.39 frames. ], batch size: 51, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:07:29,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=750386.0, ans=0.0 2024-09-25 13:07:30,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=750432.6666666666, ans=0.0 2024-09-25 13:07:35,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=750432.6666666666, ans=0.125 2024-09-25 13:07:37,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=750432.6666666666, ans=0.0 2024-09-25 13:07:54,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750479.3333333334, ans=0.1 2024-09-25 13:07:55,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.02 vs. limit=12.0 2024-09-25 13:08:03,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=750526.0, ans=0.0 2024-09-25 13:08:04,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=750526.0, ans=0.0 2024-09-25 13:08:17,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=750526.0, ans=0.1 2024-09-25 13:08:17,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=750526.0, ans=0.125 2024-09-25 13:08:20,230 INFO [train.py:1198] (0/4) Epoch 42, batch 1100, loss[loss=0.1862, ctc_loss=0.1198, cr_loss=0.332, over 16975.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1236, cr_loss=0.3411, over 3320069.61 frames. 
], batch size: 56, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:08:25,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=750572.6666666666, ans=0.0 2024-09-25 13:08:28,292 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.338e+02 1.422e+02 1.527e+02 1.799e+02, threshold=2.844e+02, percent-clipped=0.0 2024-09-25 13:08:58,063 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750666.0, ans=0.1 2024-09-25 13:09:18,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=750712.6666666666, ans=0.125 2024-09-25 13:09:19,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=750712.6666666666, ans=0.125 2024-09-25 13:09:28,083 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-25 13:09:28,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750759.3333333334, ans=0.1 2024-09-25 13:09:37,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=22.5 2024-09-25 13:09:44,705 INFO [train.py:1198] (0/4) Epoch 42, batch 1150, loss[loss=0.1962, ctc_loss=0.127, cr_loss=0.3462, over 17141.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1233, cr_loss=0.3412, over 3338390.33 frames. ], batch size: 48, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:09:54,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=750806.0, ans=0.125 2024-09-25 13:09:56,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=750806.0, ans=0.125 2024-09-25 13:09:56,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=750806.0, ans=0.1 2024-09-25 13:10:09,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=750852.6666666666, ans=0.125 2024-09-25 13:10:15,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=750852.6666666666, ans=0.2 2024-09-25 13:10:21,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=750899.3333333334, ans=0.125 2024-09-25 13:10:22,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=750899.3333333334, ans=0.0 2024-09-25 13:10:37,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.10 vs. 
limit=15.0 2024-09-25 13:10:54,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=750992.6666666666, ans=0.0 2024-09-25 13:10:58,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=750992.6666666666, ans=0.125 2024-09-25 13:11:09,662 INFO [train.py:1198] (0/4) Epoch 42, batch 1200, loss[loss=0.1438, ctc_loss=0.09207, cr_loss=0.2588, over 17277.00 frames. ], tot_loss[loss=0.1915, ctc_loss=0.1233, cr_loss=0.3411, over 3338409.96 frames. ], batch size: 42, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:11:09,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=751039.3333333334, ans=0.0 2024-09-25 13:11:17,515 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.296e+02 1.388e+02 1.485e+02 1.813e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 13:12:21,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=751226.0, ans=0.2 2024-09-25 13:12:29,347 INFO [train.py:1198] (0/4) Epoch 42, batch 1250, loss[loss=0.1956, ctc_loss=0.1233, cr_loss=0.3614, over 17019.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1226, cr_loss=0.3405, over 3345323.90 frames. ], batch size: 44, lr: 2.83e-03, grad_scale: 32.0 2024-09-25 13:12:33,185 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.10 vs. limit=10.0 2024-09-25 13:12:33,503 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0 2024-09-25 13:12:38,077 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5 2024-09-25 13:12:49,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=751319.3333333334, ans=0.0 2024-09-25 13:12:51,181 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2024-09-25 13:12:57,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=15.0 2024-09-25 13:13:09,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=751366.0, ans=0.0 2024-09-25 13:13:13,252 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2024-09-25 13:13:43,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=751459.3333333334, ans=0.125 2024-09-25 13:13:51,191 INFO [train.py:1198] (0/4) Epoch 42, batch 1300, loss[loss=0.1417, ctc_loss=0.09012, cr_loss=0.2578, over 16709.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1221, cr_loss=0.3398, over 3351497.63 frames. ], batch size: 37, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:13:55,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.57 vs. 
limit=22.5 2024-09-25 13:13:58,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=751506.0, ans=0.125 2024-09-25 13:14:00,855 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.307e+02 1.370e+02 1.468e+02 2.127e+02, threshold=2.740e+02, percent-clipped=0.0 2024-09-25 13:14:10,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=751552.6666666666, ans=0.025 2024-09-25 13:14:19,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=751552.6666666666, ans=0.07 2024-09-25 13:14:29,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=751599.3333333334, ans=0.0 2024-09-25 13:14:36,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=751599.3333333334, ans=0.0 2024-09-25 13:14:52,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=15.0 2024-09-25 13:14:58,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751692.6666666666, ans=0.1 2024-09-25 13:15:06,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=751692.6666666666, ans=0.0 2024-09-25 13:15:16,745 INFO [train.py:1198] (0/4) Epoch 42, batch 1350, loss[loss=0.1773, ctc_loss=0.1148, cr_loss=0.3124, over 17186.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1211, cr_loss=0.3379, over 3362681.85 frames. ], batch size: 41, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:15:26,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=751739.3333333334, ans=0.125 2024-09-25 13:16:16,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=751879.3333333334, ans=0.125 2024-09-25 13:16:23,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=751926.0, ans=0.125 2024-09-25 13:16:30,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=751926.0, ans=0.0 2024-09-25 13:16:38,720 INFO [train.py:1198] (0/4) Epoch 42, batch 1400, loss[loss=0.1719, ctc_loss=0.1083, cr_loss=0.3179, over 17290.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1215, cr_loss=0.338, over 3354983.80 frames. 
], batch size: 46, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:16:48,177 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.272e+02 1.355e+02 1.432e+02 2.085e+02, threshold=2.710e+02, percent-clipped=0.0 2024-09-25 13:17:13,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=752066.0, ans=0.035 2024-09-25 13:17:33,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=752112.6666666666, ans=0.2 2024-09-25 13:17:38,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=752112.6666666666, ans=0.125 2024-09-25 13:17:58,207 INFO [train.py:1198] (0/4) Epoch 42, batch 1450, loss[loss=0.205, ctc_loss=0.1323, cr_loss=0.3636, over 17031.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1202, cr_loss=0.3358, over 3364710.69 frames. ], batch size: 53, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:18:09,719 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:18:14,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=752252.6666666666, ans=0.025 2024-09-25 13:18:29,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=752299.3333333334, ans=0.025 2024-09-25 13:18:41,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=752299.3333333334, ans=0.0 2024-09-25 13:18:52,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=752346.0, ans=0.125 2024-09-25 13:18:54,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=752346.0, ans=0.025 2024-09-25 13:18:59,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=752346.0, ans=0.025 2024-09-25 13:19:12,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=752392.6666666666, ans=0.125 2024-09-25 13:19:23,691 INFO [train.py:1198] (0/4) Epoch 42, batch 1500, loss[loss=0.1783, ctc_loss=0.1145, cr_loss=0.319, over 17144.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1202, cr_loss=0.3353, over 3347747.23 frames. ], batch size: 48, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:19:28,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=752439.3333333334, ans=0.0 2024-09-25 13:19:33,186 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.208e+02 1.278e+02 1.340e+02 1.432e+02 2.576e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-25 13:19:38,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=752486.0, ans=0.125 2024-09-25 13:19:43,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. 
limit=15.0 2024-09-25 13:19:44,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=752486.0, ans=0.015 2024-09-25 13:19:51,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=752486.0, ans=0.0 2024-09-25 13:19:52,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=752486.0, ans=0.5 2024-09-25 13:20:00,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=752532.6666666666, ans=0.0 2024-09-25 13:20:05,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=752532.6666666666, ans=0.125 2024-09-25 13:20:19,506 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:20:24,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=752579.3333333334, ans=10.0 2024-09-25 13:20:25,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=752579.3333333334, ans=0.0 2024-09-25 13:20:48,641 INFO [train.py:1198] (0/4) Epoch 42, batch 1550, loss[loss=0.172, ctc_loss=0.1106, cr_loss=0.3071, over 17200.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1208, cr_loss=0.3368, over 3339147.90 frames. ], batch size: 41, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:20:49,054 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:21:38,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=752812.6666666666, ans=0.125 2024-09-25 13:21:54,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=752859.3333333334, ans=0.09899494936611666 2024-09-25 13:22:09,036 INFO [train.py:1198] (0/4) Epoch 42, batch 1600, loss[loss=0.1609, ctc_loss=0.1014, cr_loss=0.2973, over 16695.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1202, cr_loss=0.336, over 3348738.98 frames. ], batch size: 37, lr: 2.83e-03, grad_scale: 16.0 2024-09-25 13:22:11,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=752906.0, ans=0.125 2024-09-25 13:22:17,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=752906.0, ans=0.125 2024-09-25 13:22:20,349 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.308e+02 1.394e+02 1.520e+02 2.214e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-25 13:22:23,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=752952.6666666666, ans=0.0 2024-09-25 13:22:48,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=752999.3333333334, ans=0.0 2024-09-25 13:22:50,076 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. 
limit=15.0 2024-09-25 13:22:55,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=753046.0, ans=0.125 2024-09-25 13:22:56,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=753046.0, ans=0.2 2024-09-25 13:23:31,969 INFO [train.py:1198] (0/4) Epoch 42, batch 1650, loss[loss=0.1858, ctc_loss=0.1197, cr_loss=0.3302, over 17278.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1202, cr_loss=0.3356, over 3342305.74 frames. ], batch size: 51, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:23:59,204 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-25 13:24:23,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753279.3333333334, ans=0.125 2024-09-25 13:24:35,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2024-09-25 13:24:47,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=753326.0, ans=0.125 2024-09-25 13:24:48,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2024-09-25 13:24:55,037 INFO [train.py:1198] (0/4) Epoch 42, batch 1700, loss[loss=0.154, ctc_loss=0.09629, cr_loss=0.2883, over 16351.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1213, cr_loss=0.3373, over 3338050.39 frames. ], batch size: 36, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:25:10,263 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.072e+02 1.274e+02 1.365e+02 1.477e+02 2.615e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-25 13:25:24,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=753419.3333333334, ans=0.1 2024-09-25 13:25:28,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753466.0, ans=0.125 2024-09-25 13:25:53,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=753512.6666666666, ans=0.125 2024-09-25 13:26:12,726 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-25 13:26:19,505 INFO [train.py:1198] (0/4) Epoch 42, batch 1750, loss[loss=0.2041, ctc_loss=0.1302, cr_loss=0.3693, over 16692.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1217, cr_loss=0.338, over 3343127.20 frames. 
], batch size: 61, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:26:37,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=753652.6666666666, ans=0.04949747468305833 2024-09-25 13:27:11,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=753746.0, ans=0.125 2024-09-25 13:27:17,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=753746.0, ans=0.0 2024-09-25 13:27:21,651 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=22.5 2024-09-25 13:27:22,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=22.5 2024-09-25 13:27:33,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-25 13:27:40,019 INFO [train.py:1198] (0/4) Epoch 42, batch 1800, loss[loss=0.1946, ctc_loss=0.1254, cr_loss=0.3457, over 17019.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1212, cr_loss=0.3373, over 3354706.72 frames. ], batch size: 51, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:27:40,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=753839.3333333334, ans=12.0 2024-09-25 13:27:51,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753839.3333333334, ans=0.125 2024-09-25 13:27:52,875 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.293e+02 1.332e+02 1.454e+02 1.867e+02, threshold=2.665e+02, percent-clipped=0.0 2024-09-25 13:27:53,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=753839.3333333334, ans=0.125 2024-09-25 13:28:01,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753886.0, ans=0.125 2024-09-25 13:28:06,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=753886.0, ans=0.125 2024-09-25 13:28:28,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2024-09-25 13:28:29,029 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.85 vs. limit=22.5 2024-09-25 13:28:33,727 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.74 vs. limit=6.0 2024-09-25 13:28:52,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=754026.0, ans=0.0 2024-09-25 13:29:04,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=754072.6666666666, ans=0.125 2024-09-25 13:29:05,784 INFO [train.py:1198] (0/4) Epoch 42, batch 1850, loss[loss=0.1934, ctc_loss=0.123, cr_loss=0.3517, over 17069.00 frames. 
], tot_loss[loss=0.1889, ctc_loss=0.1214, cr_loss=0.3377, over 3364999.44 frames. ], batch size: 46, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:29:12,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=754072.6666666666, ans=0.125 2024-09-25 13:29:39,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=22.5 2024-09-25 13:29:44,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=754166.0, ans=0.025 2024-09-25 13:29:59,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=754212.6666666666, ans=0.125 2024-09-25 13:30:10,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.89 vs. limit=10.0 2024-09-25 13:30:13,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2024-09-25 13:30:31,314 INFO [train.py:1198] (0/4) Epoch 42, batch 1900, loss[loss=0.1729, ctc_loss=0.1082, cr_loss=0.3234, over 17174.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1218, cr_loss=0.3386, over 3363096.30 frames. ], batch size: 41, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:30:38,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=754306.0, ans=0.1 2024-09-25 13:30:44,084 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.292e+02 1.376e+02 1.481e+02 2.312e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 13:30:44,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=754306.0, ans=0.125 2024-09-25 13:31:37,140 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2024-09-25 13:31:44,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=754492.6666666666, ans=0.2 2024-09-25 13:31:50,796 INFO [train.py:1198] (0/4) Epoch 42, batch 1950, loss[loss=0.1986, ctc_loss=0.1281, cr_loss=0.3525, over 17180.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1228, cr_loss=0.3404, over 3359279.85 frames. 
], batch size: 45, lr: 2.83e-03, grad_scale: 8.0 2024-09-25 13:31:52,869 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:32:08,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=754586.0, ans=0.125 2024-09-25 13:32:10,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754586.0, ans=0.125 2024-09-25 13:32:12,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=754586.0, ans=0.0 2024-09-25 13:32:17,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=754586.0, ans=0.125 2024-09-25 13:32:18,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754586.0, ans=0.1 2024-09-25 13:32:36,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=754632.6666666666, ans=0.2 2024-09-25 13:32:39,441 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:32:47,263 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=754679.3333333334, ans=0.2 2024-09-25 13:33:08,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=754726.0, ans=0.0 2024-09-25 13:33:13,475 INFO [train.py:1198] (0/4) Epoch 42, batch 2000, loss[loss=0.1964, ctc_loss=0.1259, cr_loss=0.3523, over 17229.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1224, cr_loss=0.3399, over 3367234.16 frames. ], batch size: 50, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:33:25,999 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.295e+02 1.343e+02 1.422e+02 2.059e+02, threshold=2.687e+02, percent-clipped=0.0 2024-09-25 13:34:10,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=754912.6666666666, ans=0.0 2024-09-25 13:34:11,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2024-09-25 13:34:36,195 INFO [train.py:1198] (0/4) Epoch 42, batch 2050, loss[loss=0.1919, ctc_loss=0.1213, cr_loss=0.3529, over 16983.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1224, cr_loss=0.3399, over 3364262.49 frames. ], batch size: 56, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:34:43,941 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. 
limit=6.0 2024-09-25 13:34:47,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=755006.0, ans=0.125 2024-09-25 13:35:08,158 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=755052.6666666666, ans=0.2 2024-09-25 13:35:26,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755099.3333333334, ans=0.1 2024-09-25 13:36:01,520 INFO [train.py:1198] (0/4) Epoch 42, batch 2100, loss[loss=0.1986, ctc_loss=0.1281, cr_loss=0.3524, over 17015.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1224, cr_loss=0.3399, over 3365939.32 frames. ], batch size: 52, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:36:05,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=755239.3333333334, ans=0.125 2024-09-25 13:36:13,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=755239.3333333334, ans=0.125 2024-09-25 13:36:14,536 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.293e+02 1.359e+02 1.448e+02 2.316e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-25 13:36:19,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=755286.0, ans=0.0 2024-09-25 13:36:20,704 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-25 13:36:24,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=755286.0, ans=0.0 2024-09-25 13:36:39,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=755332.6666666666, ans=0.025 2024-09-25 13:36:51,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755379.3333333334, ans=0.1 2024-09-25 13:37:07,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=755426.0, ans=0.0 2024-09-25 13:37:21,956 INFO [train.py:1198] (0/4) Epoch 42, batch 2150, loss[loss=0.2562, ctc_loss=0.1723, cr_loss=0.4194, over 11608.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1215, cr_loss=0.3382, over 3366809.63 frames. ], batch size: 123, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:37:32,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=755472.6666666666, ans=0.0 2024-09-25 13:37:33,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-09-25 13:38:44,293 INFO [train.py:1198] (0/4) Epoch 42, batch 2200, loss[loss=0.1834, ctc_loss=0.1187, cr_loss=0.3234, over 17204.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1215, cr_loss=0.3382, over 3370978.30 frames. 
], batch size: 41, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:38:56,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=755706.0, ans=0.2 2024-09-25 13:38:58,676 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=12.0 2024-09-25 13:38:59,637 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.309e+02 1.387e+02 1.496e+02 2.285e+02, threshold=2.773e+02, percent-clipped=0.0 2024-09-25 13:39:03,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=755752.6666666666, ans=0.09899494936611666 2024-09-25 13:39:36,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=755846.0, ans=0.125 2024-09-25 13:40:09,794 INFO [train.py:1198] (0/4) Epoch 42, batch 2250, loss[loss=0.1406, ctc_loss=0.08703, cr_loss=0.2676, over 16671.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1212, cr_loss=0.3373, over 3374072.93 frames. ], batch size: 37, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:40:38,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=755986.0, ans=0.0 2024-09-25 13:40:38,722 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=755986.0, ans=0.125 2024-09-25 13:40:45,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=756032.6666666666, ans=0.125 2024-09-25 13:40:48,965 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0 2024-09-25 13:41:07,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=756079.3333333334, ans=0.125 2024-09-25 13:41:30,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=756126.0, ans=0.09899494936611666 2024-09-25 13:41:33,095 INFO [train.py:1198] (0/4) Epoch 42, batch 2300, loss[loss=0.2125, ctc_loss=0.1372, cr_loss=0.3764, over 17030.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.121, cr_loss=0.3373, over 3375730.21 frames. ], batch size: 51, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:41:45,871 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.265e+02 1.352e+02 1.460e+02 1.967e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-25 13:41:49,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=756219.3333333334, ans=0.2 2024-09-25 13:42:00,877 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=756219.3333333334, ans=0.125 2024-09-25 13:42:24,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.23 vs. 
limit=15.0 2024-09-25 13:42:31,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=756312.6666666666, ans=0.035 2024-09-25 13:42:44,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=756359.3333333334, ans=0.125 2024-09-25 13:42:53,053 INFO [train.py:1198] (0/4) Epoch 42, batch 2350, loss[loss=0.1777, ctc_loss=0.1137, cr_loss=0.3198, over 17150.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.121, cr_loss=0.3371, over 3362701.49 frames. ], batch size: 48, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:43:04,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756406.0, ans=0.125 2024-09-25 13:43:11,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756452.6666666666, ans=0.125 2024-09-25 13:43:13,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=756452.6666666666, ans=0.125 2024-09-25 13:43:34,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756499.3333333334, ans=0.1 2024-09-25 13:44:07,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=756592.6666666666, ans=0.0 2024-09-25 13:44:09,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=756592.6666666666, ans=0.0 2024-09-25 13:44:15,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=756592.6666666666, ans=0.125 2024-09-25 13:44:18,218 INFO [train.py:1198] (0/4) Epoch 42, batch 2400, loss[loss=0.1891, ctc_loss=0.1191, cr_loss=0.3502, over 16206.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1209, cr_loss=0.3372, over 3365323.82 frames. ], batch size: 36, lr: 2.82e-03, grad_scale: 32.0 2024-09-25 13:44:18,963 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=12.0 2024-09-25 13:44:30,874 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.304e+02 1.412e+02 1.522e+02 4.391e+02, threshold=2.825e+02, percent-clipped=1.0 2024-09-25 13:44:31,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=756639.3333333334, ans=0.125 2024-09-25 13:44:33,362 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2024-09-25 13:45:02,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=756732.6666666666, ans=0.0 2024-09-25 13:45:23,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756779.3333333334, ans=0.1 2024-09-25 13:45:32,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756826.0, ans=0.125 2024-09-25 13:45:37,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.25 vs. 
limit=22.5 2024-09-25 13:45:39,634 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.35 vs. limit=10.0 2024-09-25 13:45:43,399 INFO [train.py:1198] (0/4) Epoch 42, batch 2450, loss[loss=0.1628, ctc_loss=0.104, cr_loss=0.294, over 17046.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1214, cr_loss=0.3381, over 3364201.62 frames. ], batch size: 39, lr: 2.82e-03, grad_scale: 32.0 2024-09-25 13:45:50,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=756872.6666666666, ans=0.0 2024-09-25 13:45:55,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756872.6666666666, ans=0.125 2024-09-25 13:46:03,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=756919.3333333334, ans=0.0 2024-09-25 13:46:36,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=757012.6666666666, ans=0.0 2024-09-25 13:46:37,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=757012.6666666666, ans=0.07 2024-09-25 13:46:51,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=757059.3333333334, ans=0.2 2024-09-25 13:47:03,855 INFO [train.py:1198] (0/4) Epoch 42, batch 2500, loss[loss=0.1857, ctc_loss=0.1194, cr_loss=0.3317, over 17318.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3391, over 3348776.76 frames. ], batch size: 51, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:47:18,224 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.315e+02 1.378e+02 1.471e+02 2.327e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 13:47:52,314 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:48:01,515 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 13:48:15,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=757292.6666666666, ans=0.0 2024-09-25 13:48:26,434 INFO [train.py:1198] (0/4) Epoch 42, batch 2550, loss[loss=0.2023, ctc_loss=0.1319, cr_loss=0.3519, over 17256.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1221, cr_loss=0.3386, over 3358749.74 frames. ], batch size: 44, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:49:12,071 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=757432.6666666666, ans=0.0 2024-09-25 13:49:28,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=757479.3333333334, ans=0.0 2024-09-25 13:49:29,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=757479.3333333334, ans=0.035 2024-09-25 13:49:43,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.60 vs. 
limit=15.0 2024-09-25 13:49:47,660 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=757572.6666666666, ans=0.125 2024-09-25 13:49:48,977 INFO [train.py:1198] (0/4) Epoch 42, batch 2600, loss[loss=0.2053, ctc_loss=0.1331, cr_loss=0.3612, over 17020.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1224, cr_loss=0.3392, over 3356072.64 frames. ], batch size: 51, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:50:05,737 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.272e+02 1.357e+02 1.427e+02 2.038e+02, threshold=2.713e+02, percent-clipped=0.0 2024-09-25 13:50:10,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=757619.3333333334, ans=0.0 2024-09-25 13:50:43,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=757712.6666666666, ans=0.125 2024-09-25 13:50:44,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2024-09-25 13:51:09,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=757759.3333333334, ans=0.125 2024-09-25 13:51:13,670 INFO [train.py:1198] (0/4) Epoch 42, batch 2650, loss[loss=0.1967, ctc_loss=0.1302, cr_loss=0.3327, over 17215.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1223, cr_loss=0.3396, over 3368076.54 frames. ], batch size: 50, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:52:00,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757946.0, ans=0.0 2024-09-25 13:52:16,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=757992.6666666666, ans=0.125 2024-09-25 13:52:31,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=757992.6666666666, ans=0.2 2024-09-25 13:52:34,235 INFO [train.py:1198] (0/4) Epoch 42, batch 2700, loss[loss=0.2029, ctc_loss=0.134, cr_loss=0.3446, over 17003.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1224, cr_loss=0.3405, over 3372031.29 frames. ], batch size: 51, lr: 2.82e-03, grad_scale: 8.0 2024-09-25 13:52:34,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758039.3333333334, ans=0.1 2024-09-25 13:52:48,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=758086.0, ans=0.015 2024-09-25 13:52:50,046 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.301e+02 1.411e+02 1.546e+02 2.916e+02, threshold=2.822e+02, percent-clipped=1.0 2024-09-25 13:53:03,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.81 vs. 
limit=15.0 2024-09-25 13:53:10,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=758132.6666666666, ans=0.1 2024-09-25 13:53:20,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=758132.6666666666, ans=15.0 2024-09-25 13:53:31,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=758179.3333333334, ans=0.125 2024-09-25 13:53:55,652 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2024-09-25 13:53:59,184 INFO [train.py:1198] (0/4) Epoch 42, batch 2750, loss[loss=0.1793, ctc_loss=0.1123, cr_loss=0.3352, over 17235.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.122, cr_loss=0.3396, over 3367129.09 frames. ], batch size: 50, lr: 2.82e-03, grad_scale: 8.0 2024-09-25 13:54:20,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=758319.3333333334, ans=0.0 2024-09-25 13:54:40,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=758366.0, ans=0.125 2024-09-25 13:54:45,557 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=758412.6666666666, ans=0.125 2024-09-25 13:55:01,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=758412.6666666666, ans=0.0 2024-09-25 13:55:11,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2024-09-25 13:55:23,621 INFO [train.py:1198] (0/4) Epoch 42, batch 2800, loss[loss=0.2255, ctc_loss=0.1461, cr_loss=0.3972, over 15150.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1218, cr_loss=0.3395, over 3369083.18 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:55:24,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=758506.0, ans=0.125 2024-09-25 13:55:38,834 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=12.0 2024-09-25 13:55:39,669 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.295e+02 1.378e+02 1.487e+02 2.331e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 13:55:51,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=758552.6666666666, ans=0.125 2024-09-25 13:56:09,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5 2024-09-25 13:56:17,732 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2024-09-25 13:56:43,683 INFO [train.py:1198] (0/4) Epoch 42, batch 2850, loss[loss=0.1702, ctc_loss=0.1064, cr_loss=0.319, over 17277.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1218, cr_loss=0.34, over 3376382.37 frames. 
], batch size: 42, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:56:47,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=758739.3333333334, ans=0.125 2024-09-25 13:57:00,952 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-25 13:57:08,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=758786.0, ans=0.04949747468305833 2024-09-25 13:57:48,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=758926.0, ans=0.025 2024-09-25 13:57:59,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=758926.0, ans=0.0 2024-09-25 13:58:06,779 INFO [train.py:1198] (0/4) Epoch 42, batch 2900, loss[loss=0.1523, ctc_loss=0.09603, cr_loss=0.2812, over 17248.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3383, over 3378240.02 frames. ], batch size: 44, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:58:15,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758972.6666666666, ans=0.1 2024-09-25 13:58:22,533 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.263e+02 1.349e+02 1.466e+02 2.283e+02, threshold=2.697e+02, percent-clipped=0.0 2024-09-25 13:58:26,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=759019.3333333334, ans=0.125 2024-09-25 13:58:41,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=759066.0, ans=0.025 2024-09-25 13:58:52,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=22.5 2024-09-25 13:59:29,385 INFO [train.py:1198] (0/4) Epoch 42, batch 2950, loss[loss=0.2109, ctc_loss=0.1372, cr_loss=0.3685, over 17298.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1218, cr_loss=0.3392, over 3368698.37 frames. ], batch size: 49, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 13:59:48,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=759252.6666666666, ans=0.0 2024-09-25 14:00:32,569 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0 2024-09-25 14:00:53,413 INFO [train.py:1198] (0/4) Epoch 42, batch 3000, loss[loss=0.21, ctc_loss=0.1325, cr_loss=0.3875, over 16744.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1214, cr_loss=0.338, over 3371035.76 frames. ], batch size: 61, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 14:00:53,414 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 14:01:08,228 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.1608, 4.5672, 3.7436, 4.2639, 4.2568, 3.7631, 3.9376, 3.6973], device='cuda:0') 2024-09-25 14:01:08,871 INFO [train.py:1230] (0/4) Epoch 42, validation: loss=0.03543, ctc_loss=0.03543, cr_loss=1.019e-14, over 944034.00 frames. 
2024-09-25 14:01:08,871 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 14:01:24,613 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.302e+02 1.378e+02 1.459e+02 2.338e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 14:02:26,582 INFO [train.py:1198] (0/4) Epoch 42, batch 3050, loss[loss=0.2427, ctc_loss=0.1616, cr_loss=0.4055, over 14998.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1207, cr_loss=0.3365, over 3365000.51 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 14:02:31,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=759672.6666666666, ans=0.05 2024-09-25 14:02:35,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.51 vs. limit=22.5 2024-09-25 14:02:44,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=759719.3333333334, ans=0.125 2024-09-25 14:03:00,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=759766.0, ans=0.2 2024-09-25 14:03:20,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=759812.6666666666, ans=0.125 2024-09-25 14:03:34,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=759859.3333333334, ans=0.125 2024-09-25 14:03:45,317 INFO [train.py:1198] (0/4) Epoch 42, batch 3100, loss[loss=0.2182, ctc_loss=0.143, cr_loss=0.3761, over 16892.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1211, cr_loss=0.3371, over 3362509.79 frames. ], batch size: 58, lr: 2.82e-03, grad_scale: 16.0 2024-09-25 14:03:45,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=759906.0, ans=0.0 2024-09-25 14:03:53,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=759906.0, ans=0.1 2024-09-25 14:03:57,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=759906.0, ans=0.2 2024-09-25 14:04:00,900 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.293e+02 1.378e+02 1.463e+02 2.447e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 14:04:20,105 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-25 14:04:24,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=759999.3333333334, ans=0.05 2024-09-25 14:04:57,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=760092.6666666666, ans=0.125 2024-09-25 14:05:03,356 INFO [train.py:1198] (0/4) Epoch 42, batch 3150, loss[loss=0.1732, ctc_loss=0.1082, cr_loss=0.3248, over 17269.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1209, cr_loss=0.3369, over 3362952.17 frames. 
], batch size: 44, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:05:08,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=760139.3333333334, ans=0.125 2024-09-25 14:05:08,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=22.5 2024-09-25 14:05:40,466 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=760232.6666666666, ans=0.2 2024-09-25 14:05:41,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2024-09-25 14:05:48,759 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2024-09-25 14:06:17,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=760326.0, ans=0.95 2024-09-25 14:06:23,520 INFO [train.py:1198] (0/4) Epoch 42, batch 3200, loss[loss=0.1741, ctc_loss=0.1104, cr_loss=0.3184, over 17171.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1205, cr_loss=0.3373, over 3371658.33 frames. ], batch size: 41, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:06:39,248 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.277e+02 1.364e+02 1.425e+02 3.566e+02, threshold=2.728e+02, percent-clipped=1.0 2024-09-25 14:06:50,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=760419.3333333334, ans=0.125 2024-09-25 14:07:01,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=760466.0, ans=0.025 2024-09-25 14:07:17,026 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=760512.6666666666, ans=0.125 2024-09-25 14:07:32,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=12.0 2024-09-25 14:07:43,980 INFO [train.py:1198] (0/4) Epoch 42, batch 3250, loss[loss=0.2331, ctc_loss=0.1568, cr_loss=0.3814, over 11683.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1211, cr_loss=0.3381, over 3376881.16 frames. ], batch size: 123, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:08:04,749 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=760652.6666666666, ans=0.0 2024-09-25 14:08:31,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=760746.0, ans=0.125 2024-09-25 14:09:02,229 INFO [train.py:1198] (0/4) Epoch 42, batch 3300, loss[loss=0.1945, ctc_loss=0.1306, cr_loss=0.3193, over 11775.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1203, cr_loss=0.3366, over 3366142.98 frames. 
], batch size: 123, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:09:08,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=760839.3333333334, ans=0.0 2024-09-25 14:09:10,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=760839.3333333334, ans=0.0 2024-09-25 14:09:18,663 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2024-09-25 14:09:19,406 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.292e+02 1.397e+02 1.503e+02 2.274e+02, threshold=2.794e+02, percent-clipped=0.0 2024-09-25 14:09:32,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=760932.6666666666, ans=0.025 2024-09-25 14:09:55,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=760979.3333333334, ans=0.125 2024-09-25 14:10:01,882 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:10:17,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=761026.0, ans=0.05 2024-09-25 14:10:17,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=761026.0, ans=0.0 2024-09-25 14:10:22,244 INFO [train.py:1198] (0/4) Epoch 42, batch 3350, loss[loss=0.1789, ctc_loss=0.1126, cr_loss=0.3319, over 17252.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3381, over 3361101.12 frames. ], batch size: 42, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:10:29,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=761072.6666666666, ans=0.2 2024-09-25 14:10:51,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=22.5 2024-09-25 14:11:41,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=761306.0, ans=0.125 2024-09-25 14:11:42,862 INFO [train.py:1198] (0/4) Epoch 42, batch 3400, loss[loss=0.2063, ctc_loss=0.1327, cr_loss=0.3681, over 17142.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1215, cr_loss=0.3388, over 3360382.87 frames. 
], batch size: 48, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:12:00,049 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.312e+02 1.394e+02 1.492e+02 2.078e+02, threshold=2.788e+02, percent-clipped=0.0 2024-09-25 14:12:03,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=761352.6666666666, ans=0.0 2024-09-25 14:12:19,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=761399.3333333334, ans=0.0 2024-09-25 14:12:19,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761399.3333333334, ans=0.1 2024-09-25 14:12:27,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=761399.3333333334, ans=0.125 2024-09-25 14:12:36,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761446.0, ans=0.1 2024-09-25 14:12:44,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=761492.6666666666, ans=0.125 2024-09-25 14:13:01,389 INFO [train.py:1198] (0/4) Epoch 42, batch 3450, loss[loss=0.2087, ctc_loss=0.1364, cr_loss=0.3619, over 17017.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1218, cr_loss=0.3394, over 3359323.49 frames. ], batch size: 51, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:13:24,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.17 vs. limit=15.0 2024-09-25 14:13:31,435 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:13:55,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=761679.3333333334, ans=10.0 2024-09-25 14:13:55,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=761679.3333333334, ans=0.0 2024-09-25 14:14:01,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=761679.3333333334, ans=0.125 2024-09-25 14:14:11,468 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2024-09-25 14:14:20,000 INFO [train.py:1198] (0/4) Epoch 42, batch 3500, loss[loss=0.177, ctc_loss=0.1123, cr_loss=0.3234, over 16932.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1222, cr_loss=0.3402, over 3359321.36 frames. ], batch size: 42, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:14:24,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.66 vs. 
limit=15.0 2024-09-25 14:14:27,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=761772.6666666666, ans=0.125 2024-09-25 14:14:37,160 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.281e+02 1.384e+02 1.487e+02 1.767e+02, threshold=2.768e+02, percent-clipped=0.0 2024-09-25 14:14:48,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=761819.3333333334, ans=0.125 2024-09-25 14:14:54,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=761866.0, ans=0.0 2024-09-25 14:14:58,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=761866.0, ans=0.125 2024-09-25 14:15:18,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2024-09-25 14:15:40,266 INFO [train.py:1198] (0/4) Epoch 42, batch 3550, loss[loss=0.2203, ctc_loss=0.1445, cr_loss=0.3789, over 16606.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.123, cr_loss=0.3412, over 3352626.36 frames. ], batch size: 66, lr: 2.81e-03, grad_scale: 16.0 2024-09-25 14:15:51,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=762006.0, ans=0.125 2024-09-25 14:15:56,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=762052.6666666666, ans=0.2 2024-09-25 14:16:22,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=762099.3333333334, ans=0.0 2024-09-25 14:16:27,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=762146.0, ans=0.0 2024-09-25 14:16:36,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=762146.0, ans=0.07 2024-09-25 14:16:49,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=762192.6666666666, ans=0.125 2024-09-25 14:16:58,305 INFO [train.py:1198] (0/4) Epoch 42, batch 3600, loss[loss=0.1407, ctc_loss=0.08646, cr_loss=0.2712, over 16688.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1221, cr_loss=0.3392, over 3358786.03 frames. 
], batch size: 37, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:17:09,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=762239.3333333334, ans=0.125 2024-09-25 14:17:09,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=762239.3333333334, ans=22.5 2024-09-25 14:17:12,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=762286.0, ans=0.2 2024-09-25 14:17:15,186 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.282e+02 1.377e+02 1.493e+02 1.761e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 14:17:51,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=762379.3333333334, ans=0.125 2024-09-25 14:18:01,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-09-25 14:18:16,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=762472.6666666666, ans=0.025 2024-09-25 14:18:17,994 INFO [train.py:1198] (0/4) Epoch 42, batch 3650, loss[loss=0.2004, ctc_loss=0.1288, cr_loss=0.3578, over 17298.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1224, cr_loss=0.34, over 3361658.25 frames. ], batch size: 51, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:18:19,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=762472.6666666666, ans=0.2 2024-09-25 14:18:24,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=762472.6666666666, ans=0.1 2024-09-25 14:18:27,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=762472.6666666666, ans=0.125 2024-09-25 14:19:03,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=762566.0, ans=0.125 2024-09-25 14:19:30,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0 2024-09-25 14:19:38,301 INFO [train.py:1198] (0/4) Epoch 42, batch 3700, loss[loss=0.2073, ctc_loss=0.13, cr_loss=0.3865, over 17147.00 frames. ], tot_loss[loss=0.1906, ctc_loss=0.1225, cr_loss=0.3406, over 3360392.26 frames. 
], batch size: 48, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:19:41,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=762706.0, ans=0.125 2024-09-25 14:19:47,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=762706.0, ans=0.0 2024-09-25 14:19:55,539 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 1.273e+02 1.387e+02 1.507e+02 1.911e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-25 14:20:07,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=762752.6666666666, ans=0.1 2024-09-25 14:20:09,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=762799.3333333334, ans=0.125 2024-09-25 14:20:34,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=762846.0, ans=0.2 2024-09-25 14:20:53,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=762892.6666666666, ans=0.125 2024-09-25 14:20:57,615 INFO [train.py:1198] (0/4) Epoch 42, batch 3750, loss[loss=0.2361, ctc_loss=0.1635, cr_loss=0.3629, over 12165.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1228, cr_loss=0.3406, over 3344529.86 frames. ], batch size: 123, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:21:02,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=762939.3333333334, ans=0.5 2024-09-25 14:21:09,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=22.5 2024-09-25 14:21:14,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762986.0, ans=0.1 2024-09-25 14:21:30,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=763032.6666666666, ans=0.125 2024-09-25 14:21:41,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=763032.6666666666, ans=0.125 2024-09-25 14:22:03,555 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=763126.0, ans=0.0 2024-09-25 14:22:09,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=763126.0, ans=0.0 2024-09-25 14:22:15,974 INFO [train.py:1198] (0/4) Epoch 42, batch 3800, loss[loss=0.1671, ctc_loss=0.1093, cr_loss=0.2889, over 17252.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1226, cr_loss=0.3394, over 3322679.63 frames. 
], batch size: 44, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:22:22,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=763172.6666666666, ans=0.2 2024-09-25 14:22:24,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=763172.6666666666, ans=0.125 2024-09-25 14:22:33,279 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.315e+02 1.408e+02 1.487e+02 2.777e+02, threshold=2.816e+02, percent-clipped=1.0 2024-09-25 14:22:43,403 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=22.5 2024-09-25 14:22:53,927 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=763266.0, ans=0.0 2024-09-25 14:23:00,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=763266.0, ans=0.07 2024-09-25 14:23:30,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=763359.3333333334, ans=0.1 2024-09-25 14:23:31,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763359.3333333334, ans=0.1 2024-09-25 14:23:34,531 INFO [train.py:1198] (0/4) Epoch 42, batch 3850, loss[loss=0.1726, ctc_loss=0.1081, cr_loss=0.3224, over 17026.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1217, cr_loss=0.3375, over 3298856.82 frames. ], batch size: 39, lr: 2.81e-03, grad_scale: 32.0 2024-09-25 14:23:50,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=763452.6666666666, ans=0.125 2024-09-25 14:23:52,222 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-09-25 14:24:21,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763546.0, ans=0.1 2024-09-25 14:24:29,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=763546.0, ans=0.2 2024-09-25 14:24:45,428 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-42.pt 2024-09-25 14:25:36,256 INFO [train.py:1198] (0/4) Epoch 43, batch 0, loss[loss=0.1661, ctc_loss=0.1091, cr_loss=0.2851, over 17176.00 frames. ], tot_loss[loss=0.1661, ctc_loss=0.1091, cr_loss=0.2851, over 17176.00 frames. ], batch size: 45, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:25:36,257 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 14:25:51,495 INFO [train.py:1230] (0/4) Epoch 43, validation: loss=0.03486, ctc_loss=0.03486, cr_loss=1.051e-14, over 944034.00 frames. 
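Note on the recurring optim.py WARNING lines: each one reports five order statistics of recent gradient norms, and the reported threshold matches Clipping_scale times the logged median up to display rounding (e.g. 2.0 * 1.408e+02 = 2.816e+02 in the warning just above), with percent-clipped giving how often the norm exceeded it. The sketch below is a hedged reconstruction consistent with those numbers, assuming the five statistics are min/25%/median/75%/max over a sliding window of recent norms; the window size, update cadence, and names are hypothetical, and the actual optim.py logic may differ (e.g. per-parameter-group norms).

    import torch

    def clip_by_median_norm(params, norm_history,
                            clipping_scale=2.0, window=10000):
        # Total gradient norm over all parameters.
        grads = [p.grad.detach() for p in params if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2)
        norm_history.append(total_norm.item())
        del norm_history[:-window]               # sliding window (assumed size)
        t = torch.tensor(norm_history)
        quartiles = [t.quantile(q).item() for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = clipping_scale * quartiles[2]    # 2.0 x median, as logged
        if total_norm.item() > threshold:            # counted in percent-clipped
            scale = threshold / (total_norm.item() + 1e-6)
            for g in grads:
                g.mul_(scale)                        # in place, so p.grad is clipped
        return quartiles, threshold

percent-clipped=0.0 on nearly all of these lines says the total norm stayed under twice the running median almost everywhere in this stretch of training; the isolated percent-clipped=1.0 warnings coincide with the occasional max statistic (e.g. 3.566e+02) exceeding the threshold.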
2024-09-25 14:25:51,496 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 14:25:51,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=763620.6666666666, ans=0.125 2024-09-25 14:26:04,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=763620.6666666666, ans=0.125 2024-09-25 14:26:15,238 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.334e+02 1.498e+02 1.673e+02 2.107e+02, threshold=2.995e+02, percent-clipped=0.0 2024-09-25 14:26:26,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=763714.0, ans=0.09899494936611666 2024-09-25 14:26:39,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=763760.6666666666, ans=0.125 2024-09-25 14:26:43,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2024-09-25 14:26:52,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=763760.6666666666, ans=0.125 2024-09-25 14:27:10,607 INFO [train.py:1198] (0/4) Epoch 43, batch 50, loss[loss=0.1917, ctc_loss=0.1205, cr_loss=0.3559, over 17136.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1203, cr_loss=0.3369, over 764820.71 frames. ], batch size: 45, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:28:12,009 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.01 vs. limit=22.5 2024-09-25 14:28:27,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=764040.6666666666, ans=0.125 2024-09-25 14:28:30,179 INFO [train.py:1198] (0/4) Epoch 43, batch 100, loss[loss=0.1507, ctc_loss=0.09589, cr_loss=0.2739, over 17264.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1213, cr_loss=0.3391, over 1338537.57 frames. ], batch size: 42, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:28:40,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=764087.3333333334, ans=0.125 2024-09-25 14:28:54,052 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.307e+02 1.401e+02 1.473e+02 2.012e+02, threshold=2.802e+02, percent-clipped=0.0 2024-09-25 14:28:59,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=764134.0, ans=10.0 2024-09-25 14:29:07,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=22.5 2024-09-25 14:29:12,781 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.47 vs. limit=15.0 2024-09-25 14:29:15,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=764180.6666666666, ans=0.125 2024-09-25 14:29:35,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. 
limit=10.0 2024-09-25 14:29:54,515 INFO [train.py:1198] (0/4) Epoch 43, batch 150, loss[loss=0.148, ctc_loss=0.09209, cr_loss=0.2793, over 16362.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3375, over 1786026.37 frames. ], batch size: 36, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:30:05,016 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-09-25 14:30:17,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=764367.3333333334, ans=0.0 2024-09-25 14:30:27,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=764414.0, ans=0.2 2024-09-25 14:30:28,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=764414.0, ans=0.0 2024-09-25 14:30:33,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=764414.0, ans=0.0 2024-09-25 14:31:04,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=764507.3333333334, ans=0.95 2024-09-25 14:31:09,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=764507.3333333334, ans=0.125 2024-09-25 14:31:19,834 INFO [train.py:1198] (0/4) Epoch 43, batch 200, loss[loss=0.2032, ctc_loss=0.1348, cr_loss=0.3422, over 16747.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3373, over 2129169.31 frames. ], batch size: 61, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:31:20,471 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.31 vs. limit=10.0 2024-09-25 14:31:29,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=764554.0, ans=0.125 2024-09-25 14:31:39,709 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=12.0 2024-09-25 14:31:43,639 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.287e+02 1.384e+02 1.479e+02 1.740e+02, threshold=2.769e+02, percent-clipped=0.0 2024-09-25 14:31:43,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=764600.6666666666, ans=0.2 2024-09-25 14:32:17,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=764694.0, ans=0.0 2024-09-25 14:32:20,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=764694.0, ans=0.025 2024-09-25 14:32:22,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2024-09-25 14:32:34,198 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.27 vs. 
limit=15.0 2024-09-25 14:32:36,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=764740.6666666666, ans=0.125 2024-09-25 14:32:39,305 INFO [train.py:1198] (0/4) Epoch 43, batch 250, loss[loss=0.2208, ctc_loss=0.1452, cr_loss=0.3779, over 16505.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1207, cr_loss=0.3365, over 2401823.20 frames. ], batch size: 66, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:32:43,504 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2024-09-25 14:33:02,197 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2024-09-25 14:33:19,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=764880.6666666666, ans=0.125 2024-09-25 14:33:23,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=764880.6666666666, ans=0.125 2024-09-25 14:33:59,385 INFO [train.py:1198] (0/4) Epoch 43, batch 300, loss[loss=0.2073, ctc_loss=0.1316, cr_loss=0.3788, over 17203.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1216, cr_loss=0.3385, over 2618840.89 frames. ], batch size: 47, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:34:12,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=765020.6666666666, ans=0.0 2024-09-25 14:34:25,901 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.299e+02 1.370e+02 1.451e+02 2.001e+02, threshold=2.741e+02, percent-clipped=0.0 2024-09-25 14:34:49,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=765114.0, ans=0.0 2024-09-25 14:34:59,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=765160.6666666666, ans=0.0 2024-09-25 14:35:24,626 INFO [train.py:1198] (0/4) Epoch 43, batch 350, loss[loss=0.1587, ctc_loss=0.09871, cr_loss=0.3001, over 17182.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.122, cr_loss=0.3398, over 2789714.21 frames. ], batch size: 41, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:35:55,951 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-164000.pt 2024-09-25 14:36:26,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=765394.0, ans=0.125 2024-09-25 14:36:39,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=765440.6666666666, ans=0.025 2024-09-25 14:36:50,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=765487.3333333334, ans=0.0 2024-09-25 14:36:51,771 INFO [train.py:1198] (0/4) Epoch 43, batch 400, loss[loss=0.1787, ctc_loss=0.1147, cr_loss=0.3197, over 17012.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1205, cr_loss=0.3369, over 2926284.76 frames. 
], batch size: 44, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:37:07,862 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:37:15,438 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.295e+02 1.341e+02 1.449e+02 2.336e+02, threshold=2.681e+02, percent-clipped=0.0 2024-09-25 14:37:34,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2024-09-25 14:37:42,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=765627.3333333334, ans=0.125 2024-09-25 14:37:57,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=765674.0, ans=0.2 2024-09-25 14:38:06,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=765674.0, ans=0.0 2024-09-25 14:38:10,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=765720.6666666666, ans=0.0 2024-09-25 14:38:11,351 INFO [train.py:1198] (0/4) Epoch 43, batch 450, loss[loss=0.176, ctc_loss=0.1133, cr_loss=0.3138, over 17224.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1211, cr_loss=0.3388, over 3032092.74 frames. ], batch size: 50, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:38:24,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=765720.6666666666, ans=0.1 2024-09-25 14:38:57,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=765860.6666666666, ans=0.0 2024-09-25 14:38:59,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=765860.6666666666, ans=0.95 2024-09-25 14:39:02,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=765860.6666666666, ans=0.09899494936611666 2024-09-25 14:39:26,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=765907.3333333334, ans=0.0 2024-09-25 14:39:33,565 INFO [train.py:1198] (0/4) Epoch 43, batch 500, loss[loss=0.1761, ctc_loss=0.1139, cr_loss=0.3113, over 17295.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1209, cr_loss=0.3379, over 3103860.37 frames. ], batch size: 46, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:39:41,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=765954.0, ans=0.0 2024-09-25 14:39:50,125 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=15.0 2024-09-25 14:39:54,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. 
limit=15.0 2024-09-25 14:40:00,063 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.354e+02 1.449e+02 1.544e+02 3.228e+02, threshold=2.898e+02, percent-clipped=1.0 2024-09-25 14:40:08,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=766047.3333333334, ans=0.125 2024-09-25 14:40:30,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=766094.0, ans=0.1 2024-09-25 14:40:45,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=22.5 2024-09-25 14:40:52,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5 2024-09-25 14:40:53,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=766140.6666666666, ans=0.1 2024-09-25 14:41:00,790 INFO [train.py:1198] (0/4) Epoch 43, batch 550, loss[loss=0.1971, ctc_loss=0.1277, cr_loss=0.3472, over 16704.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1209, cr_loss=0.3382, over 3154637.93 frames. ], batch size: 61, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:41:01,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=766187.3333333334, ans=0.0 2024-09-25 14:41:07,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=766187.3333333334, ans=0.2 2024-09-25 14:41:08,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=766187.3333333334, ans=0.2 2024-09-25 14:41:20,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=766234.0, ans=0.125 2024-09-25 14:41:33,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=766280.6666666666, ans=0.2 2024-09-25 14:41:41,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=766280.6666666666, ans=0.125 2024-09-25 14:41:49,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.46 vs. limit=22.5 2024-09-25 14:41:53,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2024-09-25 14:42:15,846 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=766374.0, ans=0.2 2024-09-25 14:42:20,464 INFO [train.py:1198] (0/4) Epoch 43, batch 600, loss[loss=0.1884, ctc_loss=0.118, cr_loss=0.3518, over 17029.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.3386, over 3202229.66 frames. 
], batch size: 44, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:42:27,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=766420.6666666666, ans=0.125 2024-09-25 14:42:44,284 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.128e+02 1.302e+02 1.423e+02 1.505e+02 3.409e+02, threshold=2.845e+02, percent-clipped=1.0 2024-09-25 14:43:19,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=766560.6666666666, ans=0.125 2024-09-25 14:43:40,676 INFO [train.py:1198] (0/4) Epoch 43, batch 650, loss[loss=0.2003, ctc_loss=0.1287, cr_loss=0.3581, over 17151.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1211, cr_loss=0.3392, over 3233250.48 frames. ], batch size: 48, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:43:44,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=766654.0, ans=0.025 2024-09-25 14:43:52,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=766654.0, ans=0.09899494936611666 2024-09-25 14:43:52,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=766654.0, ans=0.0 2024-09-25 14:44:00,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=766700.6666666666, ans=0.125 2024-09-25 14:44:02,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=766700.6666666666, ans=0.0 2024-09-25 14:44:23,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=766747.3333333334, ans=10.0 2024-09-25 14:44:38,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=766794.0, ans=0.125 2024-09-25 14:44:47,523 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=766794.0, ans=0.0 2024-09-25 14:45:06,412 INFO [train.py:1198] (0/4) Epoch 43, batch 700, loss[loss=0.1928, ctc_loss=0.1215, cr_loss=0.3564, over 17046.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1221, cr_loss=0.3406, over 3258094.79 frames. ], batch size: 44, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:45:30,412 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.308e+02 1.372e+02 1.490e+02 1.932e+02, threshold=2.743e+02, percent-clipped=0.0 2024-09-25 14:45:35,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=766934.0, ans=0.125 2024-09-25 14:45:55,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=767027.3333333334, ans=0.2 2024-09-25 14:46:05,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=767027.3333333334, ans=0.1 2024-09-25 14:46:15,037 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=767074.0, ans=0.1 2024-09-25 14:46:29,244 INFO [train.py:1198] (0/4) Epoch 43, batch 750, loss[loss=0.1693, ctc_loss=0.1037, cr_loss=0.3282, over 17191.00 frames. 
], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3389, over 3290619.92 frames. ], batch size: 41, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:46:33,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0 2024-09-25 14:46:53,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=767167.3333333334, ans=0.2 2024-09-25 14:46:55,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=767167.3333333334, ans=0.125 2024-09-25 14:46:56,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=767167.3333333334, ans=0.125 2024-09-25 14:47:30,573 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 14:47:32,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=767307.3333333334, ans=0.125 2024-09-25 14:47:49,079 INFO [train.py:1198] (0/4) Epoch 43, batch 800, loss[loss=0.1978, ctc_loss=0.1295, cr_loss=0.3416, over 17002.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1216, cr_loss=0.3393, over 3308988.73 frames. ], batch size: 51, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:47:57,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=767354.0, ans=0.5 2024-09-25 14:48:08,646 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2024-09-25 14:48:12,578 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.117e+02 1.303e+02 1.387e+02 1.475e+02 2.331e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-25 14:48:14,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=767400.6666666666, ans=0.125 2024-09-25 14:48:27,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=767447.3333333334, ans=0.125 2024-09-25 14:48:27,631 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2024-09-25 14:48:31,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=767447.3333333334, ans=0.125 2024-09-25 14:48:32,586 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2024-09-25 14:48:38,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=767494.0, ans=0.0 2024-09-25 14:49:08,215 INFO [train.py:1198] (0/4) Epoch 43, batch 850, loss[loss=0.173, ctc_loss=0.1083, cr_loss=0.3238, over 17101.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1218, cr_loss=0.3392, over 3322447.11 frames. 
], batch size: 40, lr: 2.77e-03, grad_scale: 32.0 2024-09-25 14:49:11,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=767587.3333333334, ans=0.0 2024-09-25 14:49:32,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=767634.0, ans=0.125 2024-09-25 14:49:40,233 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=767634.0, ans=0.025 2024-09-25 14:50:16,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=767774.0, ans=0.125 2024-09-25 14:50:23,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=22.5 2024-09-25 14:50:25,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=767774.0, ans=0.0 2024-09-25 14:50:33,652 INFO [train.py:1198] (0/4) Epoch 43, batch 900, loss[loss=0.2277, ctc_loss=0.1545, cr_loss=0.3659, over 11619.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1214, cr_loss=0.338, over 3324760.44 frames. ], batch size: 123, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:50:43,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=767820.6666666666, ans=0.125 2024-09-25 14:51:01,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767867.3333333334, ans=0.1 2024-09-25 14:51:04,268 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.318e+02 1.401e+02 1.527e+02 2.167e+02, threshold=2.803e+02, percent-clipped=0.0 2024-09-25 14:51:09,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=767914.0, ans=0.125 2024-09-25 14:51:17,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=767914.0, ans=0.125 2024-09-25 14:51:30,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=767960.6666666666, ans=0.025 2024-09-25 14:51:39,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=767960.6666666666, ans=0.125 2024-09-25 14:51:46,492 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-09-25 14:51:47,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=768007.3333333334, ans=0.125 2024-09-25 14:51:51,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=12.0 2024-09-25 14:51:58,526 INFO [train.py:1198] (0/4) Epoch 43, batch 950, loss[loss=0.1888, ctc_loss=0.1211, cr_loss=0.3382, over 16118.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1216, cr_loss=0.3387, over 3329488.16 frames. 
], batch size: 74, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:52:06,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=768054.0, ans=0.125 2024-09-25 14:52:42,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=768147.3333333334, ans=0.2 2024-09-25 14:52:59,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.83 vs. limit=15.0 2024-09-25 14:53:18,817 INFO [train.py:1198] (0/4) Epoch 43, batch 1000, loss[loss=0.1912, ctc_loss=0.1237, cr_loss=0.3375, over 17078.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1213, cr_loss=0.3386, over 3329078.88 frames. ], batch size: 46, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:53:44,360 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.294e+02 1.416e+02 1.489e+02 2.363e+02, threshold=2.832e+02, percent-clipped=0.0 2024-09-25 14:53:52,829 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=768380.6666666666, ans=0.2 2024-09-25 14:53:55,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=768380.6666666666, ans=0.125 2024-09-25 14:54:44,013 INFO [train.py:1198] (0/4) Epoch 43, batch 1050, loss[loss=0.2092, ctc_loss=0.1371, cr_loss=0.3604, over 15953.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1217, cr_loss=0.339, over 3342194.08 frames. ], batch size: 74, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:54:57,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=768520.6666666666, ans=0.125 2024-09-25 14:55:08,869 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.45 vs. limit=10.0 2024-09-25 14:55:11,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768567.3333333334, ans=0.1 2024-09-25 14:55:11,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=768567.3333333334, ans=0.0 2024-09-25 14:56:08,919 INFO [train.py:1198] (0/4) Epoch 43, batch 1100, loss[loss=0.1909, ctc_loss=0.1246, cr_loss=0.3317, over 17032.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1215, cr_loss=0.3391, over 3352316.62 frames. 
], batch size: 52, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:56:34,424 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.288e+02 1.354e+02 1.479e+02 2.223e+02, threshold=2.708e+02, percent-clipped=0.0 2024-09-25 14:56:34,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=768800.6666666666, ans=0.125 2024-09-25 14:56:52,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=768847.3333333334, ans=0.0 2024-09-25 14:56:53,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768847.3333333334, ans=0.1 2024-09-25 14:56:55,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=768894.0, ans=0.1 2024-09-25 14:56:58,594 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=768894.0, ans=0.125 2024-09-25 14:57:05,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=768894.0, ans=0.07 2024-09-25 14:57:10,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2024-09-25 14:57:28,428 INFO [train.py:1198] (0/4) Epoch 43, batch 1150, loss[loss=0.2224, ctc_loss=0.1449, cr_loss=0.3872, over 17140.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3388, over 3359505.44 frames. ], batch size: 48, lr: 2.77e-03, grad_scale: 16.0 2024-09-25 14:57:47,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=769034.0, ans=0.125 2024-09-25 14:57:50,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=769034.0, ans=0.125 2024-09-25 14:57:54,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=769034.0, ans=0.125 2024-09-25 14:58:07,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=769080.6666666666, ans=0.125 2024-09-25 14:58:07,823 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=12.0 2024-09-25 14:58:26,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=769127.3333333334, ans=0.0 2024-09-25 14:58:36,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=769174.0, ans=0.0 2024-09-25 14:58:48,785 INFO [train.py:1198] (0/4) Epoch 43, batch 1200, loss[loss=0.1687, ctc_loss=0.1082, cr_loss=0.3024, over 17256.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1216, cr_loss=0.3387, over 3362887.11 frames. 
], batch size: 44, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 14:58:52,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=769220.6666666666, ans=0.125 2024-09-25 14:59:08,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=769267.3333333334, ans=0.125 2024-09-25 14:59:08,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2024-09-25 14:59:14,326 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.304e+02 1.385e+02 1.497e+02 1.972e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-25 15:00:13,441 INFO [train.py:1198] (0/4) Epoch 43, batch 1250, loss[loss=0.164, ctc_loss=0.1031, cr_loss=0.3044, over 17250.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1212, cr_loss=0.3388, over 3370411.18 frames. ], batch size: 44, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:00:24,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=769454.0, ans=0.0 2024-09-25 15:00:48,010 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=769547.3333333334, ans=0.1 2024-09-25 15:01:16,400 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=769594.0, ans=0.0 2024-09-25 15:01:26,720 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2024-09-25 15:01:38,704 INFO [train.py:1198] (0/4) Epoch 43, batch 1300, loss[loss=0.1849, ctc_loss=0.1172, cr_loss=0.3385, over 17022.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1214, cr_loss=0.3391, over 3369017.35 frames. ], batch size: 39, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:01:41,210 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=22.5 2024-09-25 15:01:42,696 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=22.5 2024-09-25 15:02:04,023 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.298e+02 1.391e+02 1.472e+02 1.717e+02, threshold=2.781e+02, percent-clipped=0.0 2024-09-25 15:02:13,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=769780.6666666666, ans=0.025 2024-09-25 15:02:21,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=769780.6666666666, ans=0.1 2024-09-25 15:02:23,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=769780.6666666666, ans=0.125 2024-09-25 15:02:28,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=769827.3333333334, ans=0.09899494936611666 2024-09-25 15:02:58,585 INFO [train.py:1198] (0/4) Epoch 43, batch 1350, loss[loss=0.1783, ctc_loss=0.1157, cr_loss=0.313, over 17114.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.339, over 3365945.33 frames. 
], batch size: 40, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:03:08,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=769920.6666666666, ans=0.0 2024-09-25 15:03:29,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=770014.0, ans=0.125 2024-09-25 15:04:16,584 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.21 vs. limit=8.0 2024-09-25 15:04:21,712 INFO [train.py:1198] (0/4) Epoch 43, batch 1400, loss[loss=0.2105, ctc_loss=0.1359, cr_loss=0.373, over 16595.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.121, cr_loss=0.3379, over 3369432.09 frames. ], batch size: 66, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:04:23,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=770154.0, ans=0.125 2024-09-25 15:04:36,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=770154.0, ans=0.0 2024-09-25 15:04:36,687 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-09-25 15:04:50,141 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.179e+02 1.309e+02 1.380e+02 1.484e+02 2.530e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-25 15:05:07,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770247.3333333334, ans=0.125 2024-09-25 15:05:39,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2024-09-25 15:05:46,978 INFO [train.py:1198] (0/4) Epoch 43, batch 1450, loss[loss=0.1928, ctc_loss=0.1245, cr_loss=0.3415, over 17304.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1213, cr_loss=0.3381, over 3366918.51 frames. ], batch size: 46, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:05:47,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=770387.3333333334, ans=0.125 2024-09-25 15:06:17,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=770434.0, ans=0.05 2024-09-25 15:06:34,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=770480.6666666666, ans=0.125 2024-09-25 15:06:49,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=770527.3333333334, ans=0.0 2024-09-25 15:06:49,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.74 vs. limit=22.5 2024-09-25 15:06:57,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. 
limit=15.0 2024-09-25 15:07:03,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770574.0, ans=0.1 2024-09-25 15:07:06,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=770574.0, ans=0.125 2024-09-25 15:07:09,746 INFO [train.py:1198] (0/4) Epoch 43, batch 1500, loss[loss=0.1757, ctc_loss=0.1131, cr_loss=0.313, over 17345.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1209, cr_loss=0.3375, over 3362522.71 frames. ], batch size: 48, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:07:11,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=770620.6666666666, ans=0.125 2024-09-25 15:07:35,089 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.284e+02 1.350e+02 1.419e+02 1.777e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-25 15:07:48,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=770714.0, ans=0.2 2024-09-25 15:08:02,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=770760.6666666666, ans=0.125 2024-09-25 15:08:29,385 INFO [train.py:1198] (0/4) Epoch 43, batch 1550, loss[loss=0.1726, ctc_loss=0.1068, cr_loss=0.3291, over 16959.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1215, cr_loss=0.3388, over 3365758.99 frames. ], batch size: 42, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:08:31,886 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-09-25 15:08:55,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770900.6666666666, ans=0.125 2024-09-25 15:08:58,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=770900.6666666666, ans=0.0 2024-09-25 15:09:16,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=770947.3333333334, ans=0.0 2024-09-25 15:09:23,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=770994.0, ans=0.125 2024-09-25 15:09:48,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=12.0 2024-09-25 15:09:54,493 INFO [train.py:1198] (0/4) Epoch 43, batch 1600, loss[loss=0.2031, ctc_loss=0.1325, cr_loss=0.3529, over 15929.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1224, cr_loss=0.3402, over 3351115.01 frames. 
], batch size: 74, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:10:20,125 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.278e+02 1.396e+02 1.510e+02 2.224e+02, threshold=2.791e+02, percent-clipped=0.0 2024-09-25 15:10:25,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771180.6666666666, ans=0.1 2024-09-25 15:10:26,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=771180.6666666666, ans=0.0 2024-09-25 15:10:27,258 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-25 15:10:51,310 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=771227.3333333334, ans=0.09899494936611666 2024-09-25 15:10:54,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=771227.3333333334, ans=0.09899494936611666 2024-09-25 15:11:10,711 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=771274.0, ans=0.04949747468305833 2024-09-25 15:11:19,902 INFO [train.py:1198] (0/4) Epoch 43, batch 1650, loss[loss=0.1867, ctc_loss=0.1179, cr_loss=0.344, over 16766.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1223, cr_loss=0.3401, over 3350345.71 frames. ], batch size: 61, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:11:36,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2024-09-25 15:11:46,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2024-09-25 15:12:06,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=771460.6666666666, ans=0.2 2024-09-25 15:12:15,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=771460.6666666666, ans=0.0 2024-09-25 15:12:27,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=771507.3333333334, ans=0.025 2024-09-25 15:12:33,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=22.5 2024-09-25 15:12:39,451 INFO [train.py:1198] (0/4) Epoch 43, batch 1700, loss[loss=0.201, ctc_loss=0.1258, cr_loss=0.3763, over 17018.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1212, cr_loss=0.3387, over 3355009.59 frames. 
], batch size: 53, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:12:57,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=771600.6666666666, ans=0.0 2024-09-25 15:13:05,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.300e+02 1.382e+02 1.486e+02 1.912e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-25 15:13:06,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=771600.6666666666, ans=0.125 2024-09-25 15:13:28,575 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-09-25 15:13:52,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=771740.6666666666, ans=0.0 2024-09-25 15:13:57,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=771740.6666666666, ans=0.125 2024-09-25 15:14:00,170 INFO [train.py:1198] (0/4) Epoch 43, batch 1750, loss[loss=0.1954, ctc_loss=0.1251, cr_loss=0.3516, over 17307.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1214, cr_loss=0.3383, over 3347263.34 frames. ], batch size: 49, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:14:00,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=22.5 2024-09-25 15:14:24,557 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.38 vs. limit=15.0 2024-09-25 15:14:27,548 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.48 vs. limit=10.0 2024-09-25 15:14:35,055 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.63 vs. limit=10.0 2024-09-25 15:14:50,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=771880.6666666666, ans=0.125 2024-09-25 15:14:54,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.25 vs. limit=6.0 2024-09-25 15:15:06,712 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=22.5 2024-09-25 15:15:12,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771974.0, ans=0.1 2024-09-25 15:15:22,280 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=12.0 2024-09-25 15:15:24,846 INFO [train.py:1198] (0/4) Epoch 43, batch 1800, loss[loss=0.1849, ctc_loss=0.1186, cr_loss=0.3314, over 17168.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1218, cr_loss=0.339, over 3348934.38 frames. 
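Note: the scaling.py:214 lines trace hyperparameters (dropout probabilities, skip rates, balancer probs and limits) whose current value, "ans", is a function of the global batch_count via a ScheduledFloat. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between breakpoints; the breakpoints below are illustrative, and only the logged (batch_count, ans) pairs come from this run:

    class ScheduledValue:
        # Value that depends on the global batch count, interpolated
        # linearly between (batch_count, value) breakpoints (an assumption
        # about the behaviour, not lifted from scaling.py).
        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(771600.67))   # -> 0.1, long past the final breakpoint
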
], batch size: 45, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:15:52,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=772067.3333333334, ans=0.09899494936611666 2024-09-25 15:15:55,554 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.211e+02 1.295e+02 1.383e+02 1.459e+02 2.589e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-25 15:16:40,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772207.3333333334, ans=0.1 2024-09-25 15:16:47,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=772207.3333333334, ans=0.0 2024-09-25 15:16:49,943 INFO [train.py:1198] (0/4) Epoch 43, batch 1850, loss[loss=0.1897, ctc_loss=0.1222, cr_loss=0.3371, over 17168.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1224, cr_loss=0.3401, over 3346333.04 frames. ], batch size: 45, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:17:02,978 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=772254.0, ans=0.125 2024-09-25 15:17:17,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=772300.6666666666, ans=0.125 2024-09-25 15:17:20,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=772347.3333333334, ans=0.2 2024-09-25 15:17:24,667 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0 2024-09-25 15:17:29,336 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2024-09-25 15:17:48,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=772394.0, ans=0.0 2024-09-25 15:17:58,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=772440.6666666666, ans=0.025 2024-09-25 15:18:10,631 INFO [train.py:1198] (0/4) Epoch 43, batch 1900, loss[loss=0.2046, ctc_loss=0.1313, cr_loss=0.3668, over 17213.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1222, cr_loss=0.3399, over 3352058.32 frames. 
], batch size: 50, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:18:16,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=772487.3333333334, ans=15.0 2024-09-25 15:18:26,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772534.0, ans=0.1 2024-09-25 15:18:31,591 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=772534.0, ans=0.0 2024-09-25 15:18:36,192 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.296e+02 1.386e+02 1.467e+02 1.936e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 15:18:55,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=772580.6666666666, ans=0.125 2024-09-25 15:19:35,530 INFO [train.py:1198] (0/4) Epoch 43, batch 1950, loss[loss=0.1954, ctc_loss=0.1222, cr_loss=0.3661, over 17209.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1219, cr_loss=0.3396, over 3348098.57 frames. ], batch size: 47, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:19:43,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=772720.6666666666, ans=0.015 2024-09-25 15:19:51,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=772767.3333333334, ans=0.125 2024-09-25 15:20:00,045 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. limit=10.0 2024-09-25 15:20:37,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-09-25 15:20:41,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=772907.3333333334, ans=0.0 2024-09-25 15:21:00,968 INFO [train.py:1198] (0/4) Epoch 43, batch 2000, loss[loss=0.1667, ctc_loss=0.1047, cr_loss=0.31, over 17098.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1222, cr_loss=0.3398, over 3348221.51 frames. ], batch size: 43, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:21:26,607 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.269e+02 1.346e+02 1.449e+02 2.025e+02, threshold=2.692e+02, percent-clipped=0.0 2024-09-25 15:21:59,300 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0 2024-09-25 15:22:16,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=773140.6666666666, ans=0.125 2024-09-25 15:22:20,775 INFO [train.py:1198] (0/4) Epoch 43, batch 2050, loss[loss=0.162, ctc_loss=0.1018, cr_loss=0.3011, over 17126.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1221, cr_loss=0.3395, over 3333705.20 frames. ], batch size: 41, lr: 2.76e-03, grad_scale: 32.0 2024-09-25 15:22:27,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=773187.3333333334, ans=0.0 2024-09-25 15:22:36,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.57 vs. 
limit=15.0 2024-09-25 15:22:48,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=773234.0, ans=0.2 2024-09-25 15:22:51,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=773280.6666666666, ans=0.125 2024-09-25 15:23:18,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=773327.3333333334, ans=0.0 2024-09-25 15:23:20,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=773327.3333333334, ans=0.2 2024-09-25 15:23:34,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773374.0, ans=0.1 2024-09-25 15:23:40,846 INFO [train.py:1198] (0/4) Epoch 43, batch 2100, loss[loss=0.2168, ctc_loss=0.1405, cr_loss=0.3818, over 16055.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1223, cr_loss=0.3398, over 3337474.47 frames. ], batch size: 74, lr: 2.76e-03, grad_scale: 16.0 2024-09-25 15:23:41,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-25 15:23:46,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=773420.6666666666, ans=0.0 2024-09-25 15:23:52,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=773420.6666666666, ans=0.0 2024-09-25 15:23:58,802 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=773467.3333333334, ans=0.0 2024-09-25 15:24:10,504 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.257e+02 1.347e+02 1.435e+02 1.926e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-25 15:24:43,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773560.6666666666, ans=0.1 2024-09-25 15:24:56,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=773607.3333333334, ans=0.035 2024-09-25 15:24:56,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=773607.3333333334, ans=0.125 2024-09-25 15:25:05,723 INFO [train.py:1198] (0/4) Epoch 43, batch 2150, loss[loss=0.1779, ctc_loss=0.1127, cr_loss=0.3261, over 16961.00 frames. ], tot_loss[loss=0.1913, ctc_loss=0.1229, cr_loss=0.3416, over 3336283.58 frames. ], batch size: 42, lr: 2.76e-03, grad_scale: 16.0 2024-09-25 15:25:13,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.34 vs. 
limit=15.0 2024-09-25 15:25:17,407 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=773654.0, ans=0.125 2024-09-25 15:25:18,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=773654.0, ans=0.0 2024-09-25 15:25:26,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=773700.6666666666, ans=0.0 2024-09-25 15:26:31,340 INFO [train.py:1198] (0/4) Epoch 43, batch 2200, loss[loss=0.1519, ctc_loss=0.09741, cr_loss=0.2723, over 17111.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1228, cr_loss=0.3417, over 3338149.97 frames. ], batch size: 40, lr: 2.76e-03, grad_scale: 16.0 2024-09-25 15:26:44,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=773887.3333333334, ans=0.04949747468305833 2024-09-25 15:26:49,203 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=773934.0, ans=0.2 2024-09-25 15:26:58,330 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.324e+02 1.413e+02 1.558e+02 2.550e+02, threshold=2.826e+02, percent-clipped=0.0 2024-09-25 15:27:50,923 INFO [train.py:1198] (0/4) Epoch 43, batch 2250, loss[loss=0.164, ctc_loss=0.1016, cr_loss=0.312, over 17119.00 frames. ], tot_loss[loss=0.1907, ctc_loss=0.1225, cr_loss=0.3409, over 3342519.88 frames. ], batch size: 40, lr: 2.76e-03, grad_scale: 16.0 2024-09-25 15:28:35,186 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=22.5 2024-09-25 15:28:36,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774214.0, ans=0.125 2024-09-25 15:28:37,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=774260.6666666666, ans=0.025 2024-09-25 15:29:00,178 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=774307.3333333334, ans=0.125 2024-09-25 15:29:12,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=774354.0, ans=0.125 2024-09-25 15:29:13,566 INFO [train.py:1198] (0/4) Epoch 43, batch 2300, loss[loss=0.1889, ctc_loss=0.1214, cr_loss=0.3371, over 17095.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1214, cr_loss=0.3386, over 3339329.99 frames. 
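Note: the scaling.py:1024 lines report a per-module whitening diagnostic as "metric=X vs. limit=Y"; presumably the module only intervenes, nudging activations toward a whiter covariance, once the metric exceeds its limit, which is why most entries here sit below their limits. One plausible such metric, sketched under the assumption that it measures eigenvalue dispersion of the channel covariance (this exact formula is illustrative, not lifted from scaling.py):

    import torch

    def whitening_metric(feats):
        # feats: (N, C).  E[lambda^2] / E[lambda]^2 over the covariance
        # eigenvalues equals 1.0 for perfectly white features and grows
        # as variance concentrates in a few directions.
        x = feats - feats.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

    print(whitening_metric(torch.randn(1000, 384)))   # ~1.0 + O(C/N)
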
], batch size: 49, lr: 2.76e-03, grad_scale: 16.0 2024-09-25 15:29:15,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=774354.0, ans=0.0 2024-09-25 15:29:32,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=774400.6666666666, ans=0.1 2024-09-25 15:29:35,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=774400.6666666666, ans=0.125 2024-09-25 15:29:38,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=774400.6666666666, ans=0.0 2024-09-25 15:29:42,935 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.301e+02 1.372e+02 1.481e+02 2.158e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-25 15:29:59,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=774447.3333333334, ans=0.2 2024-09-25 15:29:59,772 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.07 vs. limit=15.0 2024-09-25 15:30:11,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=774494.0, ans=0.0 2024-09-25 15:30:13,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=774494.0, ans=0.125 2024-09-25 15:30:15,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.49 vs. limit=6.0 2024-09-25 15:30:35,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-09-25 15:30:37,779 INFO [train.py:1198] (0/4) Epoch 43, batch 2350, loss[loss=0.2127, ctc_loss=0.1371, cr_loss=0.3778, over 17047.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1218, cr_loss=0.3399, over 3334085.01 frames. ], batch size: 52, lr: 2.76e-03, grad_scale: 16.0 2024-09-25 15:30:42,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=774587.3333333334, ans=0.015 2024-09-25 15:30:51,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=774587.3333333334, ans=0.125 2024-09-25 15:31:54,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=774774.0, ans=0.025 2024-09-25 15:32:00,370 INFO [train.py:1198] (0/4) Epoch 43, batch 2400, loss[loss=0.1806, ctc_loss=0.1152, cr_loss=0.3268, over 17351.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.122, cr_loss=0.3399, over 3341276.72 frames. ], batch size: 48, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:32:04,483 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.65 vs. 
limit=15.0 2024-09-25 15:32:27,696 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.292e+02 1.363e+02 1.432e+02 2.291e+02, threshold=2.726e+02, percent-clipped=0.0 2024-09-25 15:32:32,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=774914.0, ans=0.125 2024-09-25 15:32:44,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=774914.0, ans=0.2 2024-09-25 15:32:47,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=774960.6666666666, ans=0.125 2024-09-25 15:32:58,836 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 15:33:06,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=775007.3333333334, ans=0.0 2024-09-25 15:33:16,446 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 15:33:20,862 INFO [train.py:1198] (0/4) Epoch 43, batch 2450, loss[loss=0.2174, ctc_loss=0.1405, cr_loss=0.3843, over 17018.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1217, cr_loss=0.3393, over 3353541.95 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:33:32,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=775054.0, ans=0.125 2024-09-25 15:33:35,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=775100.6666666666, ans=0.2 2024-09-25 15:34:04,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=775147.3333333334, ans=0.125 2024-09-25 15:34:11,949 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2024-09-25 15:34:17,713 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=775194.0, ans=0.07 2024-09-25 15:34:34,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=775240.6666666666, ans=0.0 2024-09-25 15:34:39,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=775240.6666666666, ans=0.125 2024-09-25 15:34:44,130 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=775287.3333333334, ans=0.125 2024-09-25 15:34:45,576 INFO [train.py:1198] (0/4) Epoch 43, batch 2500, loss[loss=0.1668, ctc_loss=0.1061, cr_loss=0.3039, over 16738.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1211, cr_loss=0.3379, over 3358630.39 frames. 
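Note: the grad_scale field on each tot_loss line is the dynamic AMP loss scale. In this stretch it steps back and forth between 16.0 and 32.0 (32.0 at batch 2050, 16.0 at batch 2100, 32.0 again at batch 2400), the characteristic behaviour of a scaler that doubles after a run of overflow-free steps and halves on overflow. Sketch using PyTorch's stock scaler; the constructor arguments are GradScaler defaults or assumptions, not values read from this run's config:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # assumption: start where this log segment starts
        growth_factor=2.0,     # double the scale after enough clean steps ...
        growth_interval=2000,  # ... i.e. after this many overflow-free steps
        backoff_factor=0.5,    # halve it when a step overflows
    )
    # Typical step:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
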
], batch size: 37, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:35:06,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=775334.0, ans=0.04949747468305833 2024-09-25 15:35:13,017 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.056e+02 1.284e+02 1.374e+02 1.483e+02 2.994e+02, threshold=2.747e+02, percent-clipped=1.0 2024-09-25 15:35:16,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=775380.6666666666, ans=0.0 2024-09-25 15:35:24,954 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775380.6666666666, ans=0.1 2024-09-25 15:35:41,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=775427.3333333334, ans=0.07 2024-09-25 15:35:47,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=775427.3333333334, ans=0.2 2024-09-25 15:36:08,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775474.0, ans=0.1 2024-09-25 15:36:10,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=775520.6666666666, ans=0.125 2024-09-25 15:36:11,635 INFO [train.py:1198] (0/4) Epoch 43, batch 2550, loss[loss=0.1828, ctc_loss=0.1171, cr_loss=0.3283, over 17088.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1208, cr_loss=0.3378, over 3361088.30 frames. ], batch size: 43, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:36:25,550 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2024-09-25 15:36:33,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-25 15:37:32,114 INFO [train.py:1198] (0/4) Epoch 43, batch 2600, loss[loss=0.2147, ctc_loss=0.1356, cr_loss=0.3956, over 17210.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1209, cr_loss=0.3383, over 3356739.82 frames. ], batch size: 47, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:37:58,960 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.271e+02 1.387e+02 1.465e+02 2.023e+02, threshold=2.774e+02, percent-clipped=0.0 2024-09-25 15:38:15,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=775847.3333333334, ans=0.125 2024-09-25 15:38:34,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=775940.6666666666, ans=0.0 2024-09-25 15:38:51,971 INFO [train.py:1198] (0/4) Epoch 43, batch 2650, loss[loss=0.1688, ctc_loss=0.1081, cr_loss=0.3034, over 17152.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3378, over 3356567.25 frames. 
], batch size: 48, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:38:59,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=775987.3333333334, ans=0.125 2024-09-25 15:39:31,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=776080.6666666666, ans=0.125 2024-09-25 15:39:46,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=776127.3333333334, ans=0.125 2024-09-25 15:39:50,757 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-09-25 15:39:59,593 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=2.352e-02 2024-09-25 15:40:06,549 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2024-09-25 15:40:09,313 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=776174.0, ans=0.04949747468305833 2024-09-25 15:40:16,630 INFO [train.py:1198] (0/4) Epoch 43, batch 2700, loss[loss=0.1897, ctc_loss=0.1219, cr_loss=0.3389, over 17269.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.3376, over 3364152.12 frames. ], batch size: 44, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:40:49,149 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.311e+02 1.386e+02 1.489e+02 1.705e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 15:40:52,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=776314.0, ans=0.0 2024-09-25 15:41:16,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=776360.6666666666, ans=0.0 2024-09-25 15:41:16,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-25 15:41:24,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-25 15:41:31,137 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=22.5 2024-09-25 15:41:31,299 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0 2024-09-25 15:41:32,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=776407.3333333334, ans=0.0 2024-09-25 15:41:41,506 INFO [train.py:1198] (0/4) Epoch 43, batch 2750, loss[loss=0.1639, ctc_loss=0.09878, cr_loss=0.3257, over 16712.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1202, cr_loss=0.3372, over 3371110.64 frames. 
], batch size: 37, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:41:51,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=776454.0, ans=0.0 2024-09-25 15:42:14,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=776547.3333333334, ans=0.2 2024-09-25 15:42:22,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=776547.3333333334, ans=0.0 2024-09-25 15:42:38,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=776594.0, ans=0.125 2024-09-25 15:42:51,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=776640.6666666666, ans=0.025 2024-09-25 15:42:52,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=776640.6666666666, ans=0.125 2024-09-25 15:43:02,157 INFO [train.py:1198] (0/4) Epoch 43, batch 2800, loss[loss=0.2192, ctc_loss=0.1433, cr_loss=0.3794, over 17000.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1213, cr_loss=0.339, over 3368633.13 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:43:29,177 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.317e+02 1.406e+02 1.487e+02 1.823e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-25 15:43:50,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776827.3333333334, ans=0.1 2024-09-25 15:43:59,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=776827.3333333334, ans=0.125 2024-09-25 15:44:27,275 INFO [train.py:1198] (0/4) Epoch 43, batch 2850, loss[loss=0.1973, ctc_loss=0.1282, cr_loss=0.3453, over 17314.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1213, cr_loss=0.3386, over 3368354.70 frames. ], batch size: 46, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:44:59,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=777014.0, ans=0.125 2024-09-25 15:45:00,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=777014.0, ans=0.015 2024-09-25 15:45:52,192 INFO [train.py:1198] (0/4) Epoch 43, batch 2900, loss[loss=0.1879, ctc_loss=0.1198, cr_loss=0.3405, over 17143.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3389, over 3370572.22 frames. ], batch size: 48, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:46:19,411 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.318e+02 1.398e+02 1.451e+02 2.249e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-25 15:47:06,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777340.6666666666, ans=0.1 2024-09-25 15:47:12,743 INFO [train.py:1198] (0/4) Epoch 43, batch 2950, loss[loss=0.1805, ctc_loss=0.1172, cr_loss=0.3168, over 17229.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1214, cr_loss=0.3385, over 3369117.73 frames. 
], batch size: 50, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:47:18,193 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2024-09-25 15:47:42,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=777434.0, ans=0.125 2024-09-25 15:47:54,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=777480.6666666666, ans=0.125 2024-09-25 15:48:33,058 INFO [train.py:1198] (0/4) Epoch 43, batch 3000, loss[loss=0.1877, ctc_loss=0.1201, cr_loss=0.338, over 17249.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.122, cr_loss=0.3396, over 3362151.87 frames. ], batch size: 44, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:48:33,059 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 15:48:50,096 INFO [train.py:1230] (0/4) Epoch 43, validation: loss=0.03539, ctc_loss=0.03539, cr_loss=1.015e-14, over 944034.00 frames. 2024-09-25 15:48:50,096 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 15:48:58,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=777620.6666666666, ans=0.125 2024-09-25 15:48:59,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=777620.6666666666, ans=0.125 2024-09-25 15:49:06,635 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2024-09-25 15:49:12,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2024-09-25 15:49:16,945 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.046e+02 1.300e+02 1.373e+02 1.452e+02 1.992e+02, threshold=2.746e+02, percent-clipped=0.0 2024-09-25 15:49:20,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=777714.0, ans=0.125 2024-09-25 15:49:38,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777760.6666666666, ans=0.1 2024-09-25 15:50:09,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=777854.0, ans=0.125 2024-09-25 15:50:10,659 INFO [train.py:1198] (0/4) Epoch 43, batch 3050, loss[loss=0.1607, ctc_loss=0.1022, cr_loss=0.2923, over 17101.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1219, cr_loss=0.3392, over 3357852.76 frames. 
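Note: on the validation pass the consistency term all but vanishes (cr_loss=1.015e-14 above) while ctc_loss carries the entire loss of 0.03539. That is what one would expect if cr_loss compares the CTC posteriors of two differently-masked views of each utterance and masking is disabled at eval time, leaving the two views identical up to floating-point noise. A sketch of such a term, with symmetric KL as an illustrative choice rather than the recipe's exact formulation:

    import torch
    import torch.nn.functional as F

    def consistency_loss(logits_a, logits_b):
        # Symmetric KL between the posteriors of two views of the same
        # batch; zero (up to rounding) when the views coincide, as at
        # validation time in the log line above.
        pa = F.log_softmax(logits_a, dim=-1)
        pb = F.log_softmax(logits_b, dim=-1)
        return 0.5 * (F.kl_div(pa, pb, log_target=True, reduction="batchmean")
                      + F.kl_div(pb, pa, log_target=True, reduction="batchmean"))
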
], batch size: 49, lr: 2.75e-03, grad_scale: 16.0 2024-09-25 15:50:21,646 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=777854.0, ans=0.125 2024-09-25 15:51:00,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=777994.0, ans=0.0 2024-09-25 15:51:05,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=777994.0, ans=0.125 2024-09-25 15:51:11,537 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=778040.6666666666, ans=0.05 2024-09-25 15:51:28,335 INFO [train.py:1198] (0/4) Epoch 43, batch 3100, loss[loss=0.1953, ctc_loss=0.1255, cr_loss=0.3491, over 16998.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.122, cr_loss=0.3395, over 3366327.96 frames. ], batch size: 56, lr: 2.75e-03, grad_scale: 16.0 2024-09-25 15:51:55,992 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 15:51:58,668 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.089e+02 1.269e+02 1.353e+02 1.457e+02 2.895e+02, threshold=2.707e+02, percent-clipped=1.0 2024-09-25 15:52:14,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=778180.6666666666, ans=0.2 2024-09-25 15:52:19,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778227.3333333334, ans=0.1 2024-09-25 15:52:47,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=778274.0, ans=0.125 2024-09-25 15:52:51,553 INFO [train.py:1198] (0/4) Epoch 43, batch 3150, loss[loss=0.23, ctc_loss=0.1484, cr_loss=0.4081, over 16566.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1221, cr_loss=0.3395, over 3361892.70 frames. ], batch size: 66, lr: 2.75e-03, grad_scale: 16.0 2024-09-25 15:53:22,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778414.0, ans=0.1 2024-09-25 15:53:30,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=778414.0, ans=0.125 2024-09-25 15:53:40,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=778460.6666666666, ans=0.0 2024-09-25 15:53:51,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=778460.6666666666, ans=0.125 2024-09-25 15:54:10,165 INFO [train.py:1198] (0/4) Epoch 43, batch 3200, loss[loss=0.1594, ctc_loss=0.1009, cr_loss=0.2925, over 16953.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.121, cr_loss=0.3377, over 3369733.07 frames. ], batch size: 42, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:54:10,827 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.06 vs. 
limit=12.0 2024-09-25 15:54:15,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=778554.0, ans=0.125 2024-09-25 15:54:34,244 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=22.5 2024-09-25 15:54:38,171 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.301e+02 1.369e+02 1.499e+02 2.028e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-25 15:54:51,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=778647.3333333334, ans=0.05 2024-09-25 15:55:02,970 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0 2024-09-25 15:55:04,577 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0 2024-09-25 15:55:11,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=778740.6666666666, ans=0.2 2024-09-25 15:55:28,992 INFO [train.py:1198] (0/4) Epoch 43, batch 3250, loss[loss=0.2165, ctc_loss=0.139, cr_loss=0.3873, over 17230.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1211, cr_loss=0.3383, over 3366308.88 frames. ], batch size: 50, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:55:34,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2024-09-25 15:55:41,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=778787.3333333334, ans=0.125 2024-09-25 15:55:56,159 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.33 vs. limit=15.0 2024-09-25 15:56:46,770 INFO [train.py:1198] (0/4) Epoch 43, batch 3300, loss[loss=0.1986, ctc_loss=0.1251, cr_loss=0.3676, over 17212.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1216, cr_loss=0.3393, over 3354237.38 frames. ], batch size: 50, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:56:55,184 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.95 vs. 
limit=12.0 2024-09-25 15:56:56,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=779020.6666666666, ans=0.0 2024-09-25 15:57:11,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=779067.3333333334, ans=0.125 2024-09-25 15:57:13,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=779067.3333333334, ans=0.0 2024-09-25 15:57:14,796 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.314e+02 1.383e+02 1.471e+02 2.059e+02, threshold=2.766e+02, percent-clipped=0.0 2024-09-25 15:57:46,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=779160.6666666666, ans=0.2 2024-09-25 15:57:55,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=779207.3333333334, ans=0.5 2024-09-25 15:58:04,609 INFO [train.py:1198] (0/4) Epoch 43, batch 3350, loss[loss=0.199, ctc_loss=0.1297, cr_loss=0.3463, over 17061.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1218, cr_loss=0.3394, over 3345704.67 frames. ], batch size: 46, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:58:16,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=22.5 2024-09-25 15:58:18,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=779300.6666666666, ans=0.09899494936611666 2024-09-25 15:58:20,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=779300.6666666666, ans=0.2 2024-09-25 15:58:24,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=779300.6666666666, ans=0.015 2024-09-25 15:58:46,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=779347.3333333334, ans=0.0 2024-09-25 15:58:50,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=779394.0, ans=0.0 2024-09-25 15:59:01,680 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 15:59:16,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.24 vs. limit=6.0 2024-09-25 15:59:23,220 INFO [train.py:1198] (0/4) Epoch 43, batch 3400, loss[loss=0.1648, ctc_loss=0.1038, cr_loss=0.3046, over 17179.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1219, cr_loss=0.3397, over 3352196.94 frames. 
], batch size: 41, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 15:59:28,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=779487.3333333334, ans=0.125 2024-09-25 15:59:55,330 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.291e+02 1.360e+02 1.462e+02 4.308e+02, threshold=2.720e+02, percent-clipped=1.0 2024-09-25 16:00:03,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=779580.6666666666, ans=0.2 2024-09-25 16:00:14,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=779627.3333333334, ans=0.125 2024-09-25 16:00:28,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=779674.0, ans=0.05 2024-09-25 16:00:42,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=779674.0, ans=0.125 2024-09-25 16:00:45,027 INFO [train.py:1198] (0/4) Epoch 43, batch 3450, loss[loss=0.2229, ctc_loss=0.1443, cr_loss=0.3932, over 17223.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1221, cr_loss=0.3399, over 3364500.70 frames. ], batch size: 55, lr: 2.75e-03, grad_scale: 32.0 2024-09-25 16:00:57,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=779720.6666666666, ans=0.125 2024-09-25 16:01:06,337 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2024-09-25 16:01:18,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=779814.0, ans=0.5 2024-09-25 16:01:41,865 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.24 vs. limit=15.0 2024-09-25 16:01:43,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=779860.6666666666, ans=0.0 2024-09-25 16:01:55,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=779907.3333333334, ans=0.125 2024-09-25 16:01:57,841 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2024-09-25 16:02:03,508 INFO [train.py:1198] (0/4) Epoch 43, batch 3500, loss[loss=0.1995, ctc_loss=0.1282, cr_loss=0.3565, over 16984.00 frames. ], tot_loss[loss=0.1905, ctc_loss=0.1224, cr_loss=0.3405, over 3352503.83 frames. ], batch size: 53, lr: 2.75e-03, grad_scale: 16.0 2024-09-25 16:02:13,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.69 vs. limit=6.0 2024-09-25 16:02:26,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=780000.6666666666, ans=0.125 2024-09-25 16:02:26,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. 
limit=15.0 2024-09-25 16:02:27,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=780000.6666666666, ans=0.125 2024-09-25 16:02:35,356 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.093e+02 1.311e+02 1.389e+02 1.472e+02 1.940e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-25 16:02:56,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780094.0, ans=0.1 2024-09-25 16:03:16,910 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=780140.6666666666, ans=0.0 2024-09-25 16:03:24,820 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:03:26,021 INFO [train.py:1198] (0/4) Epoch 43, batch 3550, loss[loss=0.1619, ctc_loss=0.1028, cr_loss=0.2957, over 17158.00 frames. ], tot_loss[loss=0.1909, ctc_loss=0.1226, cr_loss=0.3414, over 3348023.55 frames. ], batch size: 41, lr: 2.75e-03, grad_scale: 16.0 2024-09-25 16:03:27,753 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780187.3333333334, ans=0.1 2024-09-25 16:03:59,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=780280.6666666666, ans=0.125 2024-09-25 16:04:00,579 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.49 vs. limit=22.5 2024-09-25 16:04:06,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=780280.6666666666, ans=0.2 2024-09-25 16:04:14,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.47 vs. limit=15.0 2024-09-25 16:04:16,057 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.10 vs. limit=10.0 2024-09-25 16:04:43,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=780420.6666666666, ans=0.1 2024-09-25 16:04:44,773 INFO [train.py:1198] (0/4) Epoch 43, batch 3600, loss[loss=0.188, ctc_loss=0.1209, cr_loss=0.3354, over 17016.00 frames. ], tot_loss[loss=0.1911, ctc_loss=0.1227, cr_loss=0.3421, over 3354965.65 frames. ], batch size: 51, lr: 2.74e-03, grad_scale: 32.0 2024-09-25 16:04:45,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.94 vs. 
limit=15.0 2024-09-25 16:04:49,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=780420.6666666666, ans=0.125 2024-09-25 16:05:10,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=780467.3333333334, ans=0.1 2024-09-25 16:05:14,401 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.305e+02 1.407e+02 1.507e+02 1.948e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-25 16:05:19,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=780514.0, ans=0.0 2024-09-25 16:06:03,265 INFO [train.py:1198] (0/4) Epoch 43, batch 3650, loss[loss=0.1927, ctc_loss=0.1232, cr_loss=0.3475, over 17109.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.121, cr_loss=0.3391, over 3368900.53 frames. ], batch size: 49, lr: 2.74e-03, grad_scale: 32.0 2024-09-25 16:06:05,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.17 vs. limit=15.0 2024-09-25 16:06:07,461 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2024-09-25 16:06:10,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=780654.0, ans=0.125 2024-09-25 16:06:13,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=780654.0, ans=0.0 2024-09-25 16:06:22,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=780700.6666666666, ans=0.0 2024-09-25 16:06:36,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=780747.3333333334, ans=0.1 2024-09-25 16:06:42,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780747.3333333334, ans=0.1 2024-09-25 16:06:55,381 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:07:06,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=780840.6666666666, ans=0.035 2024-09-25 16:07:21,990 INFO [train.py:1198] (0/4) Epoch 43, batch 3700, loss[loss=0.1603, ctc_loss=0.101, cr_loss=0.2965, over 17021.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1211, cr_loss=0.3388, over 3362242.37 frames. 
], batch size: 39, lr: 2.74e-03, grad_scale: 32.0 2024-09-25 16:07:25,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=780887.3333333334, ans=0.0 2024-09-25 16:07:45,117 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=780934.0, ans=0.04949747468305833 2024-09-25 16:07:52,634 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.295e+02 1.384e+02 1.500e+02 2.354e+02, threshold=2.769e+02, percent-clipped=0.0 2024-09-25 16:08:08,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=781027.3333333334, ans=0.0 2024-09-25 16:08:19,435 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=781027.3333333334, ans=0.125 2024-09-25 16:08:41,139 INFO [train.py:1198] (0/4) Epoch 43, batch 3750, loss[loss=0.2213, ctc_loss=0.1443, cr_loss=0.385, over 16646.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3374, over 3354124.41 frames. ], batch size: 66, lr: 2.74e-03, grad_scale: 16.0 2024-09-25 16:09:04,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=781167.3333333334, ans=0.125 2024-09-25 16:09:06,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=781167.3333333334, ans=0.2 2024-09-25 16:09:23,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=781214.0, ans=0.0 2024-09-25 16:09:28,734 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=22.5 2024-09-25 16:10:00,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=781354.0, ans=0.125 2024-09-25 16:10:01,217 INFO [train.py:1198] (0/4) Epoch 43, batch 3800, loss[loss=0.1718, ctc_loss=0.1078, cr_loss=0.3201, over 17026.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.122, cr_loss=0.34, over 3332789.31 frames. ], batch size: 53, lr: 2.74e-03, grad_scale: 16.0 2024-09-25 16:10:18,964 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=781400.6666666666, ans=0.0 2024-09-25 16:10:23,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=781400.6666666666, ans=0.0 2024-09-25 16:10:32,810 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.355e+02 1.451e+02 1.569e+02 1.796e+02, threshold=2.903e+02, percent-clipped=0.0 2024-09-25 16:10:49,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-25 16:11:08,760 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2024-09-25 16:11:20,436 INFO [train.py:1198] (0/4) Epoch 43, batch 3850, loss[loss=0.2067, ctc_loss=0.1308, cr_loss=0.3793, over 16924.00 frames. ], tot_loss[loss=0.1918, ctc_loss=0.1234, cr_loss=0.342, over 3295692.73 frames. 
], batch size: 58, lr: 2.74e-03, grad_scale: 16.0 2024-09-25 16:11:21,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2024-09-25 16:11:21,367 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2024-09-25 16:11:48,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781634.0, ans=0.1 2024-09-25 16:11:59,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=781680.6666666666, ans=0.0 2024-09-25 16:12:01,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2024-09-25 16:12:10,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.94 vs. limit=10.0 2024-09-25 16:12:19,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=781727.3333333334, ans=0.025 2024-09-25 16:12:24,119 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2024-09-25 16:12:30,923 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-43.pt 2024-09-25 16:13:18,296 INFO [train.py:1198] (0/4) Epoch 44, batch 0, loss[loss=0.2347, ctc_loss=0.1569, cr_loss=0.3893, over 15071.00 frames. ], tot_loss[loss=0.2347, ctc_loss=0.1569, cr_loss=0.3893, over 15071.00 frames. ], batch size: 89, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:13:18,296 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 16:13:33,583 INFO [train.py:1230] (0/4) Epoch 44, validation: loss=0.03507, ctc_loss=0.03507, cr_loss=1.053e-14, over 944034.00 frames. 2024-09-25 16:13:33,584 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 16:13:54,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=781848.6666666666, ans=0.2 2024-09-25 16:14:14,724 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.434e+02 1.591e+02 1.721e+02 2.734e+02, threshold=3.183e+02, percent-clipped=0.0 2024-09-25 16:14:21,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=781895.3333333334, ans=0.125 2024-09-25 16:14:52,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781988.6666666666, ans=0.1 2024-09-25 16:14:54,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=781988.6666666666, ans=0.0 2024-09-25 16:14:59,049 INFO [train.py:1198] (0/4) Epoch 44, batch 50, loss[loss=0.1583, ctc_loss=0.1004, cr_loss=0.2896, over 17012.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1198, cr_loss=0.3362, over 758253.06 frames. 
], batch size: 44, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:15:15,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=782082.0, ans=0.015 2024-09-25 16:15:18,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=782082.0, ans=0.125 2024-09-25 16:15:24,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=782082.0, ans=0.1 2024-09-25 16:15:28,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=782082.0, ans=0.125 2024-09-25 16:15:31,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=782128.6666666666, ans=0.0 2024-09-25 16:15:34,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=782128.6666666666, ans=0.125 2024-09-25 16:15:50,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782175.3333333334, ans=0.1 2024-09-25 16:16:01,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=782222.0, ans=0.025 2024-09-25 16:16:10,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=782222.0, ans=15.0 2024-09-25 16:16:18,759 INFO [train.py:1198] (0/4) Epoch 44, batch 100, loss[loss=0.2044, ctc_loss=0.1289, cr_loss=0.3775, over 17151.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1204, cr_loss=0.3365, over 1334218.55 frames. ], batch size: 45, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:16:19,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=782268.6666666666, ans=0.0 2024-09-25 16:16:33,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=782315.3333333334, ans=0.125 2024-09-25 16:16:53,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=782362.0, ans=0.0 2024-09-25 16:16:57,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=782362.0, ans=0.2 2024-09-25 16:17:00,120 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.102e+02 1.309e+02 1.396e+02 1.536e+02 2.062e+02, threshold=2.792e+02, percent-clipped=0.0 2024-09-25 16:17:00,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=782362.0, ans=0.0 2024-09-25 16:17:41,587 INFO [train.py:1198] (0/4) Epoch 44, batch 150, loss[loss=0.2039, ctc_loss=0.1343, cr_loss=0.348, over 16239.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1206, cr_loss=0.3367, over 1766740.09 frames. 
], batch size: 74, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:17:48,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=782502.0, ans=0.125 2024-09-25 16:18:05,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=782548.6666666666, ans=0.125 2024-09-25 16:18:38,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-09-25 16:18:48,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=782688.6666666666, ans=0.125 2024-09-25 16:18:52,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=15.0 2024-09-25 16:19:07,047 INFO [train.py:1198] (0/4) Epoch 44, batch 200, loss[loss=0.2195, ctc_loss=0.1426, cr_loss=0.3848, over 16075.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1205, cr_loss=0.3366, over 2123106.74 frames. ], batch size: 74, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:19:07,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=782735.3333333334, ans=0.125 2024-09-25 16:19:10,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=782735.3333333334, ans=0.025 2024-09-25 16:19:18,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=782735.3333333334, ans=0.125 2024-09-25 16:19:48,491 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.333e+02 1.408e+02 1.534e+02 2.430e+02, threshold=2.816e+02, percent-clipped=0.0 2024-09-25 16:20:30,832 INFO [train.py:1198] (0/4) Epoch 44, batch 250, loss[loss=0.2097, ctc_loss=0.1324, cr_loss=0.3863, over 17224.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1201, cr_loss=0.3355, over 2391925.13 frames. ], batch size: 47, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:20:47,070 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:20:59,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=783015.3333333334, ans=0.0 2024-09-25 16:21:10,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0 2024-09-25 16:21:29,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. limit=10.0 2024-09-25 16:21:54,066 INFO [train.py:1198] (0/4) Epoch 44, batch 300, loss[loss=0.1628, ctc_loss=0.1011, cr_loss=0.3084, over 17274.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1204, cr_loss=0.3359, over 2598370.76 frames. 
], batch size: 42, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:21:56,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=783202.0, ans=0.125 2024-09-25 16:22:07,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=783202.0, ans=0.125 2024-09-25 16:22:08,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=783248.6666666666, ans=0.0 2024-09-25 16:22:12,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=783248.6666666666, ans=0.2 2024-09-25 16:22:18,389 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:22:29,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0 2024-09-25 16:22:32,113 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.326e+02 1.428e+02 1.549e+02 2.699e+02, threshold=2.856e+02, percent-clipped=0.0 2024-09-25 16:22:43,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=783342.0, ans=0.125 2024-09-25 16:22:46,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=783342.0, ans=0.125 2024-09-25 16:23:16,543 INFO [train.py:1198] (0/4) Epoch 44, batch 350, loss[loss=0.1851, ctc_loss=0.1155, cr_loss=0.3476, over 17212.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.121, cr_loss=0.3368, over 2751630.76 frames. ], batch size: 41, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:23:50,992 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.20 vs. limit=22.5 2024-09-25 16:23:57,000 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=783528.6666666666, ans=0.0 2024-09-25 16:23:57,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=783528.6666666666, ans=0.07 2024-09-25 16:24:10,976 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=783575.3333333334, ans=0.125 2024-09-25 16:24:33,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2024-09-25 16:24:39,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783622.0, ans=0.1 2024-09-25 16:24:42,413 INFO [train.py:1198] (0/4) Epoch 44, batch 400, loss[loss=0.1744, ctc_loss=0.1103, cr_loss=0.3204, over 17117.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.122, cr_loss=0.3379, over 2869447.31 frames. ], batch size: 40, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:24:43,127 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. 
limit=15.0 2024-09-25 16:25:19,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=783762.0, ans=0.0 2024-09-25 16:25:20,975 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.320e+02 1.420e+02 1.550e+02 2.069e+02, threshold=2.840e+02, percent-clipped=0.0 2024-09-25 16:25:23,464 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=22.5 2024-09-25 16:26:01,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=783902.0, ans=0.2 2024-09-25 16:26:02,567 INFO [train.py:1198] (0/4) Epoch 44, batch 450, loss[loss=0.219, ctc_loss=0.1421, cr_loss=0.3844, over 17032.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1218, cr_loss=0.3383, over 2979298.66 frames. ], batch size: 56, lr: 2.71e-03, grad_scale: 32.0 2024-09-25 16:26:35,399 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-168000.pt 2024-09-25 16:27:00,341 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=15.0 2024-09-25 16:27:28,174 INFO [train.py:1198] (0/4) Epoch 44, batch 500, loss[loss=0.177, ctc_loss=0.1135, cr_loss=0.3178, over 17206.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1205, cr_loss=0.3365, over 3071889.60 frames. ], batch size: 47, lr: 2.71e-03, grad_scale: 16.0 2024-09-25 16:27:41,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2024-09-25 16:28:08,419 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.61 vs. limit=15.0 2024-09-25 16:28:10,663 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.312e+02 1.366e+02 1.455e+02 1.775e+02, threshold=2.732e+02, percent-clipped=0.0 2024-09-25 16:28:39,441 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=784322.0, ans=0.025 2024-09-25 16:28:41,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784322.0, ans=0.1 2024-09-25 16:28:44,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784322.0, ans=0.1 2024-09-25 16:28:45,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=784322.0, ans=0.2 2024-09-25 16:28:50,234 INFO [train.py:1198] (0/4) Epoch 44, batch 550, loss[loss=0.1772, ctc_loss=0.111, cr_loss=0.3312, over 17107.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1202, cr_loss=0.336, over 3134579.36 frames. 
], batch size: 43, lr: 2.71e-03, grad_scale: 16.0 2024-09-25 16:28:55,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=784368.6666666666, ans=0.5 2024-09-25 16:29:00,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=784368.6666666666, ans=0.0 2024-09-25 16:29:07,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=784415.3333333334, ans=0.125 2024-09-25 16:29:08,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=784415.3333333334, ans=0.125 2024-09-25 16:29:18,761 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:30:13,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=784602.0, ans=0.0 2024-09-25 16:30:14,115 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2024-09-25 16:30:15,134 INFO [train.py:1198] (0/4) Epoch 44, batch 600, loss[loss=0.1842, ctc_loss=0.1166, cr_loss=0.3379, over 17156.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1188, cr_loss=0.3334, over 3194466.59 frames. ], batch size: 48, lr: 2.71e-03, grad_scale: 16.0 2024-09-25 16:30:17,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=784602.0, ans=0.125 2024-09-25 16:30:20,292 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:30:55,331 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.308e+02 1.386e+02 1.485e+02 2.110e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 16:30:57,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=22.5 2024-09-25 16:31:07,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0 2024-09-25 16:31:11,982 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=784742.0, ans=0.125 2024-09-25 16:31:35,712 INFO [train.py:1198] (0/4) Epoch 44, batch 650, loss[loss=0.1896, ctc_loss=0.1221, cr_loss=0.3373, over 16666.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1188, cr_loss=0.3334, over 3224592.49 frames. ], batch size: 61, lr: 2.71e-03, grad_scale: 16.0 2024-09-25 16:31:39,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=784835.3333333334, ans=0.02 2024-09-25 16:32:20,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=784928.6666666666, ans=0.05 2024-09-25 16:32:26,103 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.55 vs. 
limit=22.5 2024-09-25 16:32:28,901 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=784975.3333333334, ans=0.125 2024-09-25 16:32:35,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=784975.3333333334, ans=0.125 2024-09-25 16:32:36,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=784975.3333333334, ans=0.0 2024-09-25 16:32:44,998 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=785022.0, ans=0.07 2024-09-25 16:32:46,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785022.0, ans=0.1 2024-09-25 16:32:56,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=785022.0, ans=0.125 2024-09-25 16:32:59,116 INFO [train.py:1198] (0/4) Epoch 44, batch 700, loss[loss=0.1531, ctc_loss=0.09496, cr_loss=0.2909, over 16323.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1186, cr_loss=0.3331, over 3256203.93 frames. ], batch size: 36, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:33:13,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=785068.6666666666, ans=0.0 2024-09-25 16:33:15,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=12.0 2024-09-25 16:33:29,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=785115.3333333334, ans=0.125 2024-09-25 16:33:41,733 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.292e+02 1.388e+02 1.458e+02 2.267e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 16:33:50,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2024-09-25 16:33:58,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=12.0 2024-09-25 16:33:59,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785208.6666666666, ans=0.1 2024-09-25 16:34:11,738 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=785255.3333333334, ans=0.0 2024-09-25 16:34:22,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=785302.0, ans=0.025 2024-09-25 16:34:24,044 INFO [train.py:1198] (0/4) Epoch 44, batch 750, loss[loss=0.2109, ctc_loss=0.1357, cr_loss=0.3756, over 15956.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1182, cr_loss=0.3326, over 3280994.97 frames. ], batch size: 74, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:34:26,227 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. 
limit=12.0 2024-09-25 16:35:34,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=785488.6666666666, ans=0.0 2024-09-25 16:35:46,669 INFO [train.py:1198] (0/4) Epoch 44, batch 800, loss[loss=0.1568, ctc_loss=0.09919, cr_loss=0.2881, over 17279.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1189, cr_loss=0.334, over 3301590.91 frames. ], batch size: 42, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:35:47,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2024-09-25 16:35:58,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=785535.3333333334, ans=0.2 2024-09-25 16:35:59,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785535.3333333334, ans=0.1 2024-09-25 16:36:01,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=785582.0, ans=0.125 2024-09-25 16:36:07,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785582.0, ans=0.1 2024-09-25 16:36:10,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=785582.0, ans=0.125 2024-09-25 16:36:17,591 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.00 vs. limit=15.0 2024-09-25 16:36:26,438 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.304e+02 1.360e+02 1.506e+02 2.198e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 16:36:34,896 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=785675.3333333334, ans=0.125 2024-09-25 16:36:51,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=785722.0, ans=0.125 2024-09-25 16:37:02,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=785722.0, ans=0.0 2024-09-25 16:37:09,076 INFO [train.py:1198] (0/4) Epoch 44, batch 850, loss[loss=0.1842, ctc_loss=0.12, cr_loss=0.3211, over 16760.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1201, cr_loss=0.3364, over 3323809.56 frames. ], batch size: 61, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:37:28,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=785815.3333333334, ans=0.0 2024-09-25 16:37:53,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=785862.0, ans=0.125 2024-09-25 16:37:54,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=785862.0, ans=0.125 2024-09-25 16:37:58,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785908.6666666666, ans=0.1 2024-09-25 16:38:02,052 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.66 vs. 
limit=22.5 2024-09-25 16:38:02,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=785908.6666666666, ans=0.04949747468305833 2024-09-25 16:38:26,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=785955.3333333334, ans=0.025 2024-09-25 16:38:32,617 INFO [train.py:1198] (0/4) Epoch 44, batch 900, loss[loss=0.2107, ctc_loss=0.1392, cr_loss=0.3571, over 16995.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1195, cr_loss=0.3349, over 3331481.37 frames. ], batch size: 53, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:38:32,859 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=786002.0, ans=0.0 2024-09-25 16:39:15,049 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.314e+02 1.378e+02 1.470e+02 1.852e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 16:39:35,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=786142.0, ans=0.0 2024-09-25 16:39:48,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=786188.6666666666, ans=0.125 2024-09-25 16:39:58,022 INFO [train.py:1198] (0/4) Epoch 44, batch 950, loss[loss=0.1954, ctc_loss=0.1253, cr_loss=0.3506, over 17289.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1199, cr_loss=0.3359, over 3346246.84 frames. ], batch size: 46, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:40:01,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=786235.3333333334, ans=0.1 2024-09-25 16:40:23,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=786282.0, ans=0.05 2024-09-25 16:40:44,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=786375.3333333334, ans=0.125 2024-09-25 16:41:09,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-25 16:41:12,339 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0 2024-09-25 16:41:15,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=786422.0, ans=0.125 2024-09-25 16:41:16,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=786468.6666666666, ans=0.025 2024-09-25 16:41:18,074 INFO [train.py:1198] (0/4) Epoch 44, batch 1000, loss[loss=0.1655, ctc_loss=0.1032, cr_loss=0.3116, over 17153.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1198, cr_loss=0.3361, over 3352264.19 frames. ], batch size: 41, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:41:21,956 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. 
limit=15.0 2024-09-25 16:41:27,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=786468.6666666666, ans=0.0 2024-09-25 16:41:43,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=786515.3333333334, ans=0.0 2024-09-25 16:42:02,020 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.298e+02 1.360e+02 1.466e+02 2.434e+02, threshold=2.721e+02, percent-clipped=0.0 2024-09-25 16:42:10,293 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=786608.6666666666, ans=0.125 2024-09-25 16:42:21,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=786608.6666666666, ans=0.0 2024-09-25 16:42:23,002 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=786655.3333333334, ans=0.025 2024-09-25 16:42:40,316 INFO [train.py:1198] (0/4) Epoch 44, batch 1050, loss[loss=0.1766, ctc_loss=0.1137, cr_loss=0.3144, over 17082.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1203, cr_loss=0.3366, over 3344812.21 frames. ], batch size: 49, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:42:50,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5 2024-09-25 16:42:52,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2024-09-25 16:43:40,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=786842.0, ans=0.05 2024-09-25 16:44:02,232 INFO [train.py:1198] (0/4) Epoch 44, batch 1100, loss[loss=0.1817, ctc_loss=0.1148, cr_loss=0.3344, over 17021.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1208, cr_loss=0.3371, over 3345339.02 frames. ], batch size: 44, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:44:12,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=786935.3333333334, ans=0.125 2024-09-25 16:44:14,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=786935.3333333334, ans=0.125 2024-09-25 16:44:16,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2024-09-25 16:44:17,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=786935.3333333334, ans=0.125 2024-09-25 16:44:39,931 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. 
limit=15.0 2024-09-25 16:44:45,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=787028.6666666666, ans=0.0 2024-09-25 16:44:48,724 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.281e+02 1.343e+02 1.422e+02 3.447e+02, threshold=2.685e+02, percent-clipped=1.0 2024-09-25 16:44:52,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=787028.6666666666, ans=0.125 2024-09-25 16:45:21,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=12.0 2024-09-25 16:45:27,409 INFO [train.py:1198] (0/4) Epoch 44, batch 1150, loss[loss=0.1956, ctc_loss=0.1285, cr_loss=0.3356, over 17241.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1217, cr_loss=0.3393, over 3352353.43 frames. ], batch size: 50, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:46:05,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=787262.0, ans=0.125 2024-09-25 16:46:08,211 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2024-09-25 16:46:18,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=787308.6666666666, ans=0.125 2024-09-25 16:46:21,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=787308.6666666666, ans=0.125 2024-09-25 16:46:47,230 INFO [train.py:1198] (0/4) Epoch 44, batch 1200, loss[loss=0.1556, ctc_loss=0.09714, cr_loss=0.2921, over 17086.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.122, cr_loss=0.3401, over 3349118.18 frames. ], batch size: 43, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:47:25,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=787495.3333333334, ans=0.125 2024-09-25 16:47:31,121 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.306e+02 1.377e+02 1.476e+02 2.006e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-25 16:47:31,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=787495.3333333334, ans=0.015 2024-09-25 16:47:44,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=787542.0, ans=0.07 2024-09-25 16:48:11,731 INFO [train.py:1198] (0/4) Epoch 44, batch 1250, loss[loss=0.2069, ctc_loss=0.1375, cr_loss=0.3468, over 16084.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1222, cr_loss=0.3409, over 3359290.96 frames. ], batch size: 74, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:48:11,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=787635.3333333334, ans=0.025 2024-09-25 16:48:21,955 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.40 vs. 
limit=22.5 2024-09-25 16:48:31,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=787682.0, ans=0.2 2024-09-25 16:48:33,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-09-25 16:48:42,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=787728.6666666666, ans=0.1 2024-09-25 16:48:47,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=787728.6666666666, ans=0.125 2024-09-25 16:48:55,782 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787728.6666666666, ans=0.1 2024-09-25 16:49:18,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.85 vs. limit=10.0 2024-09-25 16:49:24,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-25 16:49:37,342 INFO [train.py:1198] (0/4) Epoch 44, batch 1300, loss[loss=0.1792, ctc_loss=0.1131, cr_loss=0.3307, over 17163.00 frames. ], tot_loss[loss=0.1908, ctc_loss=0.1225, cr_loss=0.3416, over 3361268.07 frames. ], batch size: 45, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:49:41,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-09-25 16:49:45,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787868.6666666666, ans=0.1 2024-09-25 16:49:53,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=787915.3333333334, ans=0.2 2024-09-25 16:50:00,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=22.5 2024-09-25 16:50:17,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=787962.0, ans=0.2 2024-09-25 16:50:18,917 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.318e+02 1.377e+02 1.473e+02 1.934e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-25 16:50:20,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=787962.0, ans=0.07 2024-09-25 16:50:35,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=22.5 2024-09-25 16:50:49,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=788055.3333333334, ans=0.05 2024-09-25 16:50:52,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=788055.3333333334, ans=0.09899494936611666 2024-09-25 16:50:57,003 INFO [train.py:1198] (0/4) Epoch 44, batch 1350, loss[loss=0.2373, ctc_loss=0.1531, cr_loss=0.4211, over 17049.00 frames. 
], tot_loss[loss=0.1903, ctc_loss=0.1221, cr_loss=0.3411, over 3358010.85 frames. ], batch size: 52, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:51:06,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=788102.0, ans=0.125 2024-09-25 16:51:17,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=788148.6666666666, ans=0.0 2024-09-25 16:51:43,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=788242.0, ans=0.025 2024-09-25 16:51:48,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=788242.0, ans=0.0 2024-09-25 16:51:48,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=788242.0, ans=0.125 2024-09-25 16:51:58,950 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=788242.0, ans=0.0 2024-09-25 16:52:19,418 INFO [train.py:1198] (0/4) Epoch 44, batch 1400, loss[loss=0.2042, ctc_loss=0.1331, cr_loss=0.3556, over 17226.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1214, cr_loss=0.3393, over 3358582.61 frames. ], batch size: 50, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:52:50,355 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=788428.6666666666, ans=0.0 2024-09-25 16:52:58,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=788428.6666666666, ans=0.125 2024-09-25 16:53:01,170 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.288e+02 1.388e+02 1.492e+02 2.105e+02, threshold=2.776e+02, percent-clipped=0.0 2024-09-25 16:53:01,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=788428.6666666666, ans=0.125 2024-09-25 16:53:42,246 INFO [train.py:1198] (0/4) Epoch 44, batch 1450, loss[loss=0.1771, ctc_loss=0.1133, cr_loss=0.3187, over 17073.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1212, cr_loss=0.3384, over 3356971.28 frames. ], batch size: 46, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:54:17,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=788662.0, ans=0.0 2024-09-25 16:54:36,330 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=788708.6666666666, ans=10.0 2024-09-25 16:54:40,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=788708.6666666666, ans=0.125 2024-09-25 16:55:07,625 INFO [train.py:1198] (0/4) Epoch 44, batch 1500, loss[loss=0.1938, ctc_loss=0.1226, cr_loss=0.356, over 16904.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1207, cr_loss=0.3378, over 3353588.65 frames. 
], batch size: 58, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:55:30,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=788848.6666666666, ans=0.125 2024-09-25 16:55:36,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=788848.6666666666, ans=0.125 2024-09-25 16:55:40,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=788895.3333333334, ans=0.125 2024-09-25 16:55:41,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=788895.3333333334, ans=0.95 2024-09-25 16:55:50,914 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.297e+02 1.379e+02 1.448e+02 1.999e+02, threshold=2.757e+02, percent-clipped=0.0 2024-09-25 16:56:27,698 INFO [train.py:1198] (0/4) Epoch 44, batch 1550, loss[loss=0.206, ctc_loss=0.1332, cr_loss=0.3642, over 16898.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1201, cr_loss=0.3365, over 3360116.76 frames. ], batch size: 58, lr: 2.70e-03, grad_scale: 16.0 2024-09-25 16:56:40,808 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 16:57:11,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=789128.6666666666, ans=0.125 2024-09-25 16:57:24,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=12.0 2024-09-25 16:57:30,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=789175.3333333334, ans=0.0 2024-09-25 16:57:31,011 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2024-09-25 16:57:33,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=789222.0, ans=0.05 2024-09-25 16:57:49,474 INFO [train.py:1198] (0/4) Epoch 44, batch 1600, loss[loss=0.1806, ctc_loss=0.1123, cr_loss=0.3418, over 17029.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1206, cr_loss=0.3374, over 3358909.24 frames. ], batch size: 44, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:58:16,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=789315.3333333334, ans=0.0 2024-09-25 16:58:24,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=789362.0, ans=0.125 2024-09-25 16:58:24,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789362.0, ans=0.1 2024-09-25 16:58:34,987 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.295e+02 1.391e+02 1.482e+02 2.401e+02, threshold=2.782e+02, percent-clipped=0.0 2024-09-25 16:59:06,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=789455.3333333334, ans=0.025 2024-09-25 16:59:14,304 INFO [train.py:1198] (0/4) Epoch 44, batch 1650, loss[loss=0.1575, ctc_loss=0.1012, cr_loss=0.2812, over 17174.00 frames. 
], tot_loss[loss=0.1884, ctc_loss=0.1208, cr_loss=0.338, over 3364165.07 frames. ], batch size: 41, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 16:59:25,611 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=22.5 2024-09-25 16:59:28,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=789502.0, ans=0.05 2024-09-25 16:59:55,304 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=789595.3333333334, ans=0.125 2024-09-25 17:00:06,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=789642.0, ans=0.125 2024-09-25 17:00:11,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789642.0, ans=0.1 2024-09-25 17:00:18,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5 2024-09-25 17:00:29,118 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789688.6666666666, ans=0.1 2024-09-25 17:00:36,857 INFO [train.py:1198] (0/4) Epoch 44, batch 1700, loss[loss=0.2055, ctc_loss=0.1322, cr_loss=0.3667, over 17329.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1211, cr_loss=0.3386, over 3349053.29 frames. ], batch size: 52, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:00:39,266 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0 2024-09-25 17:00:53,312 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789782.0, ans=0.1 2024-09-25 17:01:07,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=789828.6666666666, ans=0.125 2024-09-25 17:01:07,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=789828.6666666666, ans=0.125 2024-09-25 17:01:09,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=789828.6666666666, ans=0.1 2024-09-25 17:01:20,179 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.326e+02 1.402e+02 1.495e+02 1.823e+02, threshold=2.804e+02, percent-clipped=0.0 2024-09-25 17:01:50,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=789922.0, ans=0.5 2024-09-25 17:01:59,535 INFO [train.py:1198] (0/4) Epoch 44, batch 1750, loss[loss=0.2, ctc_loss=0.1295, cr_loss=0.3528, over 16554.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1221, cr_loss=0.3401, over 3348589.83 frames. ], batch size: 66, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:02:14,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=790015.3333333334, ans=0.025 2024-09-25 17:03:22,076 INFO [train.py:1198] (0/4) Epoch 44, batch 1800, loss[loss=0.1635, ctc_loss=0.1014, cr_loss=0.3104, over 16710.00 frames. 
], tot_loss[loss=0.1911, ctc_loss=0.1228, cr_loss=0.3417, over 3350064.98 frames. ], batch size: 37, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:03:25,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=790202.0, ans=0.125 2024-09-25 17:03:27,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=790202.0, ans=0.125 2024-09-25 17:04:05,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=790295.3333333334, ans=0.0 2024-09-25 17:04:08,009 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.325e+02 1.389e+02 1.497e+02 2.037e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-25 17:04:28,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=790342.0, ans=0.025 2024-09-25 17:04:39,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=790388.6666666666, ans=0.2 2024-09-25 17:04:43,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=790388.6666666666, ans=0.0 2024-09-25 17:04:47,704 INFO [train.py:1198] (0/4) Epoch 44, batch 1850, loss[loss=0.1719, ctc_loss=0.1089, cr_loss=0.3152, over 17268.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1221, cr_loss=0.3397, over 3346551.21 frames. ], batch size: 42, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:05:10,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=790482.0, ans=0.0 2024-09-25 17:05:34,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=790575.3333333334, ans=0.125 2024-09-25 17:05:41,044 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=790575.3333333334, ans=0.0 2024-09-25 17:05:45,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=790575.3333333334, ans=0.0 2024-09-25 17:05:47,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790575.3333333334, ans=0.1 2024-09-25 17:06:08,121 INFO [train.py:1198] (0/4) Epoch 44, batch 1900, loss[loss=0.1937, ctc_loss=0.1223, cr_loss=0.3572, over 16756.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1218, cr_loss=0.3397, over 3349007.52 frames. 
], batch size: 61, lr: 2.70e-03, grad_scale: 32.0 2024-09-25 17:06:19,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=790668.6666666666, ans=0.0 2024-09-25 17:06:19,735 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=790668.6666666666, ans=0.5 2024-09-25 17:06:29,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=790715.3333333334, ans=0.125 2024-09-25 17:06:32,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=790715.3333333334, ans=0.125 2024-09-25 17:06:33,068 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-09-25 17:06:54,120 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.292e+02 1.379e+02 1.443e+02 2.422e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-25 17:06:56,192 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=790762.0, ans=0.125 2024-09-25 17:07:05,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=790808.6666666666, ans=0.0 2024-09-25 17:07:31,365 INFO [train.py:1198] (0/4) Epoch 44, batch 1950, loss[loss=0.1907, ctc_loss=0.1229, cr_loss=0.3394, over 17040.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.3388, over 3358340.51 frames. ], batch size: 56, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:07:34,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=790902.0, ans=0.125 2024-09-25 17:07:47,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=790948.6666666666, ans=0.0 2024-09-25 17:07:57,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=790948.6666666666, ans=0.0 2024-09-25 17:08:14,401 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:08:25,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=791042.0, ans=0.125 2024-09-25 17:08:35,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791042.0, ans=0.1 2024-09-25 17:08:41,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=791088.6666666666, ans=0.125 2024-09-25 17:08:49,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=791088.6666666666, ans=0.125 2024-09-25 17:08:56,405 INFO [train.py:1198] (0/4) Epoch 44, batch 2000, loss[loss=0.1732, ctc_loss=0.1125, cr_loss=0.3037, over 17309.00 frames. ], tot_loss[loss=0.1903, ctc_loss=0.1222, cr_loss=0.3402, over 3354766.28 frames. 
], batch size: 51, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:09:00,502 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0 2024-09-25 17:09:29,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=791228.6666666666, ans=0.025 2024-09-25 17:09:30,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=791228.6666666666, ans=0.0 2024-09-25 17:09:43,344 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.332e+02 1.440e+02 1.511e+02 2.187e+02, threshold=2.879e+02, percent-clipped=0.0 2024-09-25 17:09:48,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=791275.3333333334, ans=15.0 2024-09-25 17:10:01,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=791322.0, ans=0.125 2024-09-25 17:10:14,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791322.0, ans=0.1 2024-09-25 17:10:18,722 INFO [train.py:1198] (0/4) Epoch 44, batch 2050, loss[loss=0.2076, ctc_loss=0.1325, cr_loss=0.3754, over 17200.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1221, cr_loss=0.3405, over 3355832.92 frames. ], batch size: 47, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:10:30,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=791368.6666666666, ans=0.125 2024-09-25 17:10:32,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=12.0 2024-09-25 17:10:57,449 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:11:16,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=22.5 2024-09-25 17:11:22,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=791555.3333333334, ans=0.125 2024-09-25 17:11:38,301 INFO [train.py:1198] (0/4) Epoch 44, batch 2100, loss[loss=0.1994, ctc_loss=0.1271, cr_loss=0.3612, over 16508.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3391, over 3359345.38 frames. ], batch size: 66, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:11:52,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=791602.0, ans=0.125 2024-09-25 17:11:55,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=791648.6666666666, ans=0.2 2024-09-25 17:12:03,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=791648.6666666666, ans=0.125 2024-09-25 17:12:08,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. 
limit=6.0 2024-09-25 17:12:15,000 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2024-09-25 17:12:25,395 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.332e+02 1.398e+02 1.467e+02 2.660e+02, threshold=2.796e+02, percent-clipped=0.0 2024-09-25 17:13:00,854 INFO [train.py:1198] (0/4) Epoch 44, batch 2150, loss[loss=0.1989, ctc_loss=0.1258, cr_loss=0.3651, over 17306.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3391, over 3351142.54 frames. ], batch size: 51, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:13:01,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=791835.3333333334, ans=0.0 2024-09-25 17:13:11,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2024-09-25 17:13:18,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=791882.0, ans=0.025 2024-09-25 17:13:55,666 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=22.5 2024-09-25 17:14:08,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=792022.0, ans=0.0 2024-09-25 17:14:28,855 INFO [train.py:1198] (0/4) Epoch 44, batch 2200, loss[loss=0.1591, ctc_loss=0.1022, cr_loss=0.2841, over 16326.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.1212, cr_loss=0.3386, over 3355886.82 frames. ], batch size: 36, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:15:13,652 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.270e+02 1.367e+02 1.447e+02 1.926e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-25 17:15:39,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=792255.3333333334, ans=0.125 2024-09-25 17:15:45,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=792255.3333333334, ans=0.0 2024-09-25 17:15:48,056 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.93 vs. limit=10.0 2024-09-25 17:15:48,885 INFO [train.py:1198] (0/4) Epoch 44, batch 2250, loss[loss=0.204, ctc_loss=0.1363, cr_loss=0.3387, over 15161.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.3379, over 3350677.60 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:16:10,546 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2024-09-25 17:16:30,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=792395.3333333334, ans=0.5 2024-09-25 17:16:47,813 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:16:59,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. 
limit=6.0 2024-09-25 17:17:11,216 INFO [train.py:1198] (0/4) Epoch 44, batch 2300, loss[loss=0.1548, ctc_loss=0.09702, cr_loss=0.2888, over 17284.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1204, cr_loss=0.3366, over 3350612.17 frames. ], batch size: 42, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:17:42,054 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2024-09-25 17:17:51,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=792628.6666666666, ans=0.0 2024-09-25 17:17:55,698 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.062e+02 1.302e+02 1.387e+02 1.511e+02 2.811e+02, threshold=2.774e+02, percent-clipped=1.0 2024-09-25 17:18:08,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=792675.3333333334, ans=0.125 2024-09-25 17:18:32,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=792768.6666666666, ans=0.125 2024-09-25 17:18:33,651 INFO [train.py:1198] (0/4) Epoch 44, batch 2350, loss[loss=0.2099, ctc_loss=0.136, cr_loss=0.3697, over 16931.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1197, cr_loss=0.3355, over 3341121.17 frames. ], batch size: 58, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:18:48,870 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=22.5 2024-09-25 17:18:49,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=792815.3333333334, ans=0.125 2024-09-25 17:19:16,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=792862.0, ans=0.125 2024-09-25 17:19:34,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=22.5 2024-09-25 17:19:37,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=792908.6666666666, ans=0.125 2024-09-25 17:19:59,164 INFO [train.py:1198] (0/4) Epoch 44, batch 2400, loss[loss=0.1858, ctc_loss=0.1191, cr_loss=0.3337, over 17037.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1205, cr_loss=0.337, over 3354777.30 frames. ], batch size: 52, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:20:19,377 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-09-25 17:20:21,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=793048.6666666666, ans=0.125 2024-09-25 17:20:24,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=793048.6666666666, ans=0.125 2024-09-25 17:20:45,393 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.312e+02 1.417e+02 1.545e+02 2.964e+02, threshold=2.835e+02, percent-clipped=1.0 2024-09-25 17:21:02,505 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.42 vs. 
limit=15.0 2024-09-25 17:21:11,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=793188.6666666666, ans=0.125 2024-09-25 17:21:19,099 INFO [train.py:1198] (0/4) Epoch 44, batch 2450, loss[loss=0.1862, ctc_loss=0.1161, cr_loss=0.3504, over 17002.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1209, cr_loss=0.3378, over 3359659.27 frames. ], batch size: 39, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:21:23,262 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=10.0 2024-09-25 17:22:23,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=793375.3333333334, ans=0.125 2024-09-25 17:22:25,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=15.0 2024-09-25 17:22:42,519 INFO [train.py:1198] (0/4) Epoch 44, batch 2500, loss[loss=0.166, ctc_loss=0.1067, cr_loss=0.2969, over 17012.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1206, cr_loss=0.3368, over 3365149.69 frames. ], batch size: 44, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:22:47,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=793468.6666666666, ans=0.0 2024-09-25 17:22:47,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=793468.6666666666, ans=0.125 2024-09-25 17:23:24,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=793562.0, ans=0.0 2024-09-25 17:23:31,902 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.312e+02 1.368e+02 1.453e+02 1.982e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 17:24:05,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=793655.3333333334, ans=0.0 2024-09-25 17:24:08,165 INFO [train.py:1198] (0/4) Epoch 44, batch 2550, loss[loss=0.2189, ctc_loss=0.1403, cr_loss=0.3928, over 15833.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1209, cr_loss=0.3371, over 3360875.38 frames. ], batch size: 74, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:24:30,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=793748.6666666666, ans=0.125 2024-09-25 17:24:38,818 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2024-09-25 17:24:44,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=793795.3333333334, ans=0.0 2024-09-25 17:24:45,487 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.32 vs. limit=15.0 2024-09-25 17:25:02,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=793842.0, ans=0.0 2024-09-25 17:25:09,476 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.85 vs. 
limit=10.0 2024-09-25 17:25:13,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=793888.6666666666, ans=0.0 2024-09-25 17:25:31,105 INFO [train.py:1198] (0/4) Epoch 44, batch 2600, loss[loss=0.2195, ctc_loss=0.1436, cr_loss=0.3792, over 16534.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1213, cr_loss=0.3381, over 3362225.57 frames. ], batch size: 66, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:25:43,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. limit=6.0 2024-09-25 17:25:50,135 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2024-09-25 17:26:19,164 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.289e+02 1.404e+02 1.511e+02 2.276e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-25 17:26:54,095 INFO [train.py:1198] (0/4) Epoch 44, batch 2650, loss[loss=0.2241, ctc_loss=0.1454, cr_loss=0.3933, over 15237.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1216, cr_loss=0.3381, over 3363322.16 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:28:11,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=794355.3333333334, ans=0.125 2024-09-25 17:28:17,365 INFO [train.py:1198] (0/4) Epoch 44, batch 2700, loss[loss=0.1607, ctc_loss=0.1021, cr_loss=0.293, over 17009.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1208, cr_loss=0.3367, over 3363834.37 frames. ], batch size: 44, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:28:38,747 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:28:51,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=794495.3333333334, ans=0.0 2024-09-25 17:29:07,857 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.291e+02 1.350e+02 1.441e+02 1.690e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-25 17:29:43,112 INFO [train.py:1198] (0/4) Epoch 44, batch 2750, loss[loss=0.174, ctc_loss=0.1108, cr_loss=0.316, over 17081.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1205, cr_loss=0.3358, over 3362209.61 frames. ], batch size: 43, lr: 2.69e-03, grad_scale: 16.0 2024-09-25 17:29:49,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794635.3333333334, ans=0.1 2024-09-25 17:30:02,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=794682.0, ans=0.09899494936611666 2024-09-25 17:30:10,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=794682.0, ans=0.0 2024-09-25 17:30:12,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=794682.0, ans=0.2 2024-09-25 17:30:20,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.57 vs. 
limit=15.0 2024-09-25 17:30:42,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=794775.3333333334, ans=0.125 2024-09-25 17:30:50,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=794822.0, ans=0.1 2024-09-25 17:30:58,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=12.0 2024-09-25 17:31:02,586 INFO [train.py:1198] (0/4) Epoch 44, batch 2800, loss[loss=0.1945, ctc_loss=0.1219, cr_loss=0.3629, over 17217.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1209, cr_loss=0.3371, over 3368006.78 frames. ], batch size: 50, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:31:05,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=794868.6666666666, ans=0.025 2024-09-25 17:31:09,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794868.6666666666, ans=0.1 2024-09-25 17:31:21,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=794915.3333333334, ans=0.0 2024-09-25 17:31:31,688 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=794915.3333333334, ans=0.125 2024-09-25 17:31:52,769 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.331e+02 1.405e+02 1.538e+02 1.952e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-25 17:31:57,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=795008.6666666666, ans=0.125 2024-09-25 17:31:59,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=795008.6666666666, ans=0.0 2024-09-25 17:32:15,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=795055.3333333334, ans=0.0 2024-09-25 17:32:23,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=795102.0, ans=0.025 2024-09-25 17:32:24,906 INFO [train.py:1198] (0/4) Epoch 44, batch 2850, loss[loss=0.133, ctc_loss=0.08145, cr_loss=0.2575, over 17120.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1202, cr_loss=0.3359, over 3371725.47 frames. ], batch size: 40, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:32:28,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=795102.0, ans=0.0 2024-09-25 17:33:11,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=795195.3333333334, ans=0.125 2024-09-25 17:33:17,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=795242.0, ans=0.125 2024-09-25 17:33:40,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-09-25 17:33:47,969 INFO [train.py:1198] (0/4) Epoch 44, batch 2900, loss[loss=0.2075, ctc_loss=0.1339, cr_loss=0.3681, over 17035.00 frames. 
], tot_loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.3384, over 3374482.77 frames. ], batch size: 52, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:34:32,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=795428.6666666666, ans=0.125 2024-09-25 17:34:35,336 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=795428.6666666666, ans=0.125 2024-09-25 17:34:40,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=795475.3333333334, ans=0.0 2024-09-25 17:34:40,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=795475.3333333334, ans=0.0 2024-09-25 17:34:41,410 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.299e+02 1.362e+02 1.425e+02 2.572e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-25 17:35:13,454 INFO [train.py:1198] (0/4) Epoch 44, batch 2950, loss[loss=0.187, ctc_loss=0.1208, cr_loss=0.3309, over 17356.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1209, cr_loss=0.3379, over 3369885.19 frames. ], batch size: 48, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:35:32,274 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0 2024-09-25 17:36:07,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=795708.6666666666, ans=0.125 2024-09-25 17:36:12,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=795708.6666666666, ans=0.125 2024-09-25 17:36:12,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=795708.6666666666, ans=0.125 2024-09-25 17:36:32,578 INFO [train.py:1198] (0/4) Epoch 44, batch 3000, loss[loss=0.2348, ctc_loss=0.1547, cr_loss=0.4005, over 16530.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1206, cr_loss=0.3375, over 3364576.30 frames. ], batch size: 66, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:36:32,579 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 17:36:47,862 INFO [train.py:1230] (0/4) Epoch 44, validation: loss=0.03521, ctc_loss=0.03521, cr_loss=1.022e-14, over 944034.00 frames. 
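Two arithmetic relations can be read off the entries above. First, every loss[...] / tot_loss[...] triple satisfies loss = ctc_loss + 0.2 * cr_loss, where 0.2 matches the cr-loss-scale-0.2 tag in the experiment directory used by the checkpoint lines below. Second, the five grad-norm numbers in each optim.py WARNING are consistent with the min/25%/median/75%/max of recently observed gradient norms, with threshold = Clipping_scale * median. A minimal Python sketch of both relations follows; it illustrates the logged numbers only and is not the actual icefall train.py/optim.py code, and the percentile reading of the five-number summary is an assumption.

import numpy as np

def combined_loss(ctc_loss, cr_loss, cr_loss_scale=0.2):
    # The logged `loss` field matches ctc_loss + cr_loss_scale * cr_loss;
    # the 0.2 scale is inferred from the printed numbers (and the exp-dir
    # name), not read from the training code.
    return ctc_loss + cr_loss_scale * cr_loss

# Epoch 44, batch 3000 entry: loss[loss=0.2348, ctc_loss=0.1547, cr_loss=0.4005, ...]
assert abs(combined_loss(0.1547, 0.4005) - 0.2348) < 1e-4
# Its running average: tot_loss[loss=0.1881, ctc_loss=0.1206, cr_loss=0.3375, ...]
assert abs(combined_loss(0.1206, 0.3375) - 0.1881) < 1e-4

def clipping_report(recent_grad_norms, clipping_scale=2.0):
    # Hypothetical reconstruction of the WARNING lines: the five printed
    # values read as min/25%/median/75%/max of recent gradient norms, and
    # the printed threshold equals clipping_scale * median.
    norms = np.asarray(recent_grad_norms, dtype=float)
    quartiles = np.percentile(norms, [0, 25, 50, 75, 100])
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * np.mean(norms > threshold)
    return quartiles, threshold, percent_clipped

# Consistent with "quartiles 1.148e+02 1.305e+02 1.379e+02 1.486e+02 1.924e+02,
# threshold=2.758e+02" above: 2.0 * 1.379e+02 = 2.758e+02.

On this reading, percent-clipped would be the percentage of recent batches whose gradient norm exceeded the threshold, which fits the surrounding entries: lines whose max stays below twice the median report percent-clipped=0.0, while e.g. the batch-2300 line (max 2.811e+02 against threshold 2.774e+02) reports 1.0.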
2024-09-25 17:36:47,863 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 17:37:02,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=795848.6666666666, ans=0.125 2024-09-25 17:37:02,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=795848.6666666666, ans=0.125 2024-09-25 17:37:07,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=795848.6666666666, ans=0.5 2024-09-25 17:37:34,778 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.305e+02 1.379e+02 1.486e+02 1.924e+02, threshold=2.758e+02, percent-clipped=0.0 2024-09-25 17:38:03,293 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:38:06,190 INFO [train.py:1198] (0/4) Epoch 44, batch 3050, loss[loss=0.1776, ctc_loss=0.1129, cr_loss=0.3233, over 17300.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1207, cr_loss=0.3378, over 3361950.30 frames. ], batch size: 46, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:38:14,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=796035.3333333334, ans=0.0 2024-09-25 17:38:39,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796128.6666666666, ans=0.1 2024-09-25 17:39:26,747 INFO [train.py:1198] (0/4) Epoch 44, batch 3100, loss[loss=0.228, ctc_loss=0.1484, cr_loss=0.3984, over 17057.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3383, over 3358826.99 frames. ], batch size: 56, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:39:56,810 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=796362.0, ans=0.0 2024-09-25 17:40:01,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=796362.0, ans=0.125 2024-09-25 17:40:13,597 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.310e+02 1.386e+02 1.469e+02 1.981e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 17:40:31,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=796455.3333333334, ans=0.125 2024-09-25 17:40:37,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=796455.3333333334, ans=0.5 2024-09-25 17:40:44,923 INFO [train.py:1198] (0/4) Epoch 44, batch 3150, loss[loss=0.1874, ctc_loss=0.1184, cr_loss=0.345, over 16984.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3384, over 3365569.36 frames. 
], batch size: 53, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:41:03,367 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=796548.6666666666, ans=0.125 2024-09-25 17:41:03,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796548.6666666666, ans=0.1 2024-09-25 17:41:11,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=796548.6666666666, ans=0.0 2024-09-25 17:42:08,196 INFO [train.py:1198] (0/4) Epoch 44, batch 3200, loss[loss=0.1887, ctc_loss=0.1193, cr_loss=0.347, over 17172.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.339, over 3363793.70 frames. ], batch size: 45, lr: 2.69e-03, grad_scale: 32.0 2024-09-25 17:42:24,057 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=796782.0, ans=0.125 2024-09-25 17:42:45,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=796828.6666666666, ans=0.125 2024-09-25 17:42:55,048 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=796875.3333333334, ans=0.125 2024-09-25 17:42:57,901 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.350e+02 1.424e+02 1.561e+02 1.915e+02, threshold=2.848e+02, percent-clipped=0.0 2024-09-25 17:43:09,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=796922.0, ans=0.0 2024-09-25 17:43:26,125 INFO [train.py:1198] (0/4) Epoch 44, batch 3250, loss[loss=0.2252, ctc_loss=0.1482, cr_loss=0.3848, over 16531.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1215, cr_loss=0.3395, over 3350585.43 frames. ], batch size: 66, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:43:54,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=797015.3333333334, ans=0.125 2024-09-25 17:44:12,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=797108.6666666666, ans=0.025 2024-09-25 17:44:16,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=797108.6666666666, ans=0.015 2024-09-25 17:44:16,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=797108.6666666666, ans=0.07 2024-09-25 17:44:23,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=797108.6666666666, ans=0.125 2024-09-25 17:44:45,005 INFO [train.py:1198] (0/4) Epoch 44, batch 3300, loss[loss=0.2072, ctc_loss=0.1324, cr_loss=0.3741, over 17249.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1217, cr_loss=0.3403, over 3355148.89 frames. 
], batch size: 55, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:44:49,900 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=797202.0, ans=0.025 2024-09-25 17:45:14,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=797295.3333333334, ans=0.0 2024-09-25 17:45:18,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=797295.3333333334, ans=0.05 2024-09-25 17:45:19,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=797295.3333333334, ans=0.1 2024-09-25 17:45:30,792 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=15.0 2024-09-25 17:45:34,581 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.296e+02 1.381e+02 1.496e+02 2.395e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-25 17:45:34,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=797342.0, ans=0.025 2024-09-25 17:45:51,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=797388.6666666666, ans=0.0 2024-09-25 17:46:02,727 INFO [train.py:1198] (0/4) Epoch 44, batch 3350, loss[loss=0.1593, ctc_loss=0.09942, cr_loss=0.2996, over 16807.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.121, cr_loss=0.3387, over 3350132.31 frames. ], batch size: 37, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:46:21,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=797482.0, ans=0.1 2024-09-25 17:47:21,460 INFO [train.py:1198] (0/4) Epoch 44, batch 3400, loss[loss=0.1632, ctc_loss=0.1022, cr_loss=0.3053, over 17022.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1212, cr_loss=0.3391, over 3351748.07 frames. 
], batch size: 39, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:47:57,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=797762.0, ans=0.125 2024-09-25 17:48:04,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=797762.0, ans=0.125 2024-09-25 17:48:10,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=797808.6666666666, ans=0.2 2024-09-25 17:48:13,775 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.159e+02 1.291e+02 1.376e+02 1.452e+02 2.263e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 17:48:18,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=797808.6666666666, ans=0.025 2024-09-25 17:48:22,066 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=797808.6666666666, ans=0.125 2024-09-25 17:48:34,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=797855.3333333334, ans=0.125 2024-09-25 17:48:42,147 INFO [train.py:1198] (0/4) Epoch 44, batch 3450, loss[loss=0.1802, ctc_loss=0.1142, cr_loss=0.3297, over 17036.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1207, cr_loss=0.3386, over 3364352.56 frames. ], batch size: 52, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:48:42,802 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-09-25 17:49:12,560 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.92 vs. limit=15.0 2024-09-25 17:49:28,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2024-09-25 17:49:33,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=798042.0, ans=0.0 2024-09-25 17:49:43,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=798088.6666666666, ans=0.125 2024-09-25 17:49:46,574 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.26 vs. limit=22.5 2024-09-25 17:49:51,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=798088.6666666666, ans=0.125 2024-09-25 17:50:02,279 INFO [train.py:1198] (0/4) Epoch 44, batch 3500, loss[loss=0.1838, ctc_loss=0.1188, cr_loss=0.3255, over 17081.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1206, cr_loss=0.3384, over 3365928.15 frames. ], batch size: 43, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:50:07,506 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 17:50:26,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.34 vs. 
limit=15.0 2024-09-25 17:50:37,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=798228.6666666666, ans=0.2 2024-09-25 17:50:40,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798228.6666666666, ans=0.1 2024-09-25 17:50:52,795 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.122e+02 1.299e+02 1.358e+02 1.430e+02 3.438e+02, threshold=2.715e+02, percent-clipped=1.0 2024-09-25 17:51:22,863 INFO [train.py:1198] (0/4) Epoch 44, batch 3550, loss[loss=0.1608, ctc_loss=0.1002, cr_loss=0.3029, over 17212.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3386, over 3368775.13 frames. ], batch size: 47, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:51:29,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=798368.6666666666, ans=0.0 2024-09-25 17:52:03,443 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2024-09-25 17:52:04,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=798462.0, ans=0.125 2024-09-25 17:52:43,566 INFO [train.py:1198] (0/4) Epoch 44, batch 3600, loss[loss=0.1987, ctc_loss=0.1266, cr_loss=0.3608, over 16057.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1209, cr_loss=0.3387, over 3368520.89 frames. ], batch size: 74, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:53:34,704 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.315e+02 1.375e+02 1.458e+02 2.973e+02, threshold=2.750e+02, percent-clipped=1.0 2024-09-25 17:53:59,081 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0 2024-09-25 17:54:01,436 INFO [train.py:1198] (0/4) Epoch 44, batch 3650, loss[loss=0.1905, ctc_loss=0.1214, cr_loss=0.3459, over 17152.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1202, cr_loss=0.3369, over 3360149.36 frames. ], batch size: 45, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:54:15,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=798882.0, ans=0.125 2024-09-25 17:54:35,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2024-09-25 17:54:36,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=798928.6666666666, ans=0.125 2024-09-25 17:54:48,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=798975.3333333334, ans=0.125 2024-09-25 17:55:17,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=799022.0, ans=0.125 2024-09-25 17:55:20,637 INFO [train.py:1198] (0/4) Epoch 44, batch 3700, loss[loss=0.1406, ctc_loss=0.08842, cr_loss=0.2609, over 16761.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1197, cr_loss=0.3364, over 3358886.33 frames. 
], batch size: 37, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:55:36,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=799115.3333333334, ans=0.125 2024-09-25 17:55:55,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=799162.0, ans=0.0 2024-09-25 17:56:10,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=799208.6666666666, ans=0.2 2024-09-25 17:56:11,869 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.103e+02 1.294e+02 1.350e+02 1.442e+02 2.627e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-25 17:56:19,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=799208.6666666666, ans=0.0 2024-09-25 17:56:26,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=799255.3333333334, ans=0.05 2024-09-25 17:56:38,699 INFO [train.py:1198] (0/4) Epoch 44, batch 3750, loss[loss=0.1473, ctc_loss=0.09295, cr_loss=0.2717, over 16688.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1191, cr_loss=0.3349, over 3357329.46 frames. ], batch size: 37, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:56:49,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=799302.0, ans=0.0 2024-09-25 17:57:03,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=799348.6666666666, ans=0.125 2024-09-25 17:57:04,783 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=799348.6666666666, ans=0.125 2024-09-25 17:57:26,832 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.59 vs. limit=22.5 2024-09-25 17:57:36,034 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2024-09-25 17:57:57,079 INFO [train.py:1198] (0/4) Epoch 44, batch 3800, loss[loss=0.1593, ctc_loss=0.0996, cr_loss=0.2983, over 16937.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1201, cr_loss=0.3365, over 3321031.04 frames. ], batch size: 42, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:58:03,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=799535.3333333334, ans=0.125 2024-09-25 17:58:18,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=799582.0, ans=0.0 2024-09-25 17:58:47,998 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.185e+02 1.317e+02 1.409e+02 1.513e+02 1.849e+02, threshold=2.818e+02, percent-clipped=0.0 2024-09-25 17:59:01,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=799722.0, ans=0.0 2024-09-25 17:59:11,507 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=799722.0, ans=0.0 2024-09-25 17:59:14,336 INFO [train.py:1198] (0/4) Epoch 44, batch 3850, loss[loss=0.2326, ctc_loss=0.1533, cr_loss=0.3965, over 14950.00 frames. 
], tot_loss[loss=0.1889, ctc_loss=0.1214, cr_loss=0.3372, over 3275048.57 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 16.0 2024-09-25 17:59:29,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=799815.3333333334, ans=0.125 2024-09-25 17:59:34,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=799815.3333333334, ans=0.0 2024-09-25 18:00:14,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=799955.3333333334, ans=0.05 2024-09-25 18:00:20,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=799955.3333333334, ans=0.025 2024-09-25 18:00:24,402 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-44.pt 2024-09-25 18:01:14,624 INFO [train.py:1198] (0/4) Epoch 45, batch 0, loss[loss=0.2022, ctc_loss=0.131, cr_loss=0.3562, over 17109.00 frames. ], tot_loss[loss=0.2022, ctc_loss=0.131, cr_loss=0.3562, over 17109.00 frames. ], batch size: 49, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:01:14,625 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 18:01:29,796 INFO [train.py:1230] (0/4) Epoch 45, validation: loss=0.03539, ctc_loss=0.03539, cr_loss=1.113e-14, over 944034.00 frames. 2024-09-25 18:01:29,796 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 18:01:42,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=799983.3333333334, ans=0.125 2024-09-25 18:02:07,206 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=22.5 2024-09-25 18:02:08,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=800076.6666666666, ans=0.0 2024-09-25 18:02:18,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=800076.6666666666, ans=0.125 2024-09-25 18:02:21,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=800123.3333333334, ans=0.125 2024-09-25 18:02:31,990 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.406e+02 1.547e+02 1.686e+02 2.322e+02, threshold=3.093e+02, percent-clipped=0.0 2024-09-25 18:02:45,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=800170.0, ans=0.125 2024-09-25 18:02:52,920 INFO [train.py:1198] (0/4) Epoch 45, batch 50, loss[loss=0.1445, ctc_loss=0.09003, cr_loss=0.2722, over 17024.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.121, cr_loss=0.3382, over 760667.48 frames. ], batch size: 39, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:02:59,788 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.88 vs. 
limit=12.0 2024-09-25 18:03:04,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=800216.6666666666, ans=0.05 2024-09-25 18:03:17,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=800263.3333333334, ans=0.125 2024-09-25 18:03:26,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=800310.0, ans=0.025 2024-09-25 18:03:39,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=800356.6666666666, ans=0.125 2024-09-25 18:03:43,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=800356.6666666666, ans=0.0 2024-09-25 18:03:49,603 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.67 vs. limit=22.5 2024-09-25 18:04:08,856 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 18:04:13,067 INFO [train.py:1198] (0/4) Epoch 45, batch 100, loss[loss=0.1823, ctc_loss=0.1145, cr_loss=0.3389, over 17024.00 frames. ], tot_loss[loss=0.1902, ctc_loss=0.1221, cr_loss=0.3407, over 1329300.02 frames. ], batch size: 51, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:04:24,721 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 18:04:26,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=800450.0, ans=0.05 2024-09-25 18:04:26,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=800450.0, ans=0.125 2024-09-25 18:04:42,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=800496.6666666666, ans=0.0 2024-09-25 18:04:55,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=800543.3333333334, ans=0.125 2024-09-25 18:05:12,263 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.307e+02 1.359e+02 1.469e+02 2.520e+02, threshold=2.719e+02, percent-clipped=0.0 2024-09-25 18:05:12,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=800590.0, ans=0.2 2024-09-25 18:05:35,981 INFO [train.py:1198] (0/4) Epoch 45, batch 150, loss[loss=0.205, ctc_loss=0.1323, cr_loss=0.3634, over 16137.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1217, cr_loss=0.3392, over 1782509.34 frames. 
], batch size: 74, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:05:39,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=800683.3333333334, ans=0.0 2024-09-25 18:05:50,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=800683.3333333334, ans=0.125 2024-09-25 18:05:59,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=800730.0, ans=0.0 2024-09-25 18:06:11,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800776.6666666666, ans=0.1 2024-09-25 18:06:39,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=800823.3333333334, ans=0.125 2024-09-25 18:07:02,960 INFO [train.py:1198] (0/4) Epoch 45, batch 200, loss[loss=0.2028, ctc_loss=0.1323, cr_loss=0.3526, over 17288.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1214, cr_loss=0.3391, over 2134862.67 frames. ], batch size: 51, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:07:06,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=800916.6666666666, ans=0.125 2024-09-25 18:07:14,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=800916.6666666666, ans=0.125 2024-09-25 18:07:25,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800963.3333333334, ans=0.1 2024-09-25 18:07:27,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=800963.3333333334, ans=0.125 2024-09-25 18:07:46,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=801010.0, ans=0.025 2024-09-25 18:08:01,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-09-25 18:08:02,078 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.278e+02 1.376e+02 1.483e+02 1.957e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 18:08:10,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=801103.3333333334, ans=0.09899494936611666 2024-09-25 18:08:23,395 INFO [train.py:1198] (0/4) Epoch 45, batch 250, loss[loss=0.2126, ctc_loss=0.1375, cr_loss=0.3752, over 15020.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1215, cr_loss=0.34, over 2403822.11 frames. 
], batch size: 89, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:08:28,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=801150.0, ans=0.025 2024-09-25 18:08:31,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801150.0, ans=0.1 2024-09-25 18:08:41,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=801196.6666666666, ans=0.125 2024-09-25 18:08:58,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=801243.3333333334, ans=0.0 2024-09-25 18:09:43,108 INFO [train.py:1198] (0/4) Epoch 45, batch 300, loss[loss=0.1639, ctc_loss=0.1001, cr_loss=0.3189, over 17105.00 frames. ], tot_loss[loss=0.189, ctc_loss=0.121, cr_loss=0.3399, over 2618608.62 frames. ], batch size: 40, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:09:55,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801383.3333333334, ans=0.1 2024-09-25 18:10:02,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=801430.0, ans=0.0 2024-09-25 18:10:14,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=801430.0, ans=0.125 2024-09-25 18:10:16,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=801476.6666666666, ans=0.125 2024-09-25 18:10:26,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=801476.6666666666, ans=0.09899494936611666 2024-09-25 18:10:48,158 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.282e+02 1.376e+02 1.463e+02 2.041e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-25 18:11:09,114 INFO [train.py:1198] (0/4) Epoch 45, batch 350, loss[loss=0.1816, ctc_loss=0.1159, cr_loss=0.3285, over 16961.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1212, cr_loss=0.3397, over 2777204.78 frames. ], batch size: 42, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:11:26,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=801663.3333333334, ans=0.125 2024-09-25 18:11:32,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=801663.3333333334, ans=0.125 2024-09-25 18:12:09,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=801756.6666666666, ans=0.125 2024-09-25 18:12:34,307 INFO [train.py:1198] (0/4) Epoch 45, batch 400, loss[loss=0.2086, ctc_loss=0.1378, cr_loss=0.354, over 17343.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1211, cr_loss=0.3387, over 2903727.79 frames. ], batch size: 48, lr: 2.65e-03, grad_scale: 32.0 2024-09-25 18:12:45,761 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-09-25 18:13:01,737 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.92 vs. 
2024-09-25 18:13:19,700 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0
2024-09-25 18:13:20,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=801990.0, ans=0.2
2024-09-25 18:13:34,524 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.278e+02 1.369e+02 1.473e+02 1.980e+02, threshold=2.739e+02, percent-clipped=0.0
2024-09-25 18:13:35,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0
2024-09-25 18:13:53,910 INFO [train.py:1198] (0/4) Epoch 45, batch 450, loss[loss=0.2007, ctc_loss=0.1291, cr_loss=0.3578, over 16752.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1211, cr_loss=0.3388, over 3010539.11 frames. ], batch size: 61, lr: 2.65e-03, grad_scale: 16.0
2024-09-25 18:14:21,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=802130.0, ans=0.2
2024-09-25 18:14:23,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=802130.0, ans=0.0
2024-09-25 18:14:26,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=802176.6666666666, ans=0.2
2024-09-25 18:14:50,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=802223.3333333334, ans=0.125
2024-09-25 18:14:51,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=802223.3333333334, ans=0.125
2024-09-25 18:15:07,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=802270.0, ans=0.1
2024-09-25 18:15:15,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=802316.6666666666, ans=0.0
2024-09-25 18:15:16,372 INFO [train.py:1198] (0/4) Epoch 45, batch 500, loss[loss=0.1476, ctc_loss=0.09065, cr_loss=0.2847, over 17203.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.121, cr_loss=0.3378, over 3073125.44 frames. ], batch size: 41, lr: 2.65e-03, grad_scale: 16.0
2024-09-25 18:15:43,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802363.3333333334, ans=0.1
2024-09-25 18:15:56,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=802410.0, ans=0.125
2024-09-25 18:16:10,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=802456.6666666666, ans=0.95
2024-09-25 18:16:22,897 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.274e+02 1.342e+02 1.426e+02 1.767e+02, threshold=2.684e+02, percent-clipped=0.0
2024-09-25 18:16:23,238 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=802456.6666666666, ans=0.125
2024-09-25 18:16:42,133 INFO [train.py:1198] (0/4) Epoch 45, batch 550, loss[loss=0.1565, ctc_loss=0.09866, cr_loss=0.2891, over 16268.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1208, cr_loss=0.3374, over 3138561.57 frames. ], batch size: 36, lr: 2.65e-03, grad_scale: 16.0
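In the Clipping_scale warnings, the five numbers are quantiles (min, 25%, median, 75%, max) of recent gradient norms, and in every warning in this stretch of the log the reported threshold is Clipping_scale = 2.0 times the median (for example 2 x 1.369e+02 ~= 2.739e+02 above). A sketch of that bookkeeping follows, assuming a sliding window of per-batch gradient norms; it is a simplified stand-in for what icefall's optim.py does, not the actual code.

```python
from collections import deque
from statistics import quantiles

class GradNormClipper:
    """Track recent grad norms; clip at clipping_scale * median (sketch)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_batches = 0

    def update(self, grad_norm: float) -> float:
        """Record one batch's grad norm; return the scale (<= 1.0) to apply."""
        self.norms.append(grad_norm)
        self.num_batches += 1
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if grad_norm > threshold:
            self.num_clipped += 1
            return threshold / grad_norm
        return 1.0

    def summary(self) -> str:
        """Format a line like the WARNINGs above."""
        lo, q1, med, q3, hi = (min(self.norms),
                               *quantiles(self.norms, n=4),
                               max(self.norms))
        pct = 100.0 * self.num_clipped / max(1, self.num_batches)
        return (f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                f"{lo:.3e} {q1:.3e} {med:.3e} {q3:.3e} {hi:.3e}, "
                f"threshold={self.clipping_scale * med:.3e}, percent-clipped={pct}")
```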
2024-09-25 18:16:48,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=802550.0, ans=0.0
2024-09-25 18:16:51,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0
2024-09-25 18:16:52,318 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=22.5
2024-09-25 18:17:23,336 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-172000.pt
2024-09-25 18:17:52,598 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=802736.6666666666, ans=0.125
2024-09-25 18:17:52,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=802736.6666666666, ans=0.0
2024-09-25 18:18:05,060 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.57 vs. limit=8.0
2024-09-25 18:18:06,932 INFO [train.py:1198] (0/4) Epoch 45, batch 600, loss[loss=0.154, ctc_loss=0.096, cr_loss=0.2898, over 17126.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1204, cr_loss=0.3369, over 3188847.74 frames. ], batch size: 40, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:18:07,308 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802783.3333333334, ans=0.1
2024-09-25 18:18:43,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=802876.6666666666, ans=0.125
2024-09-25 18:18:48,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=802876.6666666666, ans=0.0
2024-09-25 18:19:07,374 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.202e+02 1.313e+02 1.405e+02 1.513e+02 2.621e+02, threshold=2.810e+02, percent-clipped=0.0
2024-09-25 18:19:26,372 INFO [train.py:1198] (0/4) Epoch 45, batch 650, loss[loss=0.2052, ctc_loss=0.1328, cr_loss=0.3618, over 16926.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3382, over 3228865.82 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:19:26,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=803016.6666666666, ans=0.125
2024-09-25 18:19:39,361 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=803016.6666666666, ans=0.125
2024-09-25 18:19:41,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=22.5
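The checkpoint.py line above records a whole-training-state snapshot written to the experiment directory, named after the global batch index (checkpoint-172000.pt). A minimal sketch of that pattern is below; the filename convention matches the log, but the exact contents icefall saves (model, optimizer, sampler state, and so on) are assumptions here.

```python
from pathlib import Path
import torch

def save_checkpoint(exp_dir: Path, batch_idx_train: int,
                    model: torch.nn.Module,
                    optimizer: torch.optim.Optimizer) -> Path:
    """Write a batch-indexed checkpoint like checkpoint-172000.pt (sketch)."""
    path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save(
        {
            "batch_idx_train": batch_idx_train,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        path,
    )
    return path

# Hypothetical usage inside the training loop, every `save_every_n` batches:
# if batch_idx_train % save_every_n == 0:
#     save_checkpoint(
#         Path("zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4"),
#         batch_idx_train, model, optimizer)
```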
2024-09-25 18:19:44,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=803063.3333333334, ans=0.125
2024-09-25 18:19:47,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=803063.3333333334, ans=0.0
2024-09-25 18:19:49,395 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=22.5
2024-09-25 18:20:25,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=803156.6666666666, ans=0.125
2024-09-25 18:20:32,454 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.69 vs. limit=6.0
2024-09-25 18:20:39,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=803203.3333333334, ans=0.2
2024-09-25 18:20:52,183 INFO [train.py:1198] (0/4) Epoch 45, batch 700, loss[loss=0.2117, ctc_loss=0.1381, cr_loss=0.3677, over 16939.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1215, cr_loss=0.3396, over 3253051.61 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:20:54,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=803250.0, ans=0.125
2024-09-25 18:21:02,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=803250.0, ans=0.0
2024-09-25 18:21:11,958 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 18:21:13,395 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803296.6666666666, ans=0.1
2024-09-25 18:21:13,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=803296.6666666666, ans=0.1
2024-09-25 18:21:21,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803296.6666666666, ans=0.1
2024-09-25 18:21:37,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=803343.3333333334, ans=0.125
2024-09-25 18:21:37,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803343.3333333334, ans=0.1
2024-09-25 18:21:43,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=803390.0, ans=0.2
2024-09-25 18:21:48,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=803390.0, ans=0.125
2024-09-25 18:21:53,247 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-25 18:21:55,975 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.305e+02 1.379e+02 1.482e+02 2.055e+02, threshold=2.759e+02, percent-clipped=0.0
2024-09-25 18:22:18,092 INFO [train.py:1198] (0/4) Epoch 45, batch 750, loss[loss=0.1551, ctc_loss=0.09603, cr_loss=0.2953, over 17278.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1215, cr_loss=0.34, over 3275259.23 frames. ], batch size: 46, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:22:44,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.87 vs. limit=15.0
2024-09-25 18:22:55,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0
2024-09-25 18:23:15,782 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0
2024-09-25 18:23:34,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=803670.0, ans=0.125
2024-09-25 18:23:38,654 INFO [train.py:1198] (0/4) Epoch 45, batch 800, loss[loss=0.1685, ctc_loss=0.1046, cr_loss=0.3198, over 17005.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1216, cr_loss=0.34, over 3294669.62 frames. ], batch size: 39, lr: 2.64e-03, grad_scale: 32.0
2024-09-25 18:23:49,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=803716.6666666666, ans=0.025
2024-09-25 18:23:59,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=803763.3333333334, ans=0.125
2024-09-25 18:24:04,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=803763.3333333334, ans=0.1
2024-09-25 18:24:14,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=803810.0, ans=0.125
2024-09-25 18:24:39,375 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.279e+02 1.371e+02 1.474e+02 1.917e+02, threshold=2.742e+02, percent-clipped=0.0
2024-09-25 18:24:58,300 INFO [train.py:1198] (0/4) Epoch 45, batch 850, loss[loss=0.2038, ctc_loss=0.131, cr_loss=0.3638, over 15965.00 frames. ], tot_loss[loss=0.19, ctc_loss=0.1219, cr_loss=0.3402, over 3308538.73 frames. ], batch size: 74, lr: 2.64e-03, grad_scale: 32.0
2024-09-25 18:25:08,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0
2024-09-25 18:25:26,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=803996.6666666666, ans=0.125
2024-09-25 18:25:33,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=804043.3333333334, ans=0.125
2024-09-25 18:25:58,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=804090.0, ans=0.025
2024-09-25 18:26:26,523 INFO [train.py:1198] (0/4) Epoch 45, batch 900, loss[loss=0.204, ctc_loss=0.1329, cr_loss=0.3556, over 15955.00 frames. ], tot_loss[loss=0.1901, ctc_loss=0.1219, cr_loss=0.3408, over 3328208.14 frames. ], batch size: 74, lr: 2.64e-03, grad_scale: 32.0
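A consistency check worth noting in these summaries: the combined loss equals ctc_loss plus 0.2 times cr_loss. For batch 800 above, 0.1216 + 0.2 x 0.34 = 0.1896, which is exactly the logged tot_loss, and the same relation holds for the per-batch losses throughout this stretch. The 0.2 weight below is inferred from the logged numbers themselves:

```python
def total_loss(ctc_loss: float, cr_loss: float, cr_loss_scale: float = 0.2) -> float:
    """Combined loss consistent with the logged values: ctc + scale * cr."""
    return ctc_loss + cr_loss_scale * cr_loss

# Both checks reproduce the batch 800 summary line to within rounding:
assert abs(total_loss(0.1216, 0.34) - 0.1896) < 1e-4    # tot_loss[...]
assert abs(total_loss(0.1046, 0.3198) - 0.1685) < 1e-4  # loss[...]
```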
2024-09-25 18:26:26,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=804183.3333333334, ans=0.125
2024-09-25 18:26:53,869 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=804230.0, ans=0.125
2024-09-25 18:27:05,925 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=804276.6666666666, ans=0.125
2024-09-25 18:27:15,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=804323.3333333334, ans=0.125
2024-09-25 18:27:29,592 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.180e+02 1.299e+02 1.367e+02 1.436e+02 2.740e+02, threshold=2.735e+02, percent-clipped=0.0
2024-09-25 18:27:34,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=804370.0, ans=0.2
2024-09-25 18:27:47,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=804416.6666666666, ans=0.125
2024-09-25 18:27:47,986 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0
2024-09-25 18:27:48,683 INFO [train.py:1198] (0/4) Epoch 45, batch 950, loss[loss=0.2032, ctc_loss=0.1334, cr_loss=0.3493, over 17014.00 frames. ], tot_loss[loss=0.1904, ctc_loss=0.1222, cr_loss=0.3411, over 3332702.43 frames. ], batch size: 53, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:27:50,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=804416.6666666666, ans=0.0
2024-09-25 18:27:52,716 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0
2024-09-25 18:27:57,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0
2024-09-25 18:28:27,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=804510.0, ans=0.0
2024-09-25 18:28:34,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=804556.6666666666, ans=0.0
2024-09-25 18:28:55,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=804603.3333333334, ans=0.125
2024-09-25 18:29:07,945 INFO [train.py:1198] (0/4) Epoch 45, batch 1000, loss[loss=0.2256, ctc_loss=0.1475, cr_loss=0.3908, over 15364.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1214, cr_loss=0.3396, over 3346307.86 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:29:17,162 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.88 vs. limit=15.0
2024-09-25 18:29:24,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=804696.6666666666, ans=0.025
2024-09-25 18:29:32,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=804696.6666666666, ans=0.1
2024-09-25 18:29:43,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=804743.3333333334, ans=0.125
2024-09-25 18:29:54,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=804790.0, ans=0.125
2024-09-25 18:30:13,227 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.285e+02 1.391e+02 1.486e+02 1.776e+02, threshold=2.782e+02, percent-clipped=0.0
2024-09-25 18:30:15,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.40 vs. limit=6.0
2024-09-25 18:30:33,441 INFO [train.py:1198] (0/4) Epoch 45, batch 1050, loss[loss=0.1919, ctc_loss=0.1224, cr_loss=0.3475, over 17294.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1205, cr_loss=0.3385, over 3340687.86 frames. ], batch size: 51, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:30:51,204 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=804930.0, ans=0.125
2024-09-25 18:30:54,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=804930.0, ans=0.025
2024-09-25 18:31:40,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=805070.0, ans=0.0
2024-09-25 18:31:58,556 INFO [train.py:1198] (0/4) Epoch 45, batch 1100, loss[loss=0.185, ctc_loss=0.1196, cr_loss=0.3272, over 17296.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1215, cr_loss=0.3394, over 3333669.73 frames. ], batch size: 46, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:32:17,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=805163.3333333334, ans=0.2
2024-09-25 18:32:48,836 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=15.0
2024-09-25 18:33:00,831 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.299e+02 1.379e+02 1.493e+02 1.866e+02, threshold=2.759e+02, percent-clipped=0.0
2024-09-25 18:33:18,415 INFO [train.py:1198] (0/4) Epoch 45, batch 1150, loss[loss=0.1826, ctc_loss=0.1155, cr_loss=0.3358, over 17138.00 frames. ], tot_loss[loss=0.1898, ctc_loss=0.1217, cr_loss=0.3403, over 3339520.98 frames. ], batch size: 48, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:33:18,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=805350.0, ans=0.5
2024-09-25 18:33:26,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=805350.0, ans=22.5
2024-09-25 18:33:36,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805396.6666666666, ans=0.1
2024-09-25 18:34:01,089 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0
2024-09-25 18:34:28,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=805536.6666666666, ans=0.125
2024-09-25 18:34:38,827 INFO [train.py:1198] (0/4) Epoch 45, batch 1200, loss[loss=0.1889, ctc_loss=0.1225, cr_loss=0.332, over 17150.00 frames. ], tot_loss[loss=0.1894, ctc_loss=0.1215, cr_loss=0.3395, over 3340974.41 frames. ], batch size: 48, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:34:56,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=805630.0, ans=0.05
2024-09-25 18:34:58,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=805630.0, ans=0.0
2024-09-25 18:35:35,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=805723.3333333334, ans=0.125
2024-09-25 18:35:40,830 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=12.0
2024-09-25 18:35:47,622 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.307e+02 1.404e+02 1.482e+02 2.313e+02, threshold=2.807e+02, percent-clipped=0.0
2024-09-25 18:36:01,240 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=22.5
2024-09-25 18:36:06,488 INFO [train.py:1198] (0/4) Epoch 45, batch 1250, loss[loss=0.1902, ctc_loss=0.1254, cr_loss=0.3237, over 17034.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1216, cr_loss=0.3397, over 3345298.35 frames. ], batch size: 44, lr: 2.64e-03, grad_scale: 16.0
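The tot_loss[... over N frames] values are running averages weighted by the number of frames they cover, which is why the frame count ("over 3345298.35 frames") keeps growing across batches within the epoch. A sketch of such a tracker follows; the fractional frame counts in the log suggest some scaled or decayed weighting that this plain version does not try to reproduce.

```python
class RunningLoss:
    """Frame-weighted running average of a loss (simplified sketch)."""

    def __init__(self) -> None:
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        # Accumulate loss * frames so the average weights batches by size.
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def average(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.1902, 17034.0)  # batch 1250's per-batch loss and frame count
print(f"tot_loss[loss={tracker.average:.4f}, over {tracker.frames:.2f} frames]")
```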
2024-09-25 18:36:47,565 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805910.0, ans=0.1
2024-09-25 18:36:58,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=805956.6666666666, ans=0.125
2024-09-25 18:37:05,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=805956.6666666666, ans=0.125
2024-09-25 18:37:10,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805956.6666666666, ans=0.1
2024-09-25 18:37:15,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=806003.3333333334, ans=0.0
2024-09-25 18:37:21,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=806003.3333333334, ans=0.0
2024-09-25 18:37:23,004 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=806003.3333333334, ans=0.2
2024-09-25 18:37:27,907 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=806050.0, ans=0.0
2024-09-25 18:37:29,081 INFO [train.py:1198] (0/4) Epoch 45, batch 1300, loss[loss=0.1847, ctc_loss=0.118, cr_loss=0.3336, over 17366.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1219, cr_loss=0.34, over 3347380.13 frames. ], batch size: 48, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:37:48,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=806096.6666666666, ans=0.125
2024-09-25 18:38:04,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=806143.3333333334, ans=0.2
2024-09-25 18:38:17,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=806190.0, ans=0.1
2024-09-25 18:38:20,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806190.0, ans=0.1
2024-09-25 18:38:28,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=806190.0, ans=0.125
2024-09-25 18:38:29,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=806190.0, ans=0.025
2024-09-25 18:38:32,667 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.317e+02 1.369e+02 1.447e+02 1.900e+02, threshold=2.738e+02, percent-clipped=0.0
2024-09-25 18:38:45,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=806236.6666666666, ans=0.125
2024-09-25 18:38:45,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=806236.6666666666, ans=0.125
2024-09-25 18:38:48,677 INFO [train.py:1198] (0/4) Epoch 45, batch 1350, loss[loss=0.2097, ctc_loss=0.137, cr_loss=0.3636, over 17021.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1229, cr_loss=0.3411, over 3330504.04 frames. ], batch size: 53, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:39:09,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=806330.0, ans=0.2
2024-09-25 18:39:53,902 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0
2024-09-25 18:40:07,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=806470.0, ans=0.2
2024-09-25 18:40:14,350 INFO [train.py:1198] (0/4) Epoch 45, batch 1400, loss[loss=0.1524, ctc_loss=0.09544, cr_loss=0.2846, over 17265.00 frames. ], tot_loss[loss=0.1912, ctc_loss=0.1228, cr_loss=0.3417, over 3335833.82 frames. ], batch size: 42, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:40:30,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=806563.3333333334, ans=0.0
2024-09-25 18:40:36,172 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0
2024-09-25 18:41:21,116 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.028e+02 1.293e+02 1.370e+02 1.466e+02 2.131e+02, threshold=2.739e+02, percent-clipped=0.0
2024-09-25 18:41:31,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=806703.3333333334, ans=0.125
2024-09-25 18:41:34,534 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=806703.3333333334, ans=0.125
2024-09-25 18:41:37,458 INFO [train.py:1198] (0/4) Epoch 45, batch 1450, loss[loss=0.1924, ctc_loss=0.1242, cr_loss=0.3411, over 17303.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1213, cr_loss=0.3397, over 3352015.74 frames. ], batch size: 49, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:41:43,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=806750.0, ans=0.0
2024-09-25 18:41:46,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=806750.0, ans=0.125
2024-09-25 18:42:07,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=806796.6666666666, ans=0.025
2024-09-25 18:42:15,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=806843.3333333334, ans=0.125
2024-09-25 18:42:33,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=806890.0, ans=0.0
2024-09-25 18:42:38,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=806890.0, ans=0.125
2024-09-25 18:42:49,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=806936.6666666666, ans=0.0
2024-09-25 18:43:00,439 INFO [train.py:1198] (0/4) Epoch 45, batch 1500, loss[loss=0.1814, ctc_loss=0.1133, cr_loss=0.3404, over 16960.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1209, cr_loss=0.3389, over 3346940.23 frames. ], batch size: 42, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:43:13,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=806983.3333333334, ans=0.0
2024-09-25 18:43:51,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=807123.3333333334, ans=0.125
2024-09-25 18:43:53,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=807123.3333333334, ans=0.2
2024-09-25 18:44:04,287 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.307e+02 1.374e+02 1.496e+02 1.969e+02, threshold=2.747e+02, percent-clipped=0.0
2024-09-25 18:44:14,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=807170.0, ans=0.0
2024-09-25 18:44:20,510 INFO [train.py:1198] (0/4) Epoch 45, batch 1550, loss[loss=0.1946, ctc_loss=0.1302, cr_loss=0.3224, over 15062.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1206, cr_loss=0.3384, over 3353007.59 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:45:09,052 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=807310.0, ans=0.125
2024-09-25 18:45:15,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=807356.6666666666, ans=0.0
2024-09-25 18:45:26,710 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=807356.6666666666, ans=0.0
2024-09-25 18:45:26,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=807356.6666666666, ans=0.07
2024-09-25 18:45:36,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=807403.3333333334, ans=0.1
2024-09-25 18:45:45,594 INFO [train.py:1198] (0/4) Epoch 45, batch 1600, loss[loss=0.1856, ctc_loss=0.1168, cr_loss=0.3442, over 17280.00 frames. ], tot_loss[loss=0.1892, ctc_loss=0.1213, cr_loss=0.3395, over 3349025.07 frames. ], batch size: 46, lr: 2.64e-03, grad_scale: 32.0
2024-09-25 18:46:09,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=807496.6666666666, ans=0.125
2024-09-25 18:46:32,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0
2024-09-25 18:46:51,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=807590.0, ans=0.125
2024-09-25 18:46:56,354 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.296e+02 1.378e+02 1.483e+02 2.434e+02, threshold=2.757e+02, percent-clipped=0.0
2024-09-25 18:46:59,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=807636.6666666666, ans=0.09899494936611666
2024-09-25 18:47:03,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=807636.6666666666, ans=0.0
2024-09-25 18:47:10,854 INFO [train.py:1198] (0/4) Epoch 45, batch 1650, loss[loss=0.1578, ctc_loss=0.09741, cr_loss=0.3018, over 16691.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1206, cr_loss=0.3379, over 3342379.09 frames. ], batch size: 37, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:47:14,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=807683.3333333334, ans=0.125
2024-09-25 18:47:37,536 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0
2024-09-25 18:47:45,564 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0
2024-09-25 18:48:20,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=807870.0, ans=0.1
2024-09-25 18:48:30,032 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=12.0
2024-09-25 18:48:31,084 INFO [train.py:1198] (0/4) Epoch 45, batch 1700, loss[loss=0.1871, ctc_loss=0.1193, cr_loss=0.339, over 17164.00 frames. ], tot_loss[loss=0.1895, ctc_loss=0.1216, cr_loss=0.3393, over 3337796.47 frames. ], batch size: 41, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:49:03,917 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0
2024-09-25 18:49:19,029 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=808056.6666666666, ans=0.5
2024-09-25 18:49:36,102 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.276e+02 1.351e+02 1.443e+02 2.943e+02, threshold=2.702e+02, percent-clipped=1.0
2024-09-25 18:49:38,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=808103.3333333334, ans=0.1
2024-09-25 18:49:41,239 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=808103.3333333334, ans=0.0
2024-09-25 18:49:43,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=22.5
2024-09-25 18:49:50,482 INFO [train.py:1198] (0/4) Epoch 45, batch 1750, loss[loss=0.2198, ctc_loss=0.1394, cr_loss=0.402, over 16588.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1211, cr_loss=0.3376, over 3336889.97 frames. ], batch size: 66, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:50:00,124 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0
2024-09-25 18:50:21,907 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0
2024-09-25 18:50:55,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=808290.0, ans=0.035
2024-09-25 18:51:07,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=808336.6666666666, ans=0.2
2024-09-25 18:51:16,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=22.5
2024-09-25 18:51:18,076 INFO [train.py:1198] (0/4) Epoch 45, batch 1800, loss[loss=0.2033, ctc_loss=0.1317, cr_loss=0.358, over 17025.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1205, cr_loss=0.3364, over 3345819.65 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 16.0
2024-09-25 18:51:49,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=808430.0, ans=0.125
2024-09-25 18:51:56,180 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=808476.6666666666, ans=0.025
2024-09-25 18:52:01,116 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=808476.6666666666, ans=0.1
2024-09-25 18:52:02,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=808476.6666666666, ans=0.025
2024-09-25 18:52:09,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=808523.3333333334, ans=0.2
2024-09-25 18:52:20,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=808523.3333333334, ans=0.125
2024-09-25 18:52:26,149 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.286e+02 1.376e+02 1.494e+02 2.508e+02, threshold=2.752e+02, percent-clipped=0.0
2024-09-25 18:52:34,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=808570.0, ans=0.2
2024-09-25 18:52:40,641 INFO [train.py:1198] (0/4) Epoch 45, batch 1850, loss[loss=0.212, ctc_loss=0.1355, cr_loss=0.3825, over 17223.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1206, cr_loss=0.3371, over 3352189.48 frames. ], batch size: 47, lr: 2.64e-03, grad_scale: 16.0
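Many of the ScheduledFloat entries name balancer parameters (balancer1.prob, min_positive, max_positive, min_abs): modules that nudge per-channel activation statistics back into a target range. The sketch below shows the general idea for one such constraint, the fraction of positive activations staying inside [min_positive, max_positive]; it only flags violations, whereas the real Balancer in icefall's scaling.py applies a small correcting gradient instead, so treat this as an illustration of the constraint, not the mechanism.

```python
import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.05,
                        max_positive: float = 0.95) -> torch.Tensor:
    """Per-channel check of the constraint the balancer parameters describe.

    Returns a boolean mask of channels whose fraction of positive values
    falls outside [min_positive, max_positive] (simplified sketch).
    """
    # x: (frames, channels); fraction of positive entries per channel.
    frac_positive = (x > 0).float().mean(dim=0)
    return (frac_positive < min_positive) | (frac_positive > max_positive)

x = torch.randn(1000, 256)
x[:, 0] = x[:, 0].abs()          # channel 0 is always positive -> flagged
print(balancer_violations(x).nonzero().flatten())
```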
2024-09-25 18:53:09,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=808663.3333333334, ans=0.125
2024-09-25 18:53:09,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=808663.3333333334, ans=0.0
2024-09-25 18:53:14,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=808710.0, ans=0.0
2024-09-25 18:53:20,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=808710.0, ans=0.125
2024-09-25 18:53:25,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=808710.0, ans=0.125
2024-09-25 18:53:28,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=808756.6666666666, ans=0.125
2024-09-25 18:53:44,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=808803.3333333334, ans=0.125
2024-09-25 18:54:00,632 INFO [train.py:1198] (0/4) Epoch 45, batch 1900, loss[loss=0.1584, ctc_loss=0.0996, cr_loss=0.2941, over 17313.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1198, cr_loss=0.3354, over 3360571.24 frames. ], batch size: 49, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 18:54:18,937 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0
2024-09-25 18:54:28,284 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=808896.6666666666, ans=0.125
2024-09-25 18:55:12,117 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.309e+02 1.390e+02 1.516e+02 1.973e+02, threshold=2.779e+02, percent-clipped=0.0
2024-09-25 18:55:15,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=809036.6666666666, ans=0.125
2024-09-25 18:55:17,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=809036.6666666666, ans=0.0
2024-09-25 18:55:18,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=809036.6666666666, ans=0.0
2024-09-25 18:55:20,903 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0
2024-09-25 18:55:26,421 INFO [train.py:1198] (0/4) Epoch 45, batch 1950, loss[loss=0.195, ctc_loss=0.1261, cr_loss=0.3445, over 17200.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1203, cr_loss=0.3365, over 3367209.10 frames. ], batch size: 50, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 18:55:28,503 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=809083.3333333334, ans=0.125
2024-09-25 18:55:33,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2024-09-25 18:55:44,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=809130.0, ans=0.125
2024-09-25 18:55:52,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=809130.0, ans=0.2
2024-09-25 18:56:25,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=809223.3333333334, ans=0.125
2024-09-25 18:56:28,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809223.3333333334, ans=0.1
2024-09-25 18:56:52,062 INFO [train.py:1198] (0/4) Epoch 45, batch 2000, loss[loss=0.1731, ctc_loss=0.1081, cr_loss=0.3246, over 17262.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1212, cr_loss=0.3385, over 3359217.96 frames. ], batch size: 44, lr: 2.63e-03, grad_scale: 32.0
2024-09-25 18:57:00,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=809316.6666666666, ans=0.2
2024-09-25 18:57:13,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=809363.3333333334, ans=0.125
2024-09-25 18:57:19,343 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=809363.3333333334, ans=0.125
2024-09-25 18:57:20,259 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.14 vs. limit=15.0
2024-09-25 18:57:57,599 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.313e+02 1.395e+02 1.516e+02 2.306e+02, threshold=2.789e+02, percent-clipped=0.0
2024-09-25 18:58:12,072 INFO [train.py:1198] (0/4) Epoch 45, batch 2050, loss[loss=0.1318, ctc_loss=0.08015, cr_loss=0.2582, over 17116.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1203, cr_loss=0.3364, over 3354232.70 frames. ], batch size: 40, lr: 2.63e-03, grad_scale: 32.0
2024-09-25 18:58:25,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=809550.0, ans=0.125
2024-09-25 18:58:26,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809596.6666666666, ans=0.1
2024-09-25 18:58:46,571 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0
2024-09-25 18:59:02,620 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=15.0
2024-09-25 18:59:04,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0
2024-09-25 18:59:05,551 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0
2024-09-25 18:59:16,660 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-25 18:59:18,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=809736.6666666666, ans=0.125
2024-09-25 18:59:21,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=809736.6666666666, ans=0.125
2024-09-25 18:59:32,219 INFO [train.py:1198] (0/4) Epoch 45, batch 2100, loss[loss=0.1648, ctc_loss=0.1062, cr_loss=0.2933, over 17197.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1194, cr_loss=0.3351, over 3348346.20 frames. ], batch size: 50, lr: 2.63e-03, grad_scale: 32.0
2024-09-25 18:59:37,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=809783.3333333334, ans=0.125
2024-09-25 18:59:42,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=809783.3333333334, ans=0.125
2024-09-25 18:59:54,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=809830.0, ans=0.125
2024-09-25 18:59:54,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=809830.0, ans=0.0
2024-09-25 18:59:58,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809830.0, ans=0.1
2024-09-25 19:00:40,213 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.255e+02 1.338e+02 1.412e+02 1.754e+02, threshold=2.676e+02, percent-clipped=0.0
2024-09-25 19:00:42,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=809970.0, ans=0.125
2024-09-25 19:00:57,180 INFO [train.py:1198] (0/4) Epoch 45, batch 2150, loss[loss=0.2414, ctc_loss=0.157, cr_loss=0.4222, over 16882.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1197, cr_loss=0.3355, over 3349479.31 frames. ], batch size: 58, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 19:01:00,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=810016.6666666666, ans=0.125
2024-09-25 19:01:09,326 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2024-09-25 19:01:30,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=810110.0, ans=0.0
2024-09-25 19:01:38,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=810110.0, ans=0.1
2024-09-25 19:02:05,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=810203.3333333334, ans=0.0
2024-09-25 19:02:20,042 INFO [train.py:1198] (0/4) Epoch 45, batch 2200, loss[loss=0.1812, ctc_loss=0.1141, cr_loss=0.3356, over 17016.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1196, cr_loss=0.3355, over 3345175.34 frames. ], batch size: 44, lr: 2.63e-03, grad_scale: 16.0
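The grad_scale value in the batch summaries moves between 32.0 and 16.0: it is the loss-scaling factor of mixed-precision training, halved when overflows occur and grown back after a run of clean steps. The standard PyTorch pattern looks like the sketch below; whether this training loop uses torch.cuda.amp exactly this way is an assumption, but the halving/doubling behaviour matches what GradScaler does.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,     # matches the grad_scale: 32.0 seen in the summaries
    growth_factor=2.0,   # doubled after `growth_interval` clean steps
    backoff_factor=0.5,  # halved on overflow (32.0 -> 16.0)
)

def train_step(model, optimizer, batch, loss_fn):
    """One mixed-precision step with dynamic loss scaling (generic sketch)."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()  # backprop the scaled loss
    scaler.step(optimizer)         # unscales grads; skips the step on overflow
    scaler.update()                # adjusts the scale for the next step
    return loss.detach()
```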
2024-09-25 19:02:27,088 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0
2024-09-25 19:02:42,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=810296.6666666666, ans=0.125
2024-09-25 19:02:44,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=810296.6666666666, ans=0.025
2024-09-25 19:03:03,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=810343.3333333334, ans=0.125
2024-09-25 19:03:07,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.45 vs. limit=10.0
2024-09-25 19:03:27,150 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.297e+02 1.368e+02 1.487e+02 2.152e+02, threshold=2.736e+02, percent-clipped=0.0
2024-09-25 19:03:27,585 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=810436.6666666666, ans=0.0
2024-09-25 19:03:29,544 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.01 vs. limit=10.0
2024-09-25 19:03:40,110 INFO [train.py:1198] (0/4) Epoch 45, batch 2250, loss[loss=0.1902, ctc_loss=0.1185, cr_loss=0.3584, over 17023.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1196, cr_loss=0.3357, over 3352035.09 frames. ], batch size: 51, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 19:04:04,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=810530.0, ans=0.125
2024-09-25 19:04:28,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=810623.3333333334, ans=0.04949747468305833
2024-09-25 19:04:31,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=810623.3333333334, ans=0.04949747468305833
2024-09-25 19:04:34,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810623.3333333334, ans=0.1
2024-09-25 19:04:36,215 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=810623.3333333334, ans=0.2
2024-09-25 19:05:01,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=810716.6666666666, ans=0.125
2024-09-25 19:05:02,935 INFO [train.py:1198] (0/4) Epoch 45, batch 2300, loss[loss=0.1814, ctc_loss=0.113, cr_loss=0.3421, over 16942.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3374, over 3346186.84 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 19:05:23,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=810763.3333333334, ans=0.025
2024-09-25 19:05:46,923 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=810810.0, ans=0.09899494936611666
2024-09-25 19:06:12,005 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.306e+02 1.375e+02 1.455e+02 1.967e+02, threshold=2.751e+02, percent-clipped=0.0
2024-09-25 19:06:27,349 INFO [train.py:1198] (0/4) Epoch 45, batch 2350, loss[loss=0.1636, ctc_loss=0.1019, cr_loss=0.3086, over 17271.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1213, cr_loss=0.3388, over 3353164.58 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 19:06:52,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0
2024-09-25 19:07:07,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=811043.3333333334, ans=0.2
2024-09-25 19:07:11,288 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.06 vs. limit=6.0
2024-09-25 19:07:17,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=811090.0, ans=0.2
2024-09-25 19:07:20,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=811090.0, ans=0.0
2024-09-25 19:07:20,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=811090.0, ans=0.125
2024-09-25 19:07:22,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=22.5
2024-09-25 19:07:46,977 INFO [train.py:1198] (0/4) Epoch 45, batch 2400, loss[loss=0.2202, ctc_loss=0.1431, cr_loss=0.3858, over 17226.00 frames. ], tot_loss[loss=0.1896, ctc_loss=0.1216, cr_loss=0.3398, over 3351286.39 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 19:08:01,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=811230.0, ans=0.025
2024-09-25 19:08:19,603 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=811276.6666666666, ans=0.0
2024-09-25 19:08:55,592 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.292e+02 1.391e+02 1.494e+02 2.070e+02, threshold=2.781e+02, percent-clipped=0.0
2024-09-25 19:09:07,045 INFO [train.py:1198] (0/4) Epoch 45, batch 2450, loss[loss=0.2063, ctc_loss=0.1291, cr_loss=0.3858, over 17237.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1217, cr_loss=0.3401, over 3350724.02 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 19:09:23,267 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=811463.3333333334, ans=0.0
2024-09-25 19:09:54,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=811510.0, ans=0.025
2024-09-25 19:10:11,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=811556.6666666666, ans=0.0
2024-09-25 19:10:32,452 INFO [train.py:1198] (0/4) Epoch 45, batch 2500, loss[loss=0.16, ctc_loss=0.1012, cr_loss=0.294, over 17292.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1205, cr_loss=0.3375, over 3359531.66 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 19:10:43,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=811650.0, ans=0.125
2024-09-25 19:11:09,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=811743.3333333334, ans=0.125
2024-09-25 19:11:20,272 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0
2024-09-25 19:11:38,771 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=811790.0, ans=0.125
2024-09-25 19:11:46,432 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.307e+02 1.405e+02 1.520e+02 3.997e+02, threshold=2.811e+02, percent-clipped=1.0
2024-09-25 19:11:57,471 INFO [train.py:1198] (0/4) Epoch 45, batch 2550, loss[loss=0.1919, ctc_loss=0.1217, cr_loss=0.3509, over 17349.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1208, cr_loss=0.3385, over 3366681.00 frames. ], batch size: 48, lr: 2.63e-03, grad_scale: 16.0
2024-09-25 19:12:00,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=811883.3333333334, ans=0.125
2024-09-25 19:12:16,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=811930.0, ans=0.07
2024-09-25 19:12:20,065 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=811930.0, ans=0.125
2024-09-25 19:12:20,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0
2024-09-25 19:12:55,840 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=812023.3333333334, ans=0.2
2024-09-25 19:13:18,003 INFO [train.py:1198] (0/4) Epoch 45, batch 2600, loss[loss=0.2008, ctc_loss=0.13, cr_loss=0.354, over 16993.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1211, cr_loss=0.3387, over 3365184.09 frames. ], batch size: 53, lr: 2.63e-03, grad_scale: 16.0
], batch size: 53, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:13:21,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=812116.6666666666, ans=0.0 2024-09-25 19:13:32,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=812163.3333333334, ans=0.125 2024-09-25 19:13:36,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=812163.3333333334, ans=15.0 2024-09-25 19:13:53,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=812210.0, ans=0.125 2024-09-25 19:14:23,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=812303.3333333334, ans=0.0 2024-09-25 19:14:26,619 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.298e+02 1.365e+02 1.508e+02 3.211e+02, threshold=2.730e+02, percent-clipped=1.0 2024-09-25 19:14:33,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=812303.3333333334, ans=0.125 2024-09-25 19:14:37,607 INFO [train.py:1198] (0/4) Epoch 45, batch 2650, loss[loss=0.1773, ctc_loss=0.1114, cr_loss=0.3299, over 17036.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1207, cr_loss=0.3375, over 3360763.96 frames. ], batch size: 44, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:14:58,050 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2024-09-25 19:14:59,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-09-25 19:15:44,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=812490.0, ans=0.0 2024-09-25 19:16:05,232 INFO [train.py:1198] (0/4) Epoch 45, batch 2700, loss[loss=0.1521, ctc_loss=0.09614, cr_loss=0.2796, over 16944.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1209, cr_loss=0.3381, over 3366852.43 frames. ], batch size: 42, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:16:12,265 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-09-25 19:16:20,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=812583.3333333334, ans=0.125 2024-09-25 19:16:38,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=812676.6666666666, ans=0.0 2024-09-25 19:16:43,222 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=812676.6666666666, ans=0.2 2024-09-25 19:16:43,425 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=12.0 2024-09-25 19:16:54,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=812723.3333333334, ans=0.125 2024-09-25 19:17:07,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=812723.3333333334, ans=0.125 2024-09-25 19:17:07,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=812723.3333333334, ans=6.0 2024-09-25 19:17:16,507 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.332e+02 1.395e+02 1.480e+02 1.969e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-25 19:17:25,379 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.88 vs. limit=10.0 2024-09-25 19:17:27,702 INFO [train.py:1198] (0/4) Epoch 45, batch 2750, loss[loss=0.1708, ctc_loss=0.1103, cr_loss=0.3027, over 17352.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1207, cr_loss=0.3379, over 3358047.59 frames. ], batch size: 48, lr: 2.63e-03, grad_scale: 16.0 2024-09-25 19:17:28,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=812816.6666666666, ans=0.1 2024-09-25 19:17:31,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=812816.6666666666, ans=0.2 2024-09-25 19:17:36,852 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=15.0 2024-09-25 19:17:37,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=812816.6666666666, ans=0.0 2024-09-25 19:18:03,077 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=812910.0, ans=0.125 2024-09-25 19:18:15,899 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=812956.6666666666, ans=0.125 2024-09-25 19:18:19,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=812956.6666666666, ans=0.0 2024-09-25 19:18:25,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=812956.6666666666, ans=0.0 2024-09-25 19:18:40,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0 2024-09-25 19:18:47,799 INFO [train.py:1198] (0/4) Epoch 45, batch 2800, loss[loss=0.1992, ctc_loss=0.1281, cr_loss=0.3558, over 17213.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1204, cr_loss=0.3365, over 3349994.81 frames. ], batch size: 55, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:19:45,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=813190.0, ans=0.0 2024-09-25 19:20:01,231 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.322e+02 1.415e+02 1.536e+02 2.357e+02, threshold=2.830e+02, percent-clipped=0.0 2024-09-25 19:20:12,480 INFO [train.py:1198] (0/4) Epoch 45, batch 2850, loss[loss=0.1777, ctc_loss=0.1141, cr_loss=0.3179, over 17030.00 frames. 
], tot_loss[loss=0.1881, ctc_loss=0.1207, cr_loss=0.337, over 3347936.28 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:20:37,599 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=813330.0, ans=0.1 2024-09-25 19:21:22,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=813470.0, ans=0.2 2024-09-25 19:21:37,530 INFO [train.py:1198] (0/4) Epoch 45, batch 2900, loss[loss=0.182, ctc_loss=0.1164, cr_loss=0.3282, over 17177.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1198, cr_loss=0.3359, over 3356412.00 frames. ], batch size: 45, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:21:38,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=22.5 2024-09-25 19:21:49,431 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2024-09-25 19:21:50,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=813516.6666666666, ans=0.1 2024-09-25 19:22:04,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=813563.3333333334, ans=0.1 2024-09-25 19:22:17,741 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=813610.0, ans=0.125 2024-09-25 19:22:22,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=813610.0, ans=0.125 2024-09-25 19:22:37,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=813656.6666666666, ans=0.125 2024-09-25 19:22:40,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=813703.3333333334, ans=0.95 2024-09-25 19:22:40,787 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=12.0 2024-09-25 19:22:46,286 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.314e+02 1.430e+02 1.564e+02 2.364e+02, threshold=2.859e+02, percent-clipped=0.0 2024-09-25 19:22:57,462 INFO [train.py:1198] (0/4) Epoch 45, batch 2950, loss[loss=0.1896, ctc_loss=0.122, cr_loss=0.3379, over 17317.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1189, cr_loss=0.3344, over 3368111.92 frames. ], batch size: 46, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:23:04,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=813750.0, ans=0.125 2024-09-25 19:23:18,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=813796.6666666666, ans=0.025 2024-09-25 19:23:41,413 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. 
limit=15.0 2024-09-25 19:23:47,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=813890.0, ans=0.125 2024-09-25 19:24:09,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=813936.6666666666, ans=0.125 2024-09-25 19:24:13,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=813936.6666666666, ans=0.02 2024-09-25 19:24:16,612 INFO [train.py:1198] (0/4) Epoch 45, batch 3000, loss[loss=0.2036, ctc_loss=0.1284, cr_loss=0.376, over 17006.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1185, cr_loss=0.3337, over 3375394.17 frames. ], batch size: 56, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:24:16,613 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 19:24:30,118 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6404, 3.8759, 3.2644, 3.6070], device='cuda:0') 2024-09-25 19:24:32,379 INFO [train.py:1230] (0/4) Epoch 45, validation: loss=0.03541, ctc_loss=0.03541, cr_loss=1.054e-14, over 944034.00 frames. 2024-09-25 19:24:32,380 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 19:25:05,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=814076.6666666666, ans=0.125 2024-09-25 19:25:17,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=814076.6666666666, ans=0.0 2024-09-25 19:25:29,918 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=814123.3333333334, ans=0.125 2024-09-25 19:25:44,657 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.310e+02 1.382e+02 1.500e+02 2.246e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-25 19:25:55,843 INFO [train.py:1198] (0/4) Epoch 45, batch 3050, loss[loss=0.1878, ctc_loss=0.1192, cr_loss=0.3426, over 17146.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1181, cr_loss=0.333, over 3370768.15 frames. ], batch size: 48, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:26:08,462 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=814216.6666666666, ans=0.2 2024-09-25 19:26:13,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=814263.3333333334, ans=0.125 2024-09-25 19:26:33,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=814310.0, ans=0.1 2024-09-25 19:26:38,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-25 19:27:13,877 INFO [train.py:1198] (0/4) Epoch 45, batch 3100, loss[loss=0.1973, ctc_loss=0.1281, cr_loss=0.3458, over 17003.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1194, cr_loss=0.3351, over 3365143.31 frames. 
], batch size: 53, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:27:14,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=814450.0, ans=0.05 2024-09-25 19:28:13,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=814590.0, ans=0.0 2024-09-25 19:28:23,832 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.327e+02 1.418e+02 1.524e+02 1.856e+02, threshold=2.835e+02, percent-clipped=0.0 2024-09-25 19:28:27,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2024-09-25 19:28:36,921 INFO [train.py:1198] (0/4) Epoch 45, batch 3150, loss[loss=0.1907, ctc_loss=0.1246, cr_loss=0.3301, over 17209.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.12, cr_loss=0.3359, over 3348735.25 frames. ], batch size: 47, lr: 2.63e-03, grad_scale: 32.0 2024-09-25 19:28:37,140 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=814683.3333333334, ans=0.125 2024-09-25 19:29:29,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814823.3333333334, ans=0.1 2024-09-25 19:29:33,232 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2024-09-25 19:29:53,406 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0 2024-09-25 19:29:55,899 INFO [train.py:1198] (0/4) Epoch 45, batch 3200, loss[loss=0.1894, ctc_loss=0.126, cr_loss=0.3167, over 12088.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1194, cr_loss=0.3346, over 3345323.67 frames. ], batch size: 123, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:29:59,229 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=814916.6666666666, ans=0.1 2024-09-25 19:30:02,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=814916.6666666666, ans=0.1 2024-09-25 19:30:02,985 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.25 vs. 
limit=12.0 2024-09-25 19:30:18,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=814963.3333333334, ans=0.125 2024-09-25 19:30:19,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=814963.3333333334, ans=0.05 2024-09-25 19:30:55,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=815056.6666666666, ans=0.0 2024-09-25 19:31:00,266 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=815103.3333333334, ans=0.125 2024-09-25 19:31:03,191 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.085e+02 1.319e+02 1.396e+02 1.488e+02 2.041e+02, threshold=2.793e+02, percent-clipped=0.0 2024-09-25 19:31:14,382 INFO [train.py:1198] (0/4) Epoch 45, batch 3250, loss[loss=0.2111, ctc_loss=0.1366, cr_loss=0.3725, over 16014.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1201, cr_loss=0.3363, over 3359673.75 frames. ], batch size: 74, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:31:21,089 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=815150.0, ans=0.0 2024-09-25 19:31:33,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=815196.6666666666, ans=0.125 2024-09-25 19:31:41,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=815196.6666666666, ans=0.2 2024-09-25 19:31:49,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815243.3333333334, ans=0.1 2024-09-25 19:31:50,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=815243.3333333334, ans=0.0 2024-09-25 19:32:01,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=815290.0, ans=0.0 2024-09-25 19:32:11,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=815290.0, ans=0.0 2024-09-25 19:32:16,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=815336.6666666666, ans=0.125 2024-09-25 19:32:28,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=815336.6666666666, ans=0.95 2024-09-25 19:32:29,267 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0 2024-09-25 19:32:30,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=22.5 2024-09-25 19:32:33,043 INFO [train.py:1198] (0/4) Epoch 45, batch 3300, loss[loss=0.1521, ctc_loss=0.0946, cr_loss=0.2877, over 17178.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1199, cr_loss=0.3364, over 3368432.36 frames. 
], batch size: 41, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:32:36,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=815383.3333333334, ans=0.1 2024-09-25 19:32:58,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=815430.0, ans=0.0 2024-09-25 19:33:01,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=815430.0, ans=0.125 2024-09-25 19:33:12,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=815476.6666666666, ans=0.025 2024-09-25 19:33:15,470 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815476.6666666666, ans=0.1 2024-09-25 19:33:16,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=815476.6666666666, ans=0.0 2024-09-25 19:33:40,405 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.311e+02 1.407e+02 1.511e+02 1.886e+02, threshold=2.815e+02, percent-clipped=0.0 2024-09-25 19:33:42,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815570.0, ans=0.1 2024-09-25 19:33:48,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=815570.0, ans=0.125 2024-09-25 19:33:51,419 INFO [train.py:1198] (0/4) Epoch 45, batch 3350, loss[loss=0.2088, ctc_loss=0.1338, cr_loss=0.3748, over 17214.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1196, cr_loss=0.3361, over 3372649.93 frames. ], batch size: 50, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:34:05,791 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=815663.3333333334, ans=0.0 2024-09-25 19:34:28,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=815710.0, ans=0.0 2024-09-25 19:34:36,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=12.0 2024-09-25 19:34:48,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=815756.6666666666, ans=0.0 2024-09-25 19:35:05,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=815803.3333333334, ans=0.125 2024-09-25 19:35:09,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=815850.0, ans=0.125 2024-09-25 19:35:10,348 INFO [train.py:1198] (0/4) Epoch 45, batch 3400, loss[loss=0.1572, ctc_loss=0.09639, cr_loss=0.304, over 17036.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1193, cr_loss=0.3352, over 3369784.92 frames. ], batch size: 39, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:35:23,513 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=15.0 2024-09-25 19:35:28,515 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.02 vs. 
limit=6.0 2024-09-25 19:36:14,207 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2024-09-25 19:36:17,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=816036.6666666666, ans=0.0 2024-09-25 19:36:19,819 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.104e+02 1.333e+02 1.429e+02 1.523e+02 2.294e+02, threshold=2.857e+02, percent-clipped=0.0 2024-09-25 19:36:30,575 INFO [train.py:1198] (0/4) Epoch 45, batch 3450, loss[loss=0.19, ctc_loss=0.1192, cr_loss=0.3541, over 17014.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1195, cr_loss=0.3353, over 3362297.12 frames. ], batch size: 51, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:36:38,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=816083.3333333334, ans=0.04949747468305833 2024-09-25 19:36:41,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=816083.3333333334, ans=0.1 2024-09-25 19:36:55,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=816130.0, ans=0.125 2024-09-25 19:36:57,035 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=816130.0, ans=0.125 2024-09-25 19:37:30,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=816223.3333333334, ans=0.125 2024-09-25 19:37:35,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=816270.0, ans=0.125 2024-09-25 19:37:35,831 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0 2024-09-25 19:37:41,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=816270.0, ans=0.125 2024-09-25 19:37:42,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=816270.0, ans=0.125 2024-09-25 19:37:50,526 INFO [train.py:1198] (0/4) Epoch 45, batch 3500, loss[loss=0.1921, ctc_loss=0.1218, cr_loss=0.3518, over 17177.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1197, cr_loss=0.3356, over 3369249.87 frames. 
], batch size: 45, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:37:53,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=816316.6666666666, ans=0.025 2024-09-25 19:38:15,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=816363.3333333334, ans=0.125 2024-09-25 19:38:23,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=816410.0, ans=0.125 2024-09-25 19:38:25,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=816410.0, ans=0.2 2024-09-25 19:38:30,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=816410.0, ans=0.025 2024-09-25 19:38:41,796 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-25 19:38:51,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=816456.6666666666, ans=0.1 2024-09-25 19:38:53,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=816503.3333333334, ans=0.125 2024-09-25 19:39:01,037 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.307e+02 1.374e+02 1.461e+02 1.986e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-25 19:39:09,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=816550.0, ans=0.0 2024-09-25 19:39:10,439 INFO [train.py:1198] (0/4) Epoch 45, batch 3550, loss[loss=0.209, ctc_loss=0.1361, cr_loss=0.3646, over 16092.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1201, cr_loss=0.3371, over 3373290.91 frames. ], batch size: 74, lr: 2.62e-03, grad_scale: 16.0 2024-09-25 19:39:35,866 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=816596.6666666666, ans=0.1 2024-09-25 19:39:35,987 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=816596.6666666666, ans=0.125 2024-09-25 19:39:53,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=816643.3333333334, ans=0.125 2024-09-25 19:39:56,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=816690.0, ans=0.0 2024-09-25 19:40:15,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=816736.6666666666, ans=0.2 2024-09-25 19:40:28,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-09-25 19:40:28,820 INFO [train.py:1198] (0/4) Epoch 45, batch 3600, loss[loss=0.175, ctc_loss=0.1109, cr_loss=0.3209, over 17140.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1202, cr_loss=0.3371, over 3367953.52 frames. 
], batch size: 48, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:40:35,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-09-25 19:40:46,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=816830.0, ans=0.125 2024-09-25 19:40:59,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=816876.6666666666, ans=0.125 2024-09-25 19:40:59,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=816876.6666666666, ans=0.125 2024-09-25 19:41:22,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=816923.3333333334, ans=0.125 2024-09-25 19:41:37,495 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 1.298e+02 1.374e+02 1.488e+02 2.136e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-25 19:41:46,856 INFO [train.py:1198] (0/4) Epoch 45, batch 3650, loss[loss=0.1757, ctc_loss=0.1106, cr_loss=0.3258, over 17269.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.12, cr_loss=0.3373, over 3371154.02 frames. ], batch size: 44, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:41:53,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=817016.6666666666, ans=0.025 2024-09-25 19:41:57,914 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=817016.6666666666, ans=0.125 2024-09-25 19:41:59,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817016.6666666666, ans=0.125 2024-09-25 19:42:11,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=817063.3333333334, ans=0.125 2024-09-25 19:42:21,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=817110.0, ans=0.125 2024-09-25 19:42:37,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=817156.6666666666, ans=0.2 2024-09-25 19:43:04,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=817250.0, ans=0.125 2024-09-25 19:43:05,644 INFO [train.py:1198] (0/4) Epoch 45, batch 3700, loss[loss=0.1551, ctc_loss=0.09707, cr_loss=0.2899, over 15790.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1201, cr_loss=0.3378, over 3373938.40 frames. 
], batch size: 35, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:43:16,871 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=817250.0, ans=0.125 2024-09-25 19:43:30,799 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=817296.6666666666, ans=0.0 2024-09-25 19:43:30,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=817296.6666666666, ans=0.05 2024-09-25 19:43:49,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=817343.3333333334, ans=0.025 2024-09-25 19:43:56,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817390.0, ans=0.1 2024-09-25 19:44:14,696 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.153e+02 1.280e+02 1.340e+02 1.437e+02 1.807e+02, threshold=2.680e+02, percent-clipped=0.0 2024-09-25 19:44:24,088 INFO [train.py:1198] (0/4) Epoch 45, batch 3750, loss[loss=0.1602, ctc_loss=0.1016, cr_loss=0.2934, over 17024.00 frames. ], tot_loss[loss=0.1881, ctc_loss=0.1206, cr_loss=0.3378, over 3356300.66 frames. ], batch size: 39, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:44:33,631 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:44:46,342 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=817530.0, ans=0.2 2024-09-25 19:44:46,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=817530.0, ans=0.0 2024-09-25 19:44:51,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=817530.0, ans=0.125 2024-09-25 19:45:44,860 INFO [train.py:1198] (0/4) Epoch 45, batch 3800, loss[loss=0.152, ctc_loss=0.09718, cr_loss=0.2743, over 17186.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.12, cr_loss=0.337, over 3348326.22 frames. ], batch size: 41, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:46:23,353 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:46:26,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=817810.0, ans=0.125 2024-09-25 19:46:31,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=817856.6666666666, ans=0.04949747468305833 2024-09-25 19:46:48,488 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=12.0 2024-09-25 19:46:49,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=817903.3333333334, ans=0.125 2024-09-25 19:46:54,350 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.160e+02 1.363e+02 1.462e+02 1.575e+02 2.339e+02, threshold=2.925e+02, percent-clipped=0.0 2024-09-25 19:47:03,900 INFO [train.py:1198] (0/4) Epoch 45, batch 3850, loss[loss=0.1646, ctc_loss=0.1017, cr_loss=0.3148, over 16959.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.121, cr_loss=0.3386, over 3306225.62 frames. 
], batch size: 42, lr: 2.62e-03, grad_scale: 32.0 2024-09-25 19:47:12,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=817950.0, ans=0.125 2024-09-25 19:47:38,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818043.3333333334, ans=0.1 2024-09-25 19:47:41,539 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=818043.3333333334, ans=0.025 2024-09-25 19:48:12,151 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=818136.6666666666, ans=0.125 2024-09-25 19:48:16,194 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-45.pt 2024-09-25 19:49:06,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=818164.6666666666, ans=0.0 2024-09-25 19:49:06,554 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-09-25 19:49:07,315 INFO [train.py:1198] (0/4) Epoch 46, batch 0, loss[loss=0.2276, ctc_loss=0.1461, cr_loss=0.4071, over 16995.00 frames. ], tot_loss[loss=0.2276, ctc_loss=0.1461, cr_loss=0.4071, over 16995.00 frames. ], batch size: 53, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:49:07,316 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 19:49:21,053 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1723, 4.5727, 4.2033, 4.3037], device='cuda:0') 2024-09-25 19:49:22,414 INFO [train.py:1230] (0/4) Epoch 46, validation: loss=0.03502, ctc_loss=0.03502, cr_loss=1.054e-14, over 944034.00 frames. 2024-09-25 19:49:22,414 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 19:49:35,433 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 19:49:38,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=818211.3333333334, ans=0.025 2024-09-25 19:49:40,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=818211.3333333334, ans=0.0 2024-09-25 19:49:51,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=818211.3333333334, ans=0.125 2024-09-25 19:49:59,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=818258.0, ans=0.0 2024-09-25 19:49:59,700 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818258.0, ans=0.1 2024-09-25 19:50:05,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=818258.0, ans=0.125 2024-09-25 19:50:38,875 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. 
limit=6.0 2024-09-25 19:50:40,889 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.337e+02 1.469e+02 1.679e+02 2.405e+02, threshold=2.939e+02, percent-clipped=0.0 2024-09-25 19:50:44,080 INFO [train.py:1198] (0/4) Epoch 46, batch 50, loss[loss=0.1743, ctc_loss=0.1098, cr_loss=0.3223, over 17248.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1201, cr_loss=0.3377, over 761174.88 frames. ], batch size: 44, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:51:35,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=818538.0, ans=0.0 2024-09-25 19:51:37,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=818538.0, ans=0.0 2024-09-25 19:51:56,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=818584.6666666666, ans=0.125 2024-09-25 19:51:58,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=818584.6666666666, ans=0.125 2024-09-25 19:52:08,976 INFO [train.py:1198] (0/4) Epoch 46, batch 100, loss[loss=0.2151, ctc_loss=0.1387, cr_loss=0.382, over 17167.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1177, cr_loss=0.3323, over 1340641.13 frames. ], batch size: 45, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:52:40,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=818724.6666666666, ans=0.2 2024-09-25 19:53:05,127 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818771.3333333334, ans=0.1 2024-09-25 19:53:05,684 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2024-09-25 19:53:06,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=818771.3333333334, ans=0.025 2024-09-25 19:53:28,960 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.282e+02 1.339e+02 1.432e+02 3.555e+02, threshold=2.678e+02, percent-clipped=2.0 2024-09-25 19:53:31,424 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2024-09-25 19:53:32,233 INFO [train.py:1198] (0/4) Epoch 46, batch 150, loss[loss=0.193, ctc_loss=0.1227, cr_loss=0.3516, over 17023.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1183, cr_loss=0.3332, over 1788024.00 frames. ], batch size: 53, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:53:59,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=818911.3333333334, ans=0.125 2024-09-25 19:54:08,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=818958.0, ans=0.2 2024-09-25 19:54:13,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=818958.0, ans=0.0 2024-09-25 19:54:15,618 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.01 vs. 
limit=22.5 2024-09-25 19:54:18,372 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=819004.6666666666, ans=0.125 2024-09-25 19:54:33,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=819004.6666666666, ans=0.0 2024-09-25 19:54:40,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=819051.3333333334, ans=0.125 2024-09-25 19:54:50,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=819098.0, ans=0.125 2024-09-25 19:54:51,568 INFO [train.py:1198] (0/4) Epoch 46, batch 200, loss[loss=0.152, ctc_loss=0.09488, cr_loss=0.2857, over 17091.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3357, over 2134902.59 frames. ], batch size: 43, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:55:07,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=819144.6666666666, ans=0.125 2024-09-25 19:55:20,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=819144.6666666666, ans=0.035 2024-09-25 19:55:39,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=15.0 2024-09-25 19:55:48,851 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819238.0, ans=0.1 2024-09-25 19:55:52,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=819238.0, ans=15.0 2024-09-25 19:55:58,767 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=12.0 2024-09-25 19:56:13,861 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.302e+02 1.353e+02 1.428e+02 2.054e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-25 19:56:14,193 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=819284.6666666666, ans=10.0 2024-09-25 19:56:17,286 INFO [train.py:1198] (0/4) Epoch 46, batch 250, loss[loss=0.1738, ctc_loss=0.1082, cr_loss=0.3281, over 17112.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3356, over 2404465.99 frames. ], batch size: 43, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 19:56:32,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=819378.0, ans=0.125 2024-09-25 19:56:48,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=819378.0, ans=0.025 2024-09-25 19:56:49,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=819378.0, ans=0.125 2024-09-25 19:57:11,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. 
limit=12.0 2024-09-25 19:57:18,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=819471.3333333334, ans=0.125 2024-09-25 19:57:22,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=819471.3333333334, ans=0.2 2024-09-25 19:57:40,864 INFO [train.py:1198] (0/4) Epoch 46, batch 300, loss[loss=0.1979, ctc_loss=0.1264, cr_loss=0.3576, over 17233.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1203, cr_loss=0.3381, over 2613083.33 frames. ], batch size: 50, lr: 2.59e-03, grad_scale: 16.0 2024-09-25 19:57:55,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=819611.3333333334, ans=0.125 2024-09-25 19:58:03,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=819611.3333333334, ans=0.0 2024-09-25 19:59:02,222 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.300e+02 1.374e+02 1.446e+02 2.011e+02, threshold=2.748e+02, percent-clipped=0.0 2024-09-25 19:59:02,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=819798.0, ans=0.2 2024-09-25 19:59:03,735 INFO [train.py:1198] (0/4) Epoch 46, batch 350, loss[loss=0.2119, ctc_loss=0.1379, cr_loss=0.3699, over 17029.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3378, over 2777546.67 frames. ], batch size: 56, lr: 2.59e-03, grad_scale: 16.0 2024-09-25 19:59:08,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819798.0, ans=0.1 2024-09-25 19:59:47,777 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2024-09-25 19:59:53,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=819938.0, ans=0.125 2024-09-25 20:00:17,688 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2024-09-25 20:00:23,196 INFO [train.py:1198] (0/4) Epoch 46, batch 400, loss[loss=0.2003, ctc_loss=0.129, cr_loss=0.3567, over 17298.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1204, cr_loss=0.3379, over 2900711.16 frames. ], batch size: 49, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:01:23,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=820171.3333333334, ans=0.125 2024-09-25 20:01:50,263 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.300e+02 1.386e+02 1.492e+02 1.969e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-25 20:01:52,001 INFO [train.py:1198] (0/4) Epoch 46, batch 450, loss[loss=0.1897, ctc_loss=0.1242, cr_loss=0.3274, over 17347.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1204, cr_loss=0.338, over 3008234.05 frames. 
], batch size: 48, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:02:14,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820311.3333333334, ans=0.1 2024-09-25 20:02:39,200 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=820404.6666666666, ans=0.0 2024-09-25 20:02:39,220 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=820404.6666666666, ans=0.0 2024-09-25 20:03:08,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=820451.3333333334, ans=0.1 2024-09-25 20:03:14,982 INFO [train.py:1198] (0/4) Epoch 46, batch 500, loss[loss=0.1978, ctc_loss=0.1261, cr_loss=0.3583, over 16892.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.121, cr_loss=0.3387, over 3063131.22 frames. ], batch size: 58, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:03:24,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=820498.0, ans=0.125 2024-09-25 20:03:28,892 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=22.5 2024-09-25 20:03:34,505 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=820544.6666666666, ans=0.0 2024-09-25 20:03:38,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0 2024-09-25 20:04:19,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820684.6666666666, ans=0.1 2024-09-25 20:04:22,780 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=820684.6666666666, ans=0.09899494936611666 2024-09-25 20:04:27,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=820684.6666666666, ans=0.0 2024-09-25 20:04:33,505 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.017e+02 1.336e+02 1.423e+02 1.544e+02 3.291e+02, threshold=2.846e+02, percent-clipped=2.0 2024-09-25 20:04:35,166 INFO [train.py:1198] (0/4) Epoch 46, batch 550, loss[loss=0.177, ctc_loss=0.1129, cr_loss=0.3206, over 17028.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1211, cr_loss=0.3387, over 3136416.06 frames. ], batch size: 39, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:04:53,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=820778.0, ans=0.2 2024-09-25 20:05:19,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=820824.6666666666, ans=0.125 2024-09-25 20:06:00,563 INFO [train.py:1198] (0/4) Epoch 46, batch 600, loss[loss=0.1849, ctc_loss=0.116, cr_loss=0.3447, over 17300.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1211, cr_loss=0.3386, over 3177315.32 frames. 
], batch size: 46, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:06:00,943 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=820964.6666666666, ans=0.2 2024-09-25 20:06:56,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821104.6666666666, ans=0.1 2024-09-25 20:07:02,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=821104.6666666666, ans=0.125 2024-09-25 20:07:21,441 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.282e+02 1.361e+02 1.501e+02 2.285e+02, threshold=2.723e+02, percent-clipped=0.0 2024-09-25 20:07:23,101 INFO [train.py:1198] (0/4) Epoch 46, batch 650, loss[loss=0.1677, ctc_loss=0.1056, cr_loss=0.3102, over 17261.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1209, cr_loss=0.3382, over 3221978.43 frames. ], batch size: 42, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:07:29,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=821198.0, ans=0.0 2024-09-25 20:07:31,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821198.0, ans=0.1 2024-09-25 20:07:36,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=821198.0, ans=0.125 2024-09-25 20:07:44,514 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2024-09-25 20:08:11,198 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-176000.pt 2024-09-25 20:08:16,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=821338.0, ans=0.0 2024-09-25 20:08:47,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=821431.3333333334, ans=0.125 2024-09-25 20:08:48,496 INFO [train.py:1198] (0/4) Epoch 46, batch 700, loss[loss=0.1926, ctc_loss=0.1203, cr_loss=0.3614, over 17223.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.12, cr_loss=0.3367, over 3254654.67 frames. ], batch size: 47, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:08:52,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=821431.3333333334, ans=0.125 2024-09-25 20:08:53,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821431.3333333334, ans=0.1 2024-09-25 20:08:53,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=821431.3333333334, ans=0.0 2024-09-25 20:08:55,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.92 vs. 
limit=12.0 2024-09-25 20:09:00,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=821431.3333333334, ans=0.125 2024-09-25 20:09:03,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=821478.0, ans=0.125 2024-09-25 20:09:06,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=821478.0, ans=0.0 2024-09-25 20:09:14,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=821478.0, ans=0.0 2024-09-25 20:09:28,693 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=821524.6666666666, ans=0.0 2024-09-25 20:09:32,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=22.5 2024-09-25 20:09:58,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.01 vs. limit=15.0 2024-09-25 20:10:05,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=821618.0, ans=0.1 2024-09-25 20:10:06,894 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.305e+02 1.403e+02 1.487e+02 1.713e+02, threshold=2.806e+02, percent-clipped=0.0 2024-09-25 20:10:08,542 INFO [train.py:1198] (0/4) Epoch 46, batch 750, loss[loss=0.192, ctc_loss=0.1224, cr_loss=0.348, over 17305.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1205, cr_loss=0.3379, over 3277807.01 frames. ], batch size: 51, lr: 2.59e-03, grad_scale: 32.0 2024-09-25 20:10:17,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=821664.6666666666, ans=0.125 2024-09-25 20:10:33,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=821711.3333333334, ans=0.125 2024-09-25 20:10:40,894 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0 2024-09-25 20:11:03,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=821804.6666666666, ans=0.125 2024-09-25 20:11:08,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=821804.6666666666, ans=0.125 2024-09-25 20:11:10,086 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=821804.6666666666, ans=0.0 2024-09-25 20:11:10,115 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=821804.6666666666, ans=0.0 2024-09-25 20:11:30,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=821851.3333333334, ans=0.0 2024-09-25 20:11:36,192 INFO [train.py:1198] (0/4) Epoch 46, batch 800, loss[loss=0.2244, ctc_loss=0.1426, cr_loss=0.4089, over 17208.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1203, cr_loss=0.3377, over 3296068.75 frames. 
], batch size: 55, lr: 2.58e-03, grad_scale: 32.0 2024-09-25 20:12:33,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822038.0, ans=0.1 2024-09-25 20:12:42,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=822084.6666666666, ans=0.125 2024-09-25 20:12:43,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=822084.6666666666, ans=0.125 2024-09-25 20:12:48,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=822084.6666666666, ans=0.0 2024-09-25 20:12:54,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=22.5 2024-09-25 20:12:58,584 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.090e+02 1.283e+02 1.367e+02 1.457e+02 2.182e+02, threshold=2.735e+02, percent-clipped=0.0 2024-09-25 20:12:58,614 INFO [train.py:1198] (0/4) Epoch 46, batch 850, loss[loss=0.2346, ctc_loss=0.1509, cr_loss=0.4188, over 16939.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.1211, cr_loss=0.3387, over 3299160.71 frames. ], batch size: 58, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:13:00,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=822131.3333333334, ans=0.0 2024-09-25 20:13:00,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=822131.3333333334, ans=0.2 2024-09-25 20:13:08,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=822131.3333333334, ans=0.125 2024-09-25 20:13:24,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=822178.0, ans=0.0 2024-09-25 20:13:27,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=822178.0, ans=0.0 2024-09-25 20:14:06,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=822318.0, ans=0.125 2024-09-25 20:14:18,697 INFO [train.py:1198] (0/4) Epoch 46, batch 900, loss[loss=0.1689, ctc_loss=0.1053, cr_loss=0.3179, over 17052.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1207, cr_loss=0.3385, over 3317991.52 frames. ], batch size: 39, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:14:34,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822411.3333333334, ans=0.1 2024-09-25 20:14:37,041 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.79 vs. 
limit=15.0 2024-09-25 20:14:39,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=822411.3333333334, ans=0.125 2024-09-25 20:15:32,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822551.3333333334, ans=0.1 2024-09-25 20:15:41,536 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.288e+02 1.364e+02 1.452e+02 2.497e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 20:15:41,561 INFO [train.py:1198] (0/4) Epoch 46, batch 950, loss[loss=0.2067, ctc_loss=0.1337, cr_loss=0.3647, over 16520.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1203, cr_loss=0.338, over 3324589.15 frames. ], batch size: 66, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:15:46,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=822598.0, ans=0.125 2024-09-25 20:15:50,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.07 vs. limit=15.0 2024-09-25 20:15:53,033 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=822598.0, ans=0.0 2024-09-25 20:16:15,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=822691.3333333334, ans=0.125 2024-09-25 20:16:27,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=822691.3333333334, ans=0.1 2024-09-25 20:16:31,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=822691.3333333334, ans=0.125 2024-09-25 20:16:39,151 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2024-09-25 20:16:56,524 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822784.6666666666, ans=0.1 2024-09-25 20:17:07,492 INFO [train.py:1198] (0/4) Epoch 46, batch 1000, loss[loss=0.1973, ctc_loss=0.1248, cr_loss=0.3627, over 17032.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1205, cr_loss=0.3385, over 3333836.13 frames. ], batch size: 44, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:17:09,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=822831.3333333334, ans=0.0 2024-09-25 20:17:25,648 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=822878.0, ans=0.05 2024-09-25 20:17:54,779 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-09-25 20:18:14,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=823018.0, ans=0.0 2024-09-25 20:18:26,239 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 20:18:30,620 INFO [train.py:1198] (0/4) Epoch 46, batch 1050, loss[loss=0.1569, ctc_loss=0.09707, cr_loss=0.299, over 17029.00 frames. 
], tot_loss[loss=0.1879, ctc_loss=0.1203, cr_loss=0.3379, over 3339397.78 frames. ], batch size: 39, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:18:32,120 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.054e+02 1.303e+02 1.373e+02 1.497e+02 1.983e+02, threshold=2.746e+02, percent-clipped=0.0 2024-09-25 20:19:13,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=823158.0, ans=0.125 2024-09-25 20:19:13,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=823158.0, ans=0.125 2024-09-25 20:19:25,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=823204.6666666666, ans=0.07 2024-09-25 20:19:28,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=823204.6666666666, ans=0.0 2024-09-25 20:19:36,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=823251.3333333334, ans=0.125 2024-09-25 20:19:45,241 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.01 vs. limit=12.0 2024-09-25 20:19:46,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=823251.3333333334, ans=0.125 2024-09-25 20:19:46,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2024-09-25 20:19:50,553 INFO [train.py:1198] (0/4) Epoch 46, batch 1100, loss[loss=0.2235, ctc_loss=0.1443, cr_loss=0.396, over 17222.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3382, over 3349547.95 frames. ], batch size: 55, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:19:54,412 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.11 vs. limit=22.5 2024-09-25 20:20:03,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=823298.0, ans=0.2 2024-09-25 20:20:15,862 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=823344.6666666666, ans=0.0 2024-09-25 20:21:15,713 INFO [train.py:1198] (0/4) Epoch 46, batch 1150, loss[loss=0.1945, ctc_loss=0.1239, cr_loss=0.3531, over 16675.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1201, cr_loss=0.3384, over 3352686.93 frames. 
], batch size: 61, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:21:17,339 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.324e+02 1.372e+02 1.463e+02 5.655e+02, threshold=2.745e+02, percent-clipped=1.0 2024-09-25 20:21:33,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=823578.0, ans=0.125 2024-09-25 20:22:07,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=823671.3333333334, ans=0.125 2024-09-25 20:22:21,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=823718.0, ans=0.1 2024-09-25 20:22:38,928 INFO [train.py:1198] (0/4) Epoch 46, batch 1200, loss[loss=0.1933, ctc_loss=0.1222, cr_loss=0.3554, over 17354.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1204, cr_loss=0.3391, over 3356057.67 frames. ], batch size: 48, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:22:50,470 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 20:23:02,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=823811.3333333334, ans=0.125 2024-09-25 20:23:18,499 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=823858.0, ans=0.125 2024-09-25 20:23:21,633 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=823858.0, ans=0.125 2024-09-25 20:23:32,714 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=823904.6666666666, ans=0.125 2024-09-25 20:23:53,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=823951.3333333334, ans=0.025 2024-09-25 20:24:00,929 INFO [train.py:1198] (0/4) Epoch 46, batch 1250, loss[loss=0.1933, ctc_loss=0.1244, cr_loss=0.3445, over 17310.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1204, cr_loss=0.339, over 3360429.12 frames. ], batch size: 49, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:24:02,523 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.285e+02 1.375e+02 1.486e+02 1.915e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-25 20:24:21,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=824044.6666666666, ans=0.125 2024-09-25 20:24:33,524 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.63 vs. limit=15.0 2024-09-25 20:24:41,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=824091.3333333334, ans=0.125 2024-09-25 20:25:23,679 INFO [train.py:1198] (0/4) Epoch 46, batch 1300, loss[loss=0.1903, ctc_loss=0.1216, cr_loss=0.3432, over 16562.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1204, cr_loss=0.3392, over 3360926.67 frames. 
], batch size: 66, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:25:27,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=824231.3333333334, ans=0.125 2024-09-25 20:25:51,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=824278.0, ans=0.125 2024-09-25 20:26:13,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=824371.3333333334, ans=0.07 2024-09-25 20:26:14,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=824371.3333333334, ans=0.125 2024-09-25 20:26:39,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824418.0, ans=0.1 2024-09-25 20:26:48,557 INFO [train.py:1198] (0/4) Epoch 46, batch 1350, loss[loss=0.1766, ctc_loss=0.1127, cr_loss=0.3197, over 17305.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1206, cr_loss=0.3394, over 3359876.38 frames. ], batch size: 49, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:26:51,683 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.295e+02 1.386e+02 1.472e+02 1.767e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-25 20:27:06,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=824511.3333333334, ans=0.0 2024-09-25 20:27:24,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=824558.0, ans=0.2 2024-09-25 20:27:29,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-09-25 20:27:43,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=824604.6666666666, ans=0.025 2024-09-25 20:27:46,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=824604.6666666666, ans=0.2 2024-09-25 20:27:51,353 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2024-09-25 20:27:57,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=824651.3333333334, ans=22.5 2024-09-25 20:27:59,111 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0 2024-09-25 20:28:01,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=824651.3333333334, ans=0.125 2024-09-25 20:28:11,416 INFO [train.py:1198] (0/4) Epoch 46, batch 1400, loss[loss=0.1953, ctc_loss=0.127, cr_loss=0.3416, over 16937.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.338, over 3363676.10 frames. 
], batch size: 58, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:28:31,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=824744.6666666666, ans=0.05 2024-09-25 20:28:37,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=824744.6666666666, ans=0.0 2024-09-25 20:28:39,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=22.5 2024-09-25 20:29:31,897 INFO [train.py:1198] (0/4) Epoch 46, batch 1450, loss[loss=0.1772, ctc_loss=0.1125, cr_loss=0.3235, over 17240.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.337, over 3361406.66 frames. ], batch size: 47, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:29:34,415 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=12.0 2024-09-25 20:29:35,108 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.196e+02 1.313e+02 1.376e+02 1.492e+02 1.838e+02, threshold=2.751e+02, percent-clipped=0.0 2024-09-25 20:30:15,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=12.0 2024-09-25 20:30:26,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=825071.3333333334, ans=0.2 2024-09-25 20:30:37,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=825118.0, ans=0.125 2024-09-25 20:30:37,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825118.0, ans=0.1 2024-09-25 20:30:54,860 INFO [train.py:1198] (0/4) Epoch 46, batch 1500, loss[loss=0.1763, ctc_loss=0.1117, cr_loss=0.323, over 17164.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1193, cr_loss=0.3357, over 3358906.13 frames. ], batch size: 45, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:31:17,453 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2024-09-25 20:31:26,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=825211.3333333334, ans=0.125 2024-09-25 20:31:43,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=825258.0, ans=0.0 2024-09-25 20:31:51,642 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=825304.6666666666, ans=0.125 2024-09-25 20:32:20,344 INFO [train.py:1198] (0/4) Epoch 46, batch 1550, loss[loss=0.2041, ctc_loss=0.1296, cr_loss=0.3725, over 16788.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.12, cr_loss=0.3372, over 3349241.41 frames. 
], batch size: 61, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:32:23,492 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.290e+02 1.399e+02 1.518e+02 4.516e+02, threshold=2.798e+02, percent-clipped=1.0 2024-09-25 20:32:23,804 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=825398.0, ans=0.2 2024-09-25 20:32:43,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=825444.6666666666, ans=0.2 2024-09-25 20:33:43,586 INFO [train.py:1198] (0/4) Epoch 46, batch 1600, loss[loss=0.2059, ctc_loss=0.1301, cr_loss=0.3788, over 17344.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1198, cr_loss=0.3367, over 3357454.97 frames. ], batch size: 48, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:33:55,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=825631.3333333334, ans=0.125 2024-09-25 20:34:18,189 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2024-09-25 20:34:27,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=825724.6666666666, ans=0.0 2024-09-25 20:34:41,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=825771.3333333334, ans=0.5 2024-09-25 20:34:56,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=825818.0, ans=0.0 2024-09-25 20:35:04,110 INFO [train.py:1198] (0/4) Epoch 46, batch 1650, loss[loss=0.1922, ctc_loss=0.1234, cr_loss=0.3442, over 17154.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1196, cr_loss=0.336, over 3363138.07 frames. ], batch size: 45, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:35:07,356 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.199e+02 1.315e+02 1.405e+02 1.548e+02 2.408e+02, threshold=2.810e+02, percent-clipped=0.0 2024-09-25 20:35:09,713 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2024-09-25 20:35:26,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=825911.3333333334, ans=0.125 2024-09-25 20:35:33,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=825911.3333333334, ans=0.025 2024-09-25 20:35:42,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=825958.0, ans=0.125 2024-09-25 20:36:04,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=826004.6666666666, ans=0.125 2024-09-25 20:36:14,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=826051.3333333334, ans=0.0 2024-09-25 20:36:17,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=826051.3333333334, ans=0.025 2024-09-25 20:36:32,698 INFO [train.py:1198] (0/4) Epoch 46, batch 1700, loss[loss=0.2172, ctc_loss=0.1404, cr_loss=0.3843, over 17203.00 frames. 
], tot_loss[loss=0.1877, ctc_loss=0.1202, cr_loss=0.3374, over 3359446.15 frames. ], batch size: 55, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:36:43,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=826098.0, ans=0.025 2024-09-25 20:37:30,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=826238.0, ans=0.125 2024-09-25 20:37:52,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=15.0 2024-09-25 20:37:55,295 INFO [train.py:1198] (0/4) Epoch 46, batch 1750, loss[loss=0.1799, ctc_loss=0.1126, cr_loss=0.3365, over 17289.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1196, cr_loss=0.3368, over 3357424.40 frames. ], batch size: 49, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:38:00,304 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.299e+02 1.380e+02 1.488e+02 2.427e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-25 20:38:08,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=826331.3333333334, ans=0.0 2024-09-25 20:38:15,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=826378.0, ans=0.125 2024-09-25 20:38:48,716 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=826471.3333333334, ans=0.125 2024-09-25 20:38:56,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=826471.3333333334, ans=0.09899494936611666 2024-09-25 20:39:08,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=12.0 2024-09-25 20:39:09,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=826518.0, ans=0.0 2024-09-25 20:39:15,275 INFO [train.py:1198] (0/4) Epoch 46, batch 1800, loss[loss=0.1559, ctc_loss=0.09836, cr_loss=0.2876, over 17366.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1203, cr_loss=0.3377, over 3344647.25 frames. ], batch size: 48, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:39:33,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2024-09-25 20:39:57,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=826658.0, ans=0.125 2024-09-25 20:40:20,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.08 vs. limit=15.0 2024-09-25 20:40:38,785 INFO [train.py:1198] (0/4) Epoch 46, batch 1850, loss[loss=0.1902, ctc_loss=0.121, cr_loss=0.3456, over 17239.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1201, cr_loss=0.3382, over 3346945.15 frames. 
], batch size: 50, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:40:43,545 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.312e+02 1.384e+02 1.504e+02 2.324e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-25 20:40:48,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=826798.0, ans=0.0 2024-09-25 20:41:37,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=826938.0, ans=0.125 2024-09-25 20:41:43,874 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2024-09-25 20:41:56,344 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=826984.6666666666, ans=0.125 2024-09-25 20:42:03,935 INFO [train.py:1198] (0/4) Epoch 46, batch 1900, loss[loss=0.1771, ctc_loss=0.1121, cr_loss=0.3252, over 17309.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3357, over 3349540.26 frames. ], batch size: 49, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:42:07,449 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 20:42:10,564 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827031.3333333334, ans=0.1 2024-09-25 20:42:11,059 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=12.0 2024-09-25 20:42:29,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=827078.0, ans=0.125 2024-09-25 20:43:08,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=827171.3333333334, ans=0.125 2024-09-25 20:43:11,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=827218.0, ans=0.0 2024-09-25 20:43:17,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=827218.0, ans=0.2 2024-09-25 20:43:23,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=827218.0, ans=0.025 2024-09-25 20:43:26,923 INFO [train.py:1198] (0/4) Epoch 46, batch 1950, loss[loss=0.1839, ctc_loss=0.1172, cr_loss=0.3335, over 17003.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1197, cr_loss=0.337, over 3353096.53 frames. ], batch size: 56, lr: 2.58e-03, grad_scale: 8.0 2024-09-25 20:43:30,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827264.6666666666, ans=0.1 2024-09-25 20:43:31,731 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.295e+02 1.354e+02 1.509e+02 3.056e+02, threshold=2.707e+02, percent-clipped=1.0 2024-09-25 20:44:18,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=827404.6666666666, ans=0.125 2024-09-25 20:44:44,031 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.93 vs. 
limit=22.5 2024-09-25 20:44:46,533 INFO [train.py:1198] (0/4) Epoch 46, batch 2000, loss[loss=0.1801, ctc_loss=0.1137, cr_loss=0.3321, over 17012.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3349, over 3364463.23 frames. ], batch size: 44, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:44:51,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=827498.0, ans=0.0 2024-09-25 20:45:26,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=22.5 2024-09-25 20:45:28,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=827591.3333333334, ans=0.125 2024-09-25 20:45:29,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.20 vs. limit=15.0 2024-09-25 20:45:46,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=827638.0, ans=0.125 2024-09-25 20:45:52,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=827684.6666666666, ans=0.125 2024-09-25 20:45:55,535 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=8.0 2024-09-25 20:46:05,661 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.99 vs. limit=15.0 2024-09-25 20:46:11,125 INFO [train.py:1198] (0/4) Epoch 46, batch 2050, loss[loss=0.1891, ctc_loss=0.1201, cr_loss=0.345, over 17351.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3352, over 3364435.45 frames. ], batch size: 48, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:46:18,715 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.306e+02 1.402e+02 1.494e+02 2.074e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-25 20:46:59,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=827824.6666666666, ans=0.125 2024-09-25 20:47:02,434 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=827871.3333333334, ans=0.125 2024-09-25 20:47:02,451 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=827871.3333333334, ans=0.2 2024-09-25 20:47:13,760 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 20:47:34,394 INFO [train.py:1198] (0/4) Epoch 46, batch 2100, loss[loss=0.1962, ctc_loss=0.1265, cr_loss=0.3482, over 17306.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1194, cr_loss=0.3363, over 3355349.98 frames. ], batch size: 51, lr: 2.58e-03, grad_scale: 16.0 2024-09-25 20:47:35,177 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0 2024-09-25 20:47:45,808 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.64 vs. 
limit=10.0 2024-09-25 20:47:51,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=828011.3333333334, ans=0.0 2024-09-25 20:48:07,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828058.0, ans=0.1 2024-09-25 20:48:10,682 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2024-09-25 20:48:18,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=828058.0, ans=0.125 2024-09-25 20:48:21,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=828058.0, ans=0.07 2024-09-25 20:48:56,846 INFO [train.py:1198] (0/4) Epoch 46, batch 2150, loss[loss=0.2155, ctc_loss=0.1384, cr_loss=0.3855, over 17203.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1201, cr_loss=0.3376, over 3353482.55 frames. ], batch size: 47, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 20:49:00,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=828198.0, ans=0.125 2024-09-25 20:49:01,529 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.324e+02 1.399e+02 1.517e+02 1.900e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-25 20:49:01,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=828198.0, ans=0.2 2024-09-25 20:49:09,880 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=828198.0, ans=0.125 2024-09-25 20:49:09,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828198.0, ans=0.1 2024-09-25 20:49:24,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828244.6666666666, ans=0.125 2024-09-25 20:49:57,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828338.0, ans=0.1 2024-09-25 20:50:02,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=828384.6666666666, ans=0.125 2024-09-25 20:50:04,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=828384.6666666666, ans=0.2 2024-09-25 20:50:06,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=828384.6666666666, ans=0.95 2024-09-25 20:50:19,238 INFO [train.py:1198] (0/4) Epoch 46, batch 2200, loss[loss=0.1732, ctc_loss=0.1103, cr_loss=0.3146, over 17310.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.338, over 3353225.69 frames. ], batch size: 49, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 20:50:24,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. 
limit=22.5 2024-09-25 20:51:04,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=828524.6666666666, ans=0.0 2024-09-25 20:51:18,540 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2024-09-25 20:51:39,432 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.10 vs. limit=15.0 2024-09-25 20:51:44,711 INFO [train.py:1198] (0/4) Epoch 46, batch 2250, loss[loss=0.1659, ctc_loss=0.105, cr_loss=0.3049, over 17015.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1193, cr_loss=0.3364, over 3357713.50 frames. ], batch size: 39, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 20:51:47,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2024-09-25 20:51:49,433 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.075e+02 1.294e+02 1.364e+02 1.489e+02 2.080e+02, threshold=2.728e+02, percent-clipped=0.0 2024-09-25 20:52:20,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=828758.0, ans=0.025 2024-09-25 20:52:25,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0 2024-09-25 20:52:30,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828758.0, ans=0.1 2024-09-25 20:52:57,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=828851.3333333334, ans=0.0 2024-09-25 20:53:07,429 INFO [train.py:1198] (0/4) Epoch 46, batch 2300, loss[loss=0.1536, ctc_loss=0.09145, cr_loss=0.311, over 17273.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3361, over 3362831.32 frames. ], batch size: 42, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 20:53:26,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=828944.6666666666, ans=0.125 2024-09-25 20:53:50,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=828991.3333333334, ans=0.125 2024-09-25 20:54:08,549 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=829038.0, ans=0.125 2024-09-25 20:54:21,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=829084.6666666666, ans=0.2 2024-09-25 20:54:25,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=829131.3333333334, ans=0.0 2024-09-25 20:54:27,183 INFO [train.py:1198] (0/4) Epoch 46, batch 2350, loss[loss=0.1885, ctc_loss=0.1235, cr_loss=0.3246, over 16048.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1195, cr_loss=0.3357, over 3356201.07 frames. 
], batch size: 74, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 20:54:31,906 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.291e+02 1.368e+02 1.435e+02 1.902e+02, threshold=2.737e+02, percent-clipped=0.0 2024-09-25 20:54:48,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=829178.0, ans=0.125 2024-09-25 20:55:02,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=829224.6666666666, ans=0.125 2024-09-25 20:55:07,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=829224.6666666666, ans=0.2 2024-09-25 20:55:08,607 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=829224.6666666666, ans=0.07 2024-09-25 20:55:37,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=829318.0, ans=0.2 2024-09-25 20:55:44,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829318.0, ans=0.1 2024-09-25 20:55:50,420 INFO [train.py:1198] (0/4) Epoch 46, batch 2400, loss[loss=0.1836, ctc_loss=0.1154, cr_loss=0.341, over 17328.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.1184, cr_loss=0.3336, over 3364253.82 frames. ], batch size: 48, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 20:56:07,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829364.6666666666, ans=0.1 2024-09-25 20:56:19,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=829411.3333333334, ans=0.02 2024-09-25 20:56:23,650 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2024-09-25 20:56:34,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=829458.0, ans=0.025 2024-09-25 20:56:47,296 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=829504.6666666666, ans=0.0 2024-09-25 20:57:15,352 INFO [train.py:1198] (0/4) Epoch 46, batch 2450, loss[loss=0.1953, ctc_loss=0.1268, cr_loss=0.3425, over 15807.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1195, cr_loss=0.3357, over 3357755.75 frames. 
], batch size: 74, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 20:57:20,236 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.115e+02 1.311e+02 1.415e+02 1.511e+02 3.334e+02, threshold=2.830e+02, percent-clipped=1.0 2024-09-25 20:57:40,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=829644.6666666666, ans=0.1 2024-09-25 20:58:01,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=829691.3333333334, ans=0.125 2024-09-25 20:58:09,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=829738.0, ans=0.125 2024-09-25 20:58:14,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=829738.0, ans=0.2 2024-09-25 20:58:27,508 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.73 vs. limit=6.0 2024-09-25 20:58:38,038 INFO [train.py:1198] (0/4) Epoch 46, batch 2500, loss[loss=0.2094, ctc_loss=0.1345, cr_loss=0.3745, over 16989.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1195, cr_loss=0.3355, over 3349412.15 frames. ], batch size: 53, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 20:58:38,276 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=829831.3333333334, ans=0.07 2024-09-25 20:58:41,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=829831.3333333334, ans=0.125 2024-09-25 20:58:59,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.20 vs. limit=12.0 2024-09-25 20:59:02,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=829878.0, ans=0.125 2024-09-25 20:59:04,095 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=829878.0, ans=0.125 2024-09-25 20:59:11,947 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=829924.6666666666, ans=0.015 2024-09-25 20:59:21,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=829924.6666666666, ans=0.125 2024-09-25 20:59:21,973 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=829924.6666666666, ans=0.0 2024-09-25 20:59:23,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=829924.6666666666, ans=0.125 2024-09-25 20:59:51,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=830018.0, ans=0.2 2024-09-25 20:59:58,772 INFO [train.py:1198] (0/4) Epoch 46, batch 2550, loss[loss=0.2085, ctc_loss=0.1338, cr_loss=0.3735, over 17301.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1186, cr_loss=0.3341, over 3354096.50 frames. 
], batch size: 49, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:00:00,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=830064.6666666666, ans=0.125 2024-09-25 21:00:06,164 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.070e+02 1.313e+02 1.390e+02 1.516e+02 1.832e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-25 21:00:12,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=830064.6666666666, ans=0.125 2024-09-25 21:00:35,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=830158.0, ans=0.0 2024-09-25 21:00:38,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=830158.0, ans=0.0 2024-09-25 21:00:57,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=830204.6666666666, ans=0.1 2024-09-25 21:01:26,912 INFO [train.py:1198] (0/4) Epoch 46, batch 2600, loss[loss=0.164, ctc_loss=0.1042, cr_loss=0.2989, over 16309.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1188, cr_loss=0.3343, over 3347746.90 frames. ], batch size: 36, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:01:27,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.91 vs. limit=22.5 2024-09-25 21:02:03,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=830391.3333333334, ans=0.2 2024-09-25 21:02:34,979 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=830484.6666666666, ans=0.5 2024-09-25 21:02:50,771 INFO [train.py:1198] (0/4) Epoch 46, batch 2650, loss[loss=0.2214, ctc_loss=0.1438, cr_loss=0.3877, over 17043.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1195, cr_loss=0.336, over 3348775.50 frames. ], batch size: 52, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:02:55,451 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.287e+02 1.345e+02 1.480e+02 2.035e+02, threshold=2.690e+02, percent-clipped=0.0 2024-09-25 21:03:21,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830624.6666666666, ans=0.1 2024-09-25 21:03:21,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=830624.6666666666, ans=0.125 2024-09-25 21:03:35,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=830624.6666666666, ans=0.2 2024-09-25 21:04:02,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=830718.0, ans=0.125 2024-09-25 21:04:09,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830764.6666666666, ans=0.1 2024-09-25 21:04:10,491 INFO [train.py:1198] (0/4) Epoch 46, batch 2700, loss[loss=0.1426, ctc_loss=0.08708, cr_loss=0.2775, over 17279.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3365, over 3334745.27 frames. 
], batch size: 42, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:04:13,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=830764.6666666666, ans=0.125 2024-09-25 21:04:34,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=830811.3333333334, ans=0.0 2024-09-25 21:04:37,914 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=22.5 2024-09-25 21:04:51,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=830858.0, ans=0.125 2024-09-25 21:04:55,605 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=12.0 2024-09-25 21:05:09,037 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=22.5 2024-09-25 21:05:30,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=830998.0, ans=0.0 2024-09-25 21:05:32,284 INFO [train.py:1198] (0/4) Epoch 46, batch 2750, loss[loss=0.1715, ctc_loss=0.109, cr_loss=0.3127, over 17010.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1189, cr_loss=0.3351, over 3336044.33 frames. ], batch size: 44, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:05:37,114 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.184e+02 1.313e+02 1.406e+02 1.516e+02 1.975e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-25 21:05:54,484 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=831044.6666666666, ans=0.0 2024-09-25 21:06:53,491 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:06:56,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=831231.3333333334, ans=0.025 2024-09-25 21:06:57,857 INFO [train.py:1198] (0/4) Epoch 46, batch 2800, loss[loss=0.1818, ctc_loss=0.1155, cr_loss=0.3314, over 17081.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3351, over 3342719.77 frames. ], batch size: 43, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:07:44,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=831324.6666666666, ans=0.025 2024-09-25 21:08:20,588 INFO [train.py:1198] (0/4) Epoch 46, batch 2850, loss[loss=0.1726, ctc_loss=0.1092, cr_loss=0.317, over 17300.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.119, cr_loss=0.3356, over 3341528.60 frames. ], batch size: 46, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:08:26,959 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.083e+02 1.275e+02 1.372e+02 1.492e+02 2.089e+02, threshold=2.744e+02, percent-clipped=0.0 2024-09-25 21:08:33,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=831464.6666666666, ans=0.125 2024-09-25 21:09:03,576 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.46 vs. 
limit=15.0 2024-09-25 21:09:06,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=831558.0, ans=0.0 2024-09-25 21:09:15,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=831604.6666666666, ans=0.2 2024-09-25 21:09:30,380 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2024-09-25 21:09:34,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=831651.3333333334, ans=0.0 2024-09-25 21:09:40,972 INFO [train.py:1198] (0/4) Epoch 46, batch 2900, loss[loss=0.2071, ctc_loss=0.1337, cr_loss=0.3672, over 17158.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.3358, over 3351602.99 frames. ], batch size: 48, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:10:06,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=831744.6666666666, ans=0.125 2024-09-25 21:10:22,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=831791.3333333334, ans=0.1 2024-09-25 21:10:41,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=831838.0, ans=0.125 2024-09-25 21:10:51,285 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0 2024-09-25 21:11:08,574 INFO [train.py:1198] (0/4) Epoch 46, batch 2950, loss[loss=0.1896, ctc_loss=0.1209, cr_loss=0.3437, over 17163.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1194, cr_loss=0.3359, over 3344427.64 frames. ], batch size: 45, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:11:09,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2024-09-25 21:11:10,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=831931.3333333334, ans=0.125 2024-09-25 21:11:16,446 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.273e+02 1.341e+02 1.442e+02 4.466e+02, threshold=2.682e+02, percent-clipped=1.0 2024-09-25 21:11:18,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=831931.3333333334, ans=0.125 2024-09-25 21:11:34,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=831978.0, ans=0.125 2024-09-25 21:11:43,740 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=832024.6666666666, ans=0.95 2024-09-25 21:11:50,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832024.6666666666, ans=0.1 2024-09-25 21:11:50,934 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.56 vs. 
limit=22.5 2024-09-25 21:12:17,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=832118.0, ans=0.2 2024-09-25 21:12:23,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832118.0, ans=0.1 2024-09-25 21:12:27,906 INFO [train.py:1198] (0/4) Epoch 46, batch 3000, loss[loss=0.2068, ctc_loss=0.1365, cr_loss=0.3517, over 17304.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1196, cr_loss=0.3365, over 3356896.46 frames. ], batch size: 49, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:12:27,907 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 21:12:43,455 INFO [train.py:1230] (0/4) Epoch 46, validation: loss=0.03583, ctc_loss=0.03583, cr_loss=1.006e-14, over 944034.00 frames. 2024-09-25 21:12:43,455 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 21:12:51,867 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=15.0 2024-09-25 21:13:09,730 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0 2024-09-25 21:13:27,786 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=832258.0, ans=0.125 2024-09-25 21:13:32,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=832304.6666666666, ans=0.0 2024-09-25 21:13:57,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=832351.3333333334, ans=0.125 2024-09-25 21:14:01,695 INFO [train.py:1198] (0/4) Epoch 46, batch 3050, loss[loss=0.2142, ctc_loss=0.1402, cr_loss=0.3702, over 16931.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3353, over 3366270.91 frames. ], batch size: 58, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:14:09,458 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.315e+02 1.415e+02 1.524e+02 2.421e+02, threshold=2.829e+02, percent-clipped=0.0 2024-09-25 21:14:30,607 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=22.5 2024-09-25 21:14:35,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-09-25 21:15:01,658 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=832538.0, ans=0.05 2024-09-25 21:15:18,025 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0 2024-09-25 21:15:20,313 INFO [train.py:1198] (0/4) Epoch 46, batch 3100, loss[loss=0.1653, ctc_loss=0.1028, cr_loss=0.3126, over 17192.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3357, over 3371722.34 frames. 
], batch size: 41, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:15:42,411 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:16:10,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=832771.3333333334, ans=0.2 2024-09-25 21:16:11,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=832771.3333333334, ans=0.0 2024-09-25 21:16:20,428 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:16:25,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=832818.0, ans=0.125 2024-09-25 21:16:26,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=832818.0, ans=0.125 2024-09-25 21:16:31,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=832818.0, ans=0.95 2024-09-25 21:16:32,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=832818.0, ans=0.0 2024-09-25 21:16:34,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=832818.0, ans=0.07 2024-09-25 21:16:38,988 INFO [train.py:1198] (0/4) Epoch 46, batch 3150, loss[loss=0.1752, ctc_loss=0.1104, cr_loss=0.3239, over 17025.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1191, cr_loss=0.3354, over 3364090.13 frames. ], batch size: 44, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:16:46,804 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.123e+02 1.285e+02 1.343e+02 1.430e+02 2.452e+02, threshold=2.687e+02, percent-clipped=0.0 2024-09-25 21:16:50,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=832864.6666666666, ans=0.0 2024-09-25 21:17:09,156 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=832958.0, ans=10.0 2024-09-25 21:17:35,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=833004.6666666666, ans=0.125 2024-09-25 21:17:59,372 INFO [train.py:1198] (0/4) Epoch 46, batch 3200, loss[loss=0.2051, ctc_loss=0.135, cr_loss=0.3505, over 17044.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1196, cr_loss=0.337, over 3357213.52 frames. ], batch size: 56, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:18:39,112 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-09-25 21:19:11,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=833284.6666666666, ans=0.5 2024-09-25 21:19:14,570 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=833284.6666666666, ans=0.125 2024-09-25 21:19:17,333 INFO [train.py:1198] (0/4) Epoch 46, batch 3250, loss[loss=0.2142, ctc_loss=0.1395, cr_loss=0.3739, over 17151.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1203, cr_loss=0.3384, over 3356266.49 frames. 
], batch size: 48, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:19:25,151 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.305e+02 1.369e+02 1.483e+02 2.240e+02, threshold=2.739e+02, percent-clipped=0.0 2024-09-25 21:19:49,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=833424.6666666666, ans=0.0 2024-09-25 21:19:55,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=833424.6666666666, ans=0.2 2024-09-25 21:19:59,046 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0 2024-09-25 21:20:14,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=833471.3333333334, ans=10.0 2024-09-25 21:20:16,022 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=833471.3333333334, ans=0.125 2024-09-25 21:20:16,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=833471.3333333334, ans=0.0 2024-09-25 21:20:40,809 INFO [train.py:1198] (0/4) Epoch 46, batch 3300, loss[loss=0.1738, ctc_loss=0.1125, cr_loss=0.3063, over 16937.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1197, cr_loss=0.337, over 3348786.37 frames. ], batch size: 58, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:21:12,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833658.0, ans=0.1 2024-09-25 21:21:55,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2024-09-25 21:21:58,830 INFO [train.py:1198] (0/4) Epoch 46, batch 3350, loss[loss=0.1707, ctc_loss=0.1077, cr_loss=0.315, over 17049.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1202, cr_loss=0.3371, over 3347767.08 frames. ], batch size: 39, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:22:06,617 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.301e+02 1.404e+02 1.465e+02 2.030e+02, threshold=2.807e+02, percent-clipped=0.0 2024-09-25 21:22:27,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=833844.6666666666, ans=0.125 2024-09-25 21:22:38,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=833891.3333333334, ans=0.2 2024-09-25 21:22:44,580 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=22.5 2024-09-25 21:22:44,778 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.68 vs. 
limit=15.0 2024-09-25 21:22:47,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=833938.0, ans=0.2 2024-09-25 21:22:50,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=833938.0, ans=0.125 2024-09-25 21:23:16,306 INFO [train.py:1198] (0/4) Epoch 46, batch 3400, loss[loss=0.2048, ctc_loss=0.1337, cr_loss=0.3554, over 14835.00 frames. ], tot_loss[loss=0.188, ctc_loss=0.1205, cr_loss=0.3375, over 3337170.21 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 32.0 2024-09-25 21:23:35,404 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=834078.0, ans=0.125 2024-09-25 21:24:30,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=834218.0, ans=0.0 2024-09-25 21:24:36,365 INFO [train.py:1198] (0/4) Epoch 46, batch 3450, loss[loss=0.2283, ctc_loss=0.1544, cr_loss=0.3697, over 11613.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1209, cr_loss=0.3386, over 3315737.05 frames. ], batch size: 123, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:24:45,574 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.317e+02 1.418e+02 1.501e+02 3.351e+02, threshold=2.836e+02, percent-clipped=1.0 2024-09-25 21:25:18,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=834358.0, ans=0.125 2024-09-25 21:25:40,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=834451.3333333334, ans=0.2 2024-09-25 21:25:42,496 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-09-25 21:25:54,710 INFO [train.py:1198] (0/4) Epoch 46, batch 3500, loss[loss=0.2248, ctc_loss=0.1463, cr_loss=0.3922, over 14864.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1207, cr_loss=0.3385, over 3316564.53 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 16.0 2024-09-25 21:26:05,268 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=15.0 2024-09-25 21:26:21,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=834544.6666666666, ans=0.125 2024-09-25 21:26:49,851 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2024-09-25 21:26:52,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=834638.0, ans=0.2 2024-09-25 21:26:55,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=834684.6666666666, ans=0.2 2024-09-25 21:27:06,864 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=834684.6666666666, ans=0.125 2024-09-25 21:27:12,619 INFO [train.py:1198] (0/4) Epoch 46, batch 3550, loss[loss=0.172, ctc_loss=0.1087, cr_loss=0.3162, over 17097.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1193, cr_loss=0.3361, over 3333967.52 frames. 
], batch size: 40, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:27:21,878 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.283e+02 1.360e+02 1.445e+02 2.258e+02, threshold=2.720e+02, percent-clipped=0.0 2024-09-25 21:27:26,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=834778.0, ans=0.125 2024-09-25 21:27:36,548 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=834778.0, ans=0.09899494936611666 2024-09-25 21:28:32,454 INFO [train.py:1198] (0/4) Epoch 46, batch 3600, loss[loss=0.18, ctc_loss=0.1162, cr_loss=0.3189, over 16782.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1196, cr_loss=0.3371, over 3347447.80 frames. ], batch size: 61, lr: 2.56e-03, grad_scale: 32.0 2024-09-25 21:28:42,455 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2024-09-25 21:28:49,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=835011.3333333334, ans=0.125 2024-09-25 21:29:02,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=835058.0, ans=0.2 2024-09-25 21:29:49,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835151.3333333334, ans=0.1 2024-09-25 21:29:55,384 INFO [train.py:1198] (0/4) Epoch 46, batch 3650, loss[loss=0.2185, ctc_loss=0.1416, cr_loss=0.3844, over 16881.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1192, cr_loss=0.336, over 3357546.55 frames. ], batch size: 58, lr: 2.56e-03, grad_scale: 32.0 2024-09-25 21:30:04,585 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.303e+02 1.373e+02 1.458e+02 2.085e+02, threshold=2.745e+02, percent-clipped=0.0 2024-09-25 21:30:36,227 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=835291.3333333334, ans=0.1 2024-09-25 21:30:37,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=835291.3333333334, ans=0.125 2024-09-25 21:30:46,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=835338.0, ans=0.2 2024-09-25 21:31:02,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=835384.6666666666, ans=0.025 2024-09-25 21:31:08,427 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=835384.6666666666, ans=0.125 2024-09-25 21:31:14,428 INFO [train.py:1198] (0/4) Epoch 46, batch 3700, loss[loss=0.1474, ctc_loss=0.08902, cr_loss=0.2917, over 17099.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.119, cr_loss=0.336, over 3364500.29 frames. ], batch size: 43, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:31:19,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=835431.3333333334, ans=0.0 2024-09-25 21:31:35,725 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.86 vs. 
limit=12.0 2024-09-25 21:31:36,928 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:32:04,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=835571.3333333334, ans=0.2 2024-09-25 21:32:26,418 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=835618.0, ans=0.025 2024-09-25 21:32:32,251 INFO [train.py:1198] (0/4) Epoch 46, batch 3750, loss[loss=0.2293, ctc_loss=0.1515, cr_loss=0.3889, over 14931.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1192, cr_loss=0.3357, over 3359138.94 frames. ], batch size: 89, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:32:43,197 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.138e+02 1.294e+02 1.384e+02 1.512e+02 2.088e+02, threshold=2.767e+02, percent-clipped=0.0 2024-09-25 21:32:55,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=835711.3333333334, ans=0.1 2024-09-25 21:32:57,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=835711.3333333334, ans=0.0 2024-09-25 21:32:58,968 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=835711.3333333334, ans=0.1 2024-09-25 21:33:44,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=835851.3333333334, ans=0.125 2024-09-25 21:33:49,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=835898.0, ans=0.125 2024-09-25 21:33:50,865 INFO [train.py:1198] (0/4) Epoch 46, batch 3800, loss[loss=0.1997, ctc_loss=0.1295, cr_loss=0.3509, over 17276.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1194, cr_loss=0.3361, over 3341632.41 frames. ], batch size: 51, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:34:11,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=835944.6666666666, ans=0.125 2024-09-25 21:35:04,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=836084.6666666666, ans=0.0 2024-09-25 21:35:04,924 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2024-09-25 21:35:08,994 INFO [train.py:1198] (0/4) Epoch 46, batch 3850, loss[loss=0.2327, ctc_loss=0.1601, cr_loss=0.3628, over 11443.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1192, cr_loss=0.3352, over 3313399.06 frames. ], batch size: 124, lr: 2.56e-03, grad_scale: 16.0 2024-09-25 21:35:19,791 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.194e+02 1.341e+02 1.442e+02 1.559e+02 2.635e+02, threshold=2.885e+02, percent-clipped=0.0 2024-09-25 21:35:20,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=836131.3333333334, ans=0.0 2024-09-25 21:35:35,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.32 vs. 
limit=22.5 2024-09-25 21:35:51,048 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=12.0 2024-09-25 21:36:19,121 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-46.pt 2024-09-25 21:37:05,583 INFO [train.py:1198] (0/4) Epoch 47, batch 0, loss[loss=0.1643, ctc_loss=0.1039, cr_loss=0.302, over 17036.00 frames. ], tot_loss[loss=0.1643, ctc_loss=0.1039, cr_loss=0.302, over 17036.00 frames. ], batch size: 39, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:37:05,584 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 21:37:17,639 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1930, 4.9917, 4.2675, 4.9490], device='cuda:0') 2024-09-25 21:37:22,174 INFO [train.py:1230] (0/4) Epoch 47, validation: loss=0.03509, ctc_loss=0.03509, cr_loss=1.062e-14, over 944034.00 frames. 2024-09-25 21:37:22,175 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 21:37:25,611 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=836346.0, ans=0.09899494936611666 2024-09-25 21:37:25,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=836346.0, ans=0.125 2024-09-25 21:37:33,724 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=836346.0, ans=0.0 2024-09-25 21:37:37,547 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2024-09-25 21:37:38,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=836392.6666666666, ans=0.125 2024-09-25 21:37:46,778 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=836392.6666666666, ans=0.05 2024-09-25 21:38:07,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=836439.3333333334, ans=0.0 2024-09-25 21:38:32,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=836532.6666666666, ans=0.0 2024-09-25 21:38:44,944 INFO [train.py:1198] (0/4) Epoch 47, batch 50, loss[loss=0.2264, ctc_loss=0.148, cr_loss=0.3921, over 14888.00 frames. ], tot_loss[loss=0.1899, ctc_loss=0.1217, cr_loss=0.3412, over 745026.07 frames. 
], batch size: 89, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:38:46,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=836579.3333333334, ans=0.125 2024-09-25 21:38:53,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=836579.3333333334, ans=0.0 2024-09-25 21:38:56,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=836579.3333333334, ans=0.0 2024-09-25 21:39:01,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=836626.0, ans=0.125 2024-09-25 21:39:02,615 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.187e+02 1.320e+02 1.514e+02 1.646e+02 2.881e+02, threshold=3.028e+02, percent-clipped=0.0 2024-09-25 21:39:06,733 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=22.5 2024-09-25 21:39:11,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=836626.0, ans=15.0 2024-09-25 21:39:26,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=836672.6666666666, ans=0.125 2024-09-25 21:40:05,069 INFO [train.py:1198] (0/4) Epoch 47, batch 100, loss[loss=0.202, ctc_loss=0.1281, cr_loss=0.3696, over 17228.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1208, cr_loss=0.3375, over 1318230.81 frames. ], batch size: 50, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:40:10,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=836812.6666666666, ans=0.0 2024-09-25 21:40:14,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=836812.6666666666, ans=0.125 2024-09-25 21:40:23,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2024-09-25 21:40:32,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=836859.3333333334, ans=0.0 2024-09-25 21:40:35,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=836906.0, ans=0.125 2024-09-25 21:40:44,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836906.0, ans=0.1 2024-09-25 21:41:26,869 INFO [train.py:1198] (0/4) Epoch 47, batch 150, loss[loss=0.206, ctc_loss=0.1332, cr_loss=0.3638, over 17233.00 frames. ], tot_loss[loss=0.1891, ctc_loss=0.1213, cr_loss=0.339, over 1774373.43 frames. 
], batch size: 55, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:41:30,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=837046.0, ans=0.125 2024-09-25 21:41:44,265 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.065e+02 1.291e+02 1.353e+02 1.432e+02 2.053e+02, threshold=2.706e+02, percent-clipped=0.0 2024-09-25 21:42:05,610 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:42:27,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=837186.0, ans=0.2 2024-09-25 21:42:53,061 INFO [train.py:1198] (0/4) Epoch 47, batch 200, loss[loss=0.1821, ctc_loss=0.1158, cr_loss=0.3315, over 17102.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1209, cr_loss=0.3377, over 2113265.27 frames. ], batch size: 43, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:42:55,888 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=15.0 2024-09-25 21:43:06,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=837279.3333333334, ans=0.04949747468305833 2024-09-25 21:43:07,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=837326.0, ans=0.2 2024-09-25 21:43:12,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=837326.0, ans=0.09899494936611666 2024-09-25 21:43:32,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837372.6666666666, ans=0.1 2024-09-25 21:43:43,278 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-09-25 21:43:49,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2024-09-25 21:43:52,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=837419.3333333334, ans=0.125 2024-09-25 21:44:00,254 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2024-09-25 21:44:03,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=837466.0, ans=0.1 2024-09-25 21:44:12,862 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2024-09-25 21:44:13,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837512.6666666666, ans=0.1 2024-09-25 21:44:14,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=837512.6666666666, ans=0.025 2024-09-25 21:44:15,404 INFO [train.py:1198] (0/4) Epoch 47, batch 250, loss[loss=0.1494, ctc_loss=0.0932, cr_loss=0.2809, over 17013.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1208, cr_loss=0.3375, over 2381197.02 frames. 
], batch size: 39, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:44:20,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=837512.6666666666, ans=0.0 2024-09-25 21:44:22,209 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=837512.6666666666, ans=0.125 2024-09-25 21:44:32,924 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.332e+02 1.402e+02 1.492e+02 3.444e+02, threshold=2.804e+02, percent-clipped=1.0 2024-09-25 21:45:07,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=837652.6666666666, ans=0.0 2024-09-25 21:45:31,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=837699.3333333334, ans=0.125 2024-09-25 21:45:35,457 INFO [train.py:1198] (0/4) Epoch 47, batch 300, loss[loss=0.1904, ctc_loss=0.1261, cr_loss=0.3213, over 17010.00 frames. ], tot_loss[loss=0.1889, ctc_loss=0.1211, cr_loss=0.3389, over 2590391.80 frames. ], batch size: 53, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:45:57,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=837792.6666666666, ans=0.04949747468305833 2024-09-25 21:46:04,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=837792.6666666666, ans=10.0 2024-09-25 21:46:11,229 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.90 vs. limit=10.0 2024-09-25 21:46:20,175 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=837839.3333333334, ans=0.0 2024-09-25 21:46:34,979 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=22.5 2024-09-25 21:46:39,323 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=837886.0, ans=0.0 2024-09-25 21:46:45,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=837932.6666666666, ans=0.0 2024-09-25 21:47:01,261 INFO [train.py:1198] (0/4) Epoch 47, batch 350, loss[loss=0.1638, ctc_loss=0.1047, cr_loss=0.2955, over 17285.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1205, cr_loss=0.3385, over 2765906.62 frames. 
], batch size: 46, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:47:09,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=837979.3333333334, ans=0.0 2024-09-25 21:47:15,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=838026.0, ans=0.1 2024-09-25 21:47:16,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=838026.0, ans=0.1 2024-09-25 21:47:16,105 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=838026.0, ans=0.1 2024-09-25 21:47:17,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=838026.0, ans=0.0 2024-09-25 21:47:18,802 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.331e+02 1.404e+02 1.493e+02 2.320e+02, threshold=2.808e+02, percent-clipped=0.0 2024-09-25 21:47:33,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=838026.0, ans=0.125 2024-09-25 21:47:39,757 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=838072.6666666666, ans=0.125 2024-09-25 21:47:54,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=838119.3333333334, ans=0.0 2024-09-25 21:48:08,675 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=838166.0, ans=0.125 2024-09-25 21:48:16,745 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=838166.0, ans=0.025 2024-09-25 21:48:24,409 INFO [train.py:1198] (0/4) Epoch 47, batch 400, loss[loss=0.183, ctc_loss=0.1147, cr_loss=0.3418, over 17077.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1209, cr_loss=0.3387, over 2891635.56 frames. ], batch size: 46, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:48:35,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=838212.6666666666, ans=0.05 2024-09-25 21:48:38,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=838212.6666666666, ans=0.0 2024-09-25 21:49:00,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=838306.0, ans=0.025 2024-09-25 21:49:04,055 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=838306.0, ans=0.2 2024-09-25 21:49:12,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=838306.0, ans=0.125 2024-09-25 21:49:29,969 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=838399.3333333334, ans=0.0 2024-09-25 21:49:41,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. 
limit=10.0 2024-09-25 21:49:42,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=838399.3333333334, ans=0.09899494936611666 2024-09-25 21:49:46,382 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2024-09-25 21:49:47,322 INFO [train.py:1198] (0/4) Epoch 47, batch 450, loss[loss=0.2088, ctc_loss=0.135, cr_loss=0.3687, over 17047.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1199, cr_loss=0.3366, over 3001231.16 frames. ], batch size: 56, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:49:57,461 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=838446.0, ans=0.125 2024-09-25 21:50:04,976 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.297e+02 1.382e+02 1.499e+02 1.706e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-25 21:50:10,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=838492.6666666666, ans=0.2 2024-09-25 21:50:34,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=838586.0, ans=0.0 2024-09-25 21:50:50,221 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.50 vs. limit=22.5 2024-09-25 21:51:01,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0 2024-09-25 21:51:08,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=838679.3333333334, ans=0.125 2024-09-25 21:51:09,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-25 21:51:09,782 INFO [train.py:1198] (0/4) Epoch 47, batch 500, loss[loss=0.196, ctc_loss=0.1256, cr_loss=0.3522, over 16787.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3381, over 3076997.53 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:51:11,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=838679.3333333334, ans=0.125 2024-09-25 21:51:17,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=838679.3333333334, ans=0.07 2024-09-25 21:51:44,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=838772.6666666666, ans=0.0 2024-09-25 21:52:18,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=838866.0, ans=0.1 2024-09-25 21:52:35,638 INFO [train.py:1198] (0/4) Epoch 47, batch 550, loss[loss=0.1565, ctc_loss=0.1002, cr_loss=0.2816, over 17158.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1201, cr_loss=0.3376, over 3129408.85 frames. 
], batch size: 41, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:52:42,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=838912.6666666666, ans=0.125 2024-09-25 21:52:47,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=838912.6666666666, ans=0.125 2024-09-25 21:52:53,167 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.330e+02 1.398e+02 1.505e+02 2.211e+02, threshold=2.797e+02, percent-clipped=0.0 2024-09-25 21:52:53,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=838959.3333333334, ans=0.025 2024-09-25 21:53:02,997 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=838959.3333333334, ans=0.125 2024-09-25 21:53:33,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=839052.6666666666, ans=0.025 2024-09-25 21:53:37,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=839052.6666666666, ans=0.0 2024-09-25 21:53:44,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=839099.3333333334, ans=0.0 2024-09-25 21:53:49,201 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=839099.3333333334, ans=0.0 2024-09-25 21:53:58,579 INFO [train.py:1198] (0/4) Epoch 47, batch 600, loss[loss=0.1667, ctc_loss=0.1055, cr_loss=0.3063, over 17343.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3377, over 3174200.51 frames. ], batch size: 48, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:54:06,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839146.0, ans=0.1 2024-09-25 21:54:08,428 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:54:52,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=839286.0, ans=0.125 2024-09-25 21:54:59,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839286.0, ans=0.1 2024-09-25 21:55:17,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=839379.3333333334, ans=0.0 2024-09-25 21:55:18,522 INFO [train.py:1198] (0/4) Epoch 47, batch 650, loss[loss=0.2122, ctc_loss=0.1349, cr_loss=0.3869, over 17009.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3382, over 3222196.83 frames. ], batch size: 53, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:55:26,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=839379.3333333334, ans=0.125 2024-09-25 21:55:36,235 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.288e+02 1.367e+02 1.485e+02 1.946e+02, threshold=2.733e+02, percent-clipped=0.0 2024-09-25 21:56:08,714 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. 
limit=6.0 2024-09-25 21:56:32,225 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839566.0, ans=0.1 2024-09-25 21:56:41,551 INFO [train.py:1198] (0/4) Epoch 47, batch 700, loss[loss=0.1991, ctc_loss=0.1255, cr_loss=0.368, over 17155.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1205, cr_loss=0.3389, over 3246509.07 frames. ], batch size: 45, lr: 2.53e-03, grad_scale: 16.0 2024-09-25 21:57:23,906 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=839706.0, ans=0.2 2024-09-25 21:57:23,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=839706.0, ans=0.1 2024-09-25 21:58:00,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=839799.3333333334, ans=0.125 2024-09-25 21:58:06,324 INFO [train.py:1198] (0/4) Epoch 47, batch 750, loss[loss=0.2027, ctc_loss=0.1294, cr_loss=0.3664, over 17024.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1206, cr_loss=0.3386, over 3275646.37 frames. ], batch size: 52, lr: 2.53e-03, grad_scale: 16.0 2024-09-25 21:58:25,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=839892.6666666666, ans=0.07 2024-09-25 21:58:27,884 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.312e+02 1.373e+02 1.486e+02 2.253e+02, threshold=2.746e+02, percent-clipped=0.0 2024-09-25 21:58:37,975 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=839892.6666666666, ans=0.5 2024-09-25 21:58:55,997 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=22.5 2024-09-25 21:59:00,555 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-180000.pt 2024-09-25 21:59:17,597 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=840032.6666666666, ans=0.125 2024-09-25 21:59:19,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=840032.6666666666, ans=0.125 2024-09-25 21:59:19,267 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 21:59:28,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=840032.6666666666, ans=0.125 2024-09-25 21:59:29,133 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.51 vs. limit=22.5 2024-09-25 21:59:31,442 INFO [train.py:1198] (0/4) Epoch 47, batch 800, loss[loss=0.1992, ctc_loss=0.1276, cr_loss=0.358, over 17156.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1208, cr_loss=0.3384, over 3297390.39 frames. 
], batch size: 45, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 21:59:42,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=840079.3333333334, ans=0.035 2024-09-25 22:00:21,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=840219.3333333334, ans=0.125 2024-09-25 22:00:52,988 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 22:00:54,192 INFO [train.py:1198] (0/4) Epoch 47, batch 850, loss[loss=0.1687, ctc_loss=0.1064, cr_loss=0.3115, over 16975.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1201, cr_loss=0.3373, over 3305988.81 frames. ], batch size: 42, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:00:59,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840312.6666666666, ans=0.1 2024-09-25 22:01:02,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=840312.6666666666, ans=0.125 2024-09-25 22:01:13,304 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.171e+02 1.297e+02 1.378e+02 1.477e+02 2.622e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 22:01:30,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=840406.0, ans=0.0 2024-09-25 22:01:37,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=840406.0, ans=0.0 2024-09-25 22:01:44,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=22.5 2024-09-25 22:01:59,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=840499.3333333334, ans=0.2 2024-09-25 22:01:59,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=840499.3333333334, ans=0.0 2024-09-25 22:02:10,226 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2024-09-25 22:02:19,133 INFO [train.py:1198] (0/4) Epoch 47, batch 900, loss[loss=0.1816, ctc_loss=0.118, cr_loss=0.3184, over 17364.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3376, over 3310257.54 frames. ], batch size: 48, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:02:29,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.67 vs. 
limit=15.0 2024-09-25 22:02:33,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=840592.6666666666, ans=0.125 2024-09-25 22:02:33,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=840592.6666666666, ans=0.125 2024-09-25 22:02:49,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=840639.3333333334, ans=0.125 2024-09-25 22:03:39,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840779.3333333334, ans=0.1 2024-09-25 22:03:41,096 INFO [train.py:1198] (0/4) Epoch 47, batch 950, loss[loss=0.2047, ctc_loss=0.1327, cr_loss=0.36, over 17169.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1197, cr_loss=0.3364, over 3329301.78 frames. ], batch size: 45, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:03:41,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=840779.3333333334, ans=0.0 2024-09-25 22:03:45,376 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2024-09-25 22:03:51,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=840779.3333333334, ans=0.0 2024-09-25 22:03:53,383 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=22.5 2024-09-25 22:03:55,842 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=840826.0, ans=0.125 2024-09-25 22:04:00,132 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.146e+02 1.323e+02 1.399e+02 1.542e+02 2.460e+02, threshold=2.799e+02, percent-clipped=0.0 2024-09-25 22:04:07,625 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=22.5 2024-09-25 22:04:11,586 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=840872.6666666666, ans=0.2 2024-09-25 22:04:20,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=840872.6666666666, ans=0.09899494936611666 2024-09-25 22:04:40,112 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 22:04:46,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=840966.0, ans=0.125 2024-09-25 22:04:50,217 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.29 vs. 
limit=15.0 2024-09-25 22:04:51,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=840966.0, ans=0.1 2024-09-25 22:04:56,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=840966.0, ans=0.125 2024-09-25 22:04:57,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=840966.0, ans=0.0 2024-09-25 22:05:00,653 INFO [train.py:1198] (0/4) Epoch 47, batch 1000, loss[loss=0.2423, ctc_loss=0.162, cr_loss=0.4017, over 16607.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1196, cr_loss=0.3361, over 3335170.01 frames. ], batch size: 66, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:06:23,152 INFO [train.py:1198] (0/4) Epoch 47, batch 1050, loss[loss=0.1811, ctc_loss=0.1148, cr_loss=0.3311, over 17221.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1197, cr_loss=0.3364, over 3345774.50 frames. ], batch size: 47, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:06:26,752 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=841246.0, ans=0.125 2024-09-25 22:06:42,388 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.129e+02 1.322e+02 1.409e+02 1.520e+02 2.848e+02, threshold=2.818e+02, percent-clipped=1.0 2024-09-25 22:06:47,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=841292.6666666666, ans=0.0 2024-09-25 22:06:49,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=841292.6666666666, ans=0.2 2024-09-25 22:06:51,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=841292.6666666666, ans=10.0 2024-09-25 22:07:40,520 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.10 vs. limit=22.5 2024-09-25 22:07:47,936 INFO [train.py:1198] (0/4) Epoch 47, batch 1100, loss[loss=0.2048, ctc_loss=0.1317, cr_loss=0.3655, over 16987.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.3359, over 3353194.33 frames. ], batch size: 53, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:07:48,149 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=841479.3333333334, ans=0.025 2024-09-25 22:07:54,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=841479.3333333334, ans=0.0 2024-09-25 22:08:18,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=841526.0, ans=0.125 2024-09-25 22:08:22,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=12.0 2024-09-25 22:08:42,278 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=841619.3333333334, ans=0.125 2024-09-25 22:08:57,049 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.38 vs. 
limit=15.0 2024-09-25 22:09:09,206 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=841712.6666666666, ans=0.09899494936611666 2024-09-25 22:09:10,494 INFO [train.py:1198] (0/4) Epoch 47, batch 1150, loss[loss=0.2211, ctc_loss=0.143, cr_loss=0.3908, over 16698.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1182, cr_loss=0.3338, over 3357131.18 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:09:29,554 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.082e+02 1.291e+02 1.365e+02 1.481e+02 2.112e+02, threshold=2.730e+02, percent-clipped=0.0 2024-09-25 22:09:30,568 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0 2024-09-25 22:10:13,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=841899.3333333334, ans=0.2 2024-09-25 22:10:16,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=841899.3333333334, ans=10.0 2024-09-25 22:10:19,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=841899.3333333334, ans=0.0 2024-09-25 22:10:32,844 INFO [train.py:1198] (0/4) Epoch 47, batch 1200, loss[loss=0.2069, ctc_loss=0.1347, cr_loss=0.3612, over 17229.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1184, cr_loss=0.3341, over 3358008.35 frames. ], batch size: 55, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:10:42,685 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=841946.0, ans=0.125 2024-09-25 22:10:43,317 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2024-09-25 22:11:18,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=842039.3333333334, ans=0.125 2024-09-25 22:11:44,407 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0 2024-09-25 22:11:54,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842179.3333333334, ans=0.1 2024-09-25 22:11:55,785 INFO [train.py:1198] (0/4) Epoch 47, batch 1250, loss[loss=0.2065, ctc_loss=0.1399, cr_loss=0.3335, over 11720.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1183, cr_loss=0.3343, over 3358332.25 frames. ], batch size: 123, lr: 2.53e-03, grad_scale: 32.0 2024-09-25 22:12:17,777 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.299e+02 1.369e+02 1.454e+02 1.923e+02, threshold=2.738e+02, percent-clipped=0.0 2024-09-25 22:12:40,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=842272.6666666666, ans=0.125 2024-09-25 22:12:56,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=842319.3333333334, ans=0.5 2024-09-25 22:13:20,998 INFO [train.py:1198] (0/4) Epoch 47, batch 1300, loss[loss=0.1991, ctc_loss=0.1263, cr_loss=0.3638, over 15755.00 frames. 
], tot_loss[loss=0.1835, ctc_loss=0.1172, cr_loss=0.3317, over 3361433.94 frames. ], batch size: 74, lr: 2.53e-03, grad_scale: 16.0 2024-09-25 22:13:47,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=842459.3333333334, ans=0.125 2024-09-25 22:13:50,391 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=842459.3333333334, ans=0.0 2024-09-25 22:14:40,808 INFO [train.py:1198] (0/4) Epoch 47, batch 1350, loss[loss=0.2053, ctc_loss=0.1325, cr_loss=0.3638, over 17309.00 frames. ], tot_loss[loss=0.1844, ctc_loss=0.1179, cr_loss=0.3327, over 3358716.07 frames. ], batch size: 46, lr: 2.53e-03, grad_scale: 16.0 2024-09-25 22:14:42,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=842646.0, ans=0.0 2024-09-25 22:14:58,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=842692.6666666666, ans=0.0 2024-09-25 22:15:01,642 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.184e+02 1.286e+02 1.352e+02 1.458e+02 2.037e+02, threshold=2.703e+02, percent-clipped=0.0 2024-09-25 22:15:10,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842692.6666666666, ans=0.1 2024-09-25 22:15:10,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=842692.6666666666, ans=0.0 2024-09-25 22:15:13,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=842739.3333333334, ans=0.0 2024-09-25 22:15:14,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=842739.3333333334, ans=0.125 2024-09-25 22:15:16,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=842739.3333333334, ans=0.125 2024-09-25 22:15:21,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=842739.3333333334, ans=0.0 2024-09-25 22:15:47,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=842832.6666666666, ans=0.125 2024-09-25 22:16:03,149 INFO [train.py:1198] (0/4) Epoch 47, batch 1400, loss[loss=0.1831, ctc_loss=0.12, cr_loss=0.3155, over 16922.00 frames. ], tot_loss[loss=0.1837, ctc_loss=0.1174, cr_loss=0.3315, over 3357095.45 frames. ], batch size: 42, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:16:08,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=842879.3333333334, ans=0.0 2024-09-25 22:16:19,964 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. 
limit=15.0 2024-09-25 22:16:25,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842926.0, ans=0.1 2024-09-25 22:16:26,030 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=842926.0, ans=0.09899494936611666 2024-09-25 22:16:30,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=842926.0, ans=0.2 2024-09-25 22:17:08,981 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=843019.3333333334, ans=0.07 2024-09-25 22:17:10,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=843066.0, ans=0.0 2024-09-25 22:17:26,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=843112.6666666666, ans=0.07 2024-09-25 22:17:27,857 INFO [train.py:1198] (0/4) Epoch 47, batch 1450, loss[loss=0.1631, ctc_loss=0.1047, cr_loss=0.292, over 17169.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1181, cr_loss=0.3329, over 3361452.07 frames. ], batch size: 41, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:17:48,452 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.105e+02 1.274e+02 1.355e+02 1.464e+02 2.562e+02, threshold=2.711e+02, percent-clipped=0.0 2024-09-25 22:17:52,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=843159.3333333334, ans=0.0 2024-09-25 22:18:33,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=843299.3333333334, ans=10.0 2024-09-25 22:18:35,104 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=12.0 2024-09-25 22:18:36,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=843299.3333333334, ans=0.0 2024-09-25 22:18:50,193 INFO [train.py:1198] (0/4) Epoch 47, batch 1500, loss[loss=0.167, ctc_loss=0.1039, cr_loss=0.3158, over 17039.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.118, cr_loss=0.3331, over 3367101.27 frames. ], batch size: 39, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:19:07,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=843392.6666666666, ans=0.1 2024-09-25 22:19:24,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2024-09-25 22:19:35,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=843439.3333333334, ans=0.0 2024-09-25 22:20:10,602 INFO [train.py:1198] (0/4) Epoch 47, batch 1550, loss[loss=0.181, ctc_loss=0.1189, cr_loss=0.3103, over 17285.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.335, over 3371629.66 frames. 
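], batch size: 51, lr: 2.52e-03, grad_scale: 16.0

The recurring optim.py WARNING records print five grad-norm statistics plus a clipping threshold and the percentage of recent batches clipped; in every record here the threshold equals Clipping_scale times the middle value, up to rounding of the printed quartiles. The sketch below reproduces that bookkeeping under those assumptions; it is not the actual k2/icefall optimizer code, and the function and argument names are invented for illustration.

```python
import torch

def grad_norm_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> str:
    # The five printed values look like min / 25% / median / 75% / max.
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    # threshold == clipping_scale * median, e.g. 2.0 * 1.365e+02 = 2.730e+02
    # in the batch 1150 WARNING above (up to rounding of the printed quartiles).
    threshold = clipping_scale * q[2].item()
    pct = 100.0 * (recent_norms > threshold).float().mean().item()
    quartiles = " ".join(f"{v.item():.3e}" for v in q)
    return (f"Clipping_scale={clipping_scale}, grad-norm quartiles {quartiles}, "
            f"threshold={threshold:.3e}, percent-clipped={pct}")
```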
2024-09-25 22:20:34,263 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.106e+02 1.301e+02 1.382e+02 1.468e+02 1.851e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-25 22:21:00,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=843719.3333333334, ans=0.125 2024-09-25 22:21:02,592 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2024-09-25 22:21:12,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=843719.3333333334, ans=0.0 2024-09-25 22:21:20,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=843766.0, ans=0.0 2024-09-25 22:21:24,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=843766.0, ans=0.0 2024-09-25 22:21:34,159 INFO [train.py:1198] (0/4) Epoch 47, batch 1600, loss[loss=0.1742, ctc_loss=0.1102, cr_loss=0.3198, over 17262.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3351, over 3363007.30 frames. ], batch size: 42, lr: 2.52e-03, grad_scale: 32.0 2024-09-25 22:21:34,854 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-09-25 22:21:39,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843812.6666666666, ans=0.1 2024-09-25 22:21:54,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=843859.3333333334, ans=0.125 2024-09-25 22:22:00,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=843859.3333333334, ans=0.1 2024-09-25 22:22:00,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=843859.3333333334, ans=0.2 2024-09-25 22:22:00,817 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2024-09-25 22:22:32,754 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=22.5 2024-09-25 22:22:58,438 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2024-09-25 22:22:59,329 INFO [train.py:1198] (0/4) Epoch 47, batch 1650, loss[loss=0.1505, ctc_loss=0.09435, cr_loss=0.281, over 17259.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1186, cr_loss=0.3347, over 3371777.71 frames.
], batch size: 44, lr: 2.52e-03, grad_scale: 32.0 2024-09-25 22:23:04,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=844046.0, ans=0.125 2024-09-25 22:23:22,602 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.287e+02 1.363e+02 1.436e+02 1.851e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 22:23:29,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=12.0 2024-09-25 22:23:32,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=844139.3333333334, ans=0.125 2024-09-25 22:23:34,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=844139.3333333334, ans=0.125 2024-09-25 22:23:39,126 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=844139.3333333334, ans=0.125 2024-09-25 22:24:12,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=844232.6666666666, ans=0.125 2024-09-25 22:24:14,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=12.0 2024-09-25 22:24:17,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=844232.6666666666, ans=0.0 2024-09-25 22:24:21,989 INFO [train.py:1198] (0/4) Epoch 47, batch 1700, loss[loss=0.1694, ctc_loss=0.1061, cr_loss=0.3162, over 17320.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1186, cr_loss=0.3352, over 3371738.11 frames. ], batch size: 49, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:25:11,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=844419.3333333334, ans=0.025 2024-09-25 22:25:15,677 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.11 vs. limit=10.0 2024-09-25 22:25:19,686 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 22:25:28,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=844466.0, ans=0.0 2024-09-25 22:25:44,107 INFO [train.py:1198] (0/4) Epoch 47, batch 1750, loss[loss=0.2031, ctc_loss=0.1337, cr_loss=0.3472, over 17317.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3345, over 3373528.64 frames. ], batch size: 51, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:26:06,157 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.334e+02 1.394e+02 1.508e+02 1.965e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-25 22:26:41,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=844652.6666666666, ans=0.95 2024-09-25 22:27:08,726 INFO [train.py:1198] (0/4) Epoch 47, batch 1800, loss[loss=0.193, ctc_loss=0.1245, cr_loss=0.3426, over 16899.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1182, cr_loss=0.3344, over 3360058.31 frames. 
], batch size: 58, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:27:26,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=844792.6666666666, ans=0.125 2024-09-25 22:27:36,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=844792.6666666666, ans=0.0 2024-09-25 22:28:00,826 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=22.5 2024-09-25 22:28:27,044 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 22:28:31,340 INFO [train.py:1198] (0/4) Epoch 47, batch 1850, loss[loss=0.2093, ctc_loss=0.1346, cr_loss=0.3733, over 17140.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.335, over 3360695.52 frames. ], batch size: 48, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:28:33,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=844979.3333333334, ans=0.09899494936611666 2024-09-25 22:28:53,365 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.135e+02 1.318e+02 1.395e+02 1.509e+02 1.851e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-25 22:28:58,456 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=845026.0, ans=0.125 2024-09-25 22:29:09,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=845072.6666666666, ans=0.0 2024-09-25 22:29:16,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=845072.6666666666, ans=0.0 2024-09-25 22:29:22,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=845119.3333333334, ans=0.125 2024-09-25 22:29:38,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=845166.0, ans=0.2 2024-09-25 22:29:41,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=845166.0, ans=0.04949747468305833 2024-09-25 22:29:43,315 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=22.5 2024-09-25 22:29:44,684 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=845166.0, ans=0.05 2024-09-25 22:29:50,499 INFO [train.py:1198] (0/4) Epoch 47, batch 1900, loss[loss=0.2241, ctc_loss=0.1444, cr_loss=0.3985, over 16100.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1185, cr_loss=0.335, over 3358053.84 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:29:51,173 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-09-25 22:30:25,694 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. 
limit=6.0 2024-09-25 22:30:34,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=845306.0, ans=0.0 2024-09-25 22:30:48,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=845352.6666666666, ans=0.125 2024-09-25 22:30:56,941 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=845399.3333333334, ans=0.125 2024-09-25 22:31:11,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=845446.0, ans=0.0 2024-09-25 22:31:12,815 INFO [train.py:1198] (0/4) Epoch 47, batch 1950, loss[loss=0.1897, ctc_loss=0.1218, cr_loss=0.3396, over 17289.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1189, cr_loss=0.3354, over 3355849.64 frames. ], batch size: 46, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:31:34,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=845492.6666666666, ans=0.1 2024-09-25 22:31:35,176 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.125e+02 1.279e+02 1.368e+02 1.443e+02 1.975e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-25 22:31:45,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=845539.3333333334, ans=0.1 2024-09-25 22:31:50,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=845539.3333333334, ans=0.0 2024-09-25 22:31:54,966 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=845539.3333333334, ans=0.2 2024-09-25 22:32:17,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=845586.0, ans=0.2 2024-09-25 22:32:17,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=845586.0, ans=0.125 2024-09-25 22:32:30,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=845632.6666666666, ans=0.0 2024-09-25 22:32:38,558 INFO [train.py:1198] (0/4) Epoch 47, batch 2000, loss[loss=0.1958, ctc_loss=0.1253, cr_loss=0.3524, over 16131.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1189, cr_loss=0.3356, over 3351134.18 frames. ], batch size: 74, lr: 2.52e-03, grad_scale: 32.0 2024-09-25 22:33:09,751 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.75 vs. limit=15.0 2024-09-25 22:33:31,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=845819.3333333334, ans=0.125 2024-09-25 22:33:52,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=845866.0, ans=0.0 2024-09-25 22:34:01,514 INFO [train.py:1198] (0/4) Epoch 47, batch 2050, loss[loss=0.1946, ctc_loss=0.1263, cr_loss=0.3417, over 16612.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1192, cr_loss=0.3358, over 3350787.32 frames. 
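], batch size: 66, lr: 2.52e-03, grad_scale: 16.0

Each scaling.py:214 record dumps a ScheduledFloat: a hyperparameter whose current value (ans) depends on batch_count. Below is a minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the actual icefall class may differ, and the breakpoints in the example are illustrative, not taken from this run.

```python
from bisect import bisect_right

def scheduled_float(batch_count: float, schedule: list[tuple[float, float]]) -> float:
    # Linear interpolation between sorted breakpoints, clamped at both ends.
    xs = [x for x, _ in schedule]
    ys = [y for _, y in schedule]
    if batch_count <= xs[0]:
        return ys[0]
    if batch_count >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, batch_count)
    t = (batch_count - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

# A value decaying from 0.3 to 0.1 over the first 20k batches has long since
# reached its final value by batch_count ~ 8.45e5, as in the records above:
assert scheduled_float(845306.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1
```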
2024-09-25 22:34:25,353 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.271e+02 1.350e+02 1.448e+02 1.978e+02, threshold=2.700e+02, percent-clipped=0.0 2024-09-25 22:34:32,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=846006.0, ans=10.0 2024-09-25 22:34:35,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=846006.0, ans=0.1 2024-09-25 22:34:54,452 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=846052.6666666666, ans=0.125 2024-09-25 22:34:57,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=846052.6666666666, ans=0.2 2024-09-25 22:35:23,977 INFO [train.py:1198] (0/4) Epoch 47, batch 2100, loss[loss=0.1849, ctc_loss=0.1183, cr_loss=0.3331, over 17097.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3357, over 3356508.99 frames. ], batch size: 49, lr: 2.52e-03, grad_scale: 8.0 2024-09-25 22:35:33,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=846146.0, ans=0.2 2024-09-25 22:36:21,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=846286.0, ans=0.0 2024-09-25 22:36:29,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=846332.6666666666, ans=0.02 2024-09-25 22:36:31,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=846332.6666666666, ans=0.07 2024-09-25 22:36:46,573 INFO [train.py:1198] (0/4) Epoch 47, batch 2150, loss[loss=0.1793, ctc_loss=0.1147, cr_loss=0.3232, over 17014.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1183, cr_loss=0.3344, over 3360173.30 frames. ], batch size: 53, lr: 2.52e-03, grad_scale: 8.0 2024-09-25 22:36:57,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=846379.3333333334, ans=0.125 2024-09-25 22:36:59,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=846379.3333333334, ans=0.125 2024-09-25 22:37:07,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2024-09-25 22:37:14,616 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.166e+02 1.323e+02 1.402e+02 1.477e+02 1.932e+02, threshold=2.805e+02, percent-clipped=0.0 2024-09-25 22:37:14,909 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=846426.0, ans=0.2 2024-09-25 22:37:20,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=15.0 2024-09-25 22:37:35,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=846519.3333333334, ans=0.125 2024-09-25 22:37:51,020 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs.
limit=15.0 2024-09-25 22:38:12,285 INFO [train.py:1198] (0/4) Epoch 47, batch 2200, loss[loss=0.1836, ctc_loss=0.1176, cr_loss=0.3298, over 17174.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.3359, over 3350535.58 frames. ], batch size: 45, lr: 2.52e-03, grad_scale: 8.0 2024-09-25 22:38:46,813 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2024-09-25 22:38:54,501 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=846706.0, ans=0.0 2024-09-25 22:39:17,449 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.97 vs. limit=22.5 2024-09-25 22:39:32,144 INFO [train.py:1198] (0/4) Epoch 47, batch 2250, loss[loss=0.152, ctc_loss=0.09457, cr_loss=0.287, over 16314.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1192, cr_loss=0.3364, over 3345868.11 frames. ], batch size: 36, lr: 2.52e-03, grad_scale: 8.0 2024-09-25 22:39:33,463 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=12.0 2024-09-25 22:39:44,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=846846.0, ans=0.1 2024-09-25 22:39:55,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=846892.6666666666, ans=0.05 2024-09-25 22:39:58,070 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.312e+02 1.382e+02 1.535e+02 2.267e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-25 22:39:58,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=846892.6666666666, ans=0.125 2024-09-25 22:40:28,364 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=22.5 2024-09-25 22:40:55,216 INFO [train.py:1198] (0/4) Epoch 47, batch 2300, loss[loss=0.1948, ctc_loss=0.1241, cr_loss=0.3535, over 17175.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1188, cr_loss=0.3357, over 3347908.55 frames. ], batch size: 45, lr: 2.52e-03, grad_scale: 8.0 2024-09-25 22:41:03,630 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847079.3333333334, ans=0.1 2024-09-25 22:41:10,439 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.36 vs. limit=6.0 2024-09-25 22:41:29,473 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 22:41:33,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=15.0 2024-09-25 22:42:20,986 INFO [train.py:1198] (0/4) Epoch 47, batch 2350, loss[loss=0.1965, ctc_loss=0.127, cr_loss=0.3478, over 17358.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1179, cr_loss=0.3338, over 3357197.13 frames. 
], batch size: 48, lr: 2.52e-03, grad_scale: 8.0 2024-09-25 22:42:34,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=847312.6666666666, ans=0.0 2024-09-25 22:42:39,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=847359.3333333334, ans=0.125 2024-09-25 22:42:46,751 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.178e+02 1.289e+02 1.363e+02 1.471e+02 2.156e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 22:43:02,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=847406.0, ans=0.0 2024-09-25 22:43:13,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=847452.6666666666, ans=0.0 2024-09-25 22:43:31,682 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=847499.3333333334, ans=0.125 2024-09-25 22:43:44,381 INFO [train.py:1198] (0/4) Epoch 47, batch 2400, loss[loss=0.1693, ctc_loss=0.1063, cr_loss=0.3148, over 17152.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1177, cr_loss=0.3326, over 3360827.07 frames. ], batch size: 45, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:44:16,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=847639.3333333334, ans=0.0 2024-09-25 22:44:31,691 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=12.0 2024-09-25 22:45:02,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=847732.6666666666, ans=0.125 2024-09-25 22:45:07,188 INFO [train.py:1198] (0/4) Epoch 47, batch 2450, loss[loss=0.1765, ctc_loss=0.1112, cr_loss=0.3265, over 17297.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1181, cr_loss=0.334, over 3365380.10 frames. ], batch size: 46, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:45:15,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=847779.3333333334, ans=0.0 2024-09-25 22:45:32,661 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.309e+02 1.414e+02 1.505e+02 2.738e+02, threshold=2.828e+02, percent-clipped=1.0 2024-09-25 22:45:39,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=847872.6666666666, ans=0.125 2024-09-25 22:45:53,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=847919.3333333334, ans=0.125 2024-09-25 22:46:00,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=847919.3333333334, ans=0.0 2024-09-25 22:46:02,442 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.71 vs. limit=10.0 2024-09-25 22:46:16,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=847966.0, ans=0.04949747468305833 2024-09-25 22:46:27,294 INFO [train.py:1198] (0/4) Epoch 47, batch 2500, loss[loss=0.1951, ctc_loss=0.1246, cr_loss=0.3526, over 17187.00 frames. 
], tot_loss[loss=0.1855, ctc_loss=0.1186, cr_loss=0.3348, over 3371272.98 frames. ], batch size: 55, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:46:56,215 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.38 vs. limit=15.0 2024-09-25 22:47:01,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=848059.3333333334, ans=0.2 2024-09-25 22:47:12,890 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=848106.0, ans=0.0 2024-09-25 22:47:16,521 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=22.5 2024-09-25 22:47:35,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=848199.3333333334, ans=0.125 2024-09-25 22:47:37,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=848199.3333333334, ans=0.125 2024-09-25 22:47:55,380 INFO [train.py:1198] (0/4) Epoch 47, batch 2550, loss[loss=0.2216, ctc_loss=0.1449, cr_loss=0.3831, over 16600.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1196, cr_loss=0.3368, over 3362879.70 frames. ], batch size: 66, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:48:00,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848246.0, ans=0.1 2024-09-25 22:48:05,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=848246.0, ans=0.0 2024-09-25 22:48:07,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=848246.0, ans=0.2 2024-09-25 22:48:16,420 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=848292.6666666666, ans=0.125 2024-09-25 22:48:20,934 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.321e+02 1.406e+02 1.538e+02 2.240e+02, threshold=2.812e+02, percent-clipped=0.0 2024-09-25 22:48:34,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=848339.3333333334, ans=0.125 2024-09-25 22:48:43,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=848386.0, ans=0.025 2024-09-25 22:49:15,636 INFO [train.py:1198] (0/4) Epoch 47, batch 2600, loss[loss=0.1736, ctc_loss=0.1097, cr_loss=0.3194, over 17022.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1199, cr_loss=0.3378, over 3349630.51 frames. 
], batch size: 44, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:49:27,074 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848479.3333333334, ans=0.1 2024-09-25 22:49:31,953 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=848526.0, ans=0.125 2024-09-25 22:49:33,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848526.0, ans=0.1 2024-09-25 22:50:00,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=848572.6666666666, ans=0.0 2024-09-25 22:50:08,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=848619.3333333334, ans=0.125 2024-09-25 22:50:14,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=848619.3333333334, ans=0.2 2024-09-25 22:50:17,877 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-25 22:50:32,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=848666.0, ans=0.0 2024-09-25 22:50:38,210 INFO [train.py:1198] (0/4) Epoch 47, batch 2650, loss[loss=0.1999, ctc_loss=0.1257, cr_loss=0.3709, over 17314.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1198, cr_loss=0.3375, over 3354703.18 frames. ], batch size: 51, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:51:03,873 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.306e+02 1.404e+02 1.475e+02 2.254e+02, threshold=2.808e+02, percent-clipped=0.0 2024-09-25 22:51:07,951 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2024-09-25 22:51:13,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=848806.0, ans=0.125 2024-09-25 22:51:29,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=848852.6666666666, ans=0.125 2024-09-25 22:51:42,144 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=848852.6666666666, ans=0.1 2024-09-25 22:51:57,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=848899.3333333334, ans=0.0 2024-09-25 22:51:59,555 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2024-09-25 22:52:03,699 INFO [train.py:1198] (0/4) Epoch 47, batch 2700, loss[loss=0.1942, ctc_loss=0.1223, cr_loss=0.3596, over 17066.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1205, cr_loss=0.3385, over 3356505.26 frames. 
], batch size: 46, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:52:16,731 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=848946.0, ans=0.125 2024-09-25 22:52:19,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=848992.6666666666, ans=0.1 2024-09-25 22:52:27,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=848992.6666666666, ans=0.125 2024-09-25 22:52:52,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=849086.0, ans=0.125 2024-09-25 22:53:04,331 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0 2024-09-25 22:53:16,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=849132.6666666666, ans=10.0 2024-09-25 22:53:25,684 INFO [train.py:1198] (0/4) Epoch 47, batch 2750, loss[loss=0.2126, ctc_loss=0.1367, cr_loss=0.3792, over 16589.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1201, cr_loss=0.3374, over 3352288.03 frames. ], batch size: 66, lr: 2.52e-03, grad_scale: 16.0 2024-09-25 22:53:32,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=849179.3333333334, ans=0.025 2024-09-25 22:53:51,197 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.120e+02 1.348e+02 1.413e+02 1.514e+02 2.410e+02, threshold=2.827e+02, percent-clipped=0.0 2024-09-25 22:53:56,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=849272.6666666666, ans=0.0 2024-09-25 22:54:04,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=849272.6666666666, ans=0.125 2024-09-25 22:54:11,141 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2024-09-25 22:54:33,464 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849366.0, ans=0.1 2024-09-25 22:54:38,197 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=849366.0, ans=0.0 2024-09-25 22:54:45,977 INFO [train.py:1198] (0/4) Epoch 47, batch 2800, loss[loss=0.1669, ctc_loss=0.1055, cr_loss=0.307, over 17350.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1202, cr_loss=0.3381, over 3361007.04 frames. ], batch size: 48, lr: 2.51e-03, grad_scale: 32.0 2024-09-25 22:55:19,427 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. 
limit=15.0 2024-09-25 22:55:20,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=849506.0, ans=22.5 2024-09-25 22:55:30,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=849506.0, ans=0.1 2024-09-25 22:55:30,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=849506.0, ans=0.2 2024-09-25 22:55:33,402 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=849506.0, ans=0.0 2024-09-25 22:55:40,627 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2024-09-25 22:55:41,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2024-09-25 22:56:08,516 INFO [train.py:1198] (0/4) Epoch 47, batch 2850, loss[loss=0.1585, ctc_loss=0.1004, cr_loss=0.2908, over 17174.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1197, cr_loss=0.3372, over 3365204.23 frames. ], batch size: 45, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 22:56:35,137 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=849692.6666666666, ans=0.125 2024-09-25 22:56:38,077 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.322e+02 1.395e+02 1.477e+02 2.122e+02, threshold=2.790e+02, percent-clipped=0.0 2024-09-25 22:56:41,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=849739.3333333334, ans=0.125 2024-09-25 22:57:33,023 INFO [train.py:1198] (0/4) Epoch 47, batch 2900, loss[loss=0.2327, ctc_loss=0.1561, cr_loss=0.3829, over 10627.00 frames. ], tot_loss[loss=0.1883, ctc_loss=0.1206, cr_loss=0.3388, over 3361737.79 frames. ], batch size: 124, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 22:57:56,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=849926.0, ans=0.0 2024-09-25 22:58:07,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=849972.6666666666, ans=0.1 2024-09-25 22:58:10,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=849972.6666666666, ans=0.125 2024-09-25 22:58:25,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=850019.3333333334, ans=0.125 2024-09-25 22:58:27,114 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=850019.3333333334, ans=0.125 2024-09-25 22:58:42,529 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.73 vs. limit=15.0 2024-09-25 22:58:46,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850066.0, ans=0.1 2024-09-25 22:58:56,181 INFO [train.py:1198] (0/4) Epoch 47, batch 2950, loss[loss=0.1797, ctc_loss=0.1147, cr_loss=0.325, over 17302.00 frames. 
], tot_loss[loss=0.188, ctc_loss=0.1203, cr_loss=0.3384, over 3365755.90 frames. ], batch size: 49, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 22:59:01,218 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=850112.6666666666, ans=0.0 2024-09-25 22:59:04,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=850112.6666666666, ans=0.5 2024-09-25 22:59:07,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=850112.6666666666, ans=0.0 2024-09-25 22:59:20,509 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=850159.3333333334, ans=0.125 2024-09-25 22:59:24,952 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.302e+02 1.410e+02 1.499e+02 2.139e+02, threshold=2.820e+02, percent-clipped=0.0 2024-09-25 22:59:33,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=850206.0, ans=0.0 2024-09-25 23:00:02,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=850299.3333333334, ans=0.0 2024-09-25 23:00:04,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=850299.3333333334, ans=0.0 2024-09-25 23:00:07,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=850299.3333333334, ans=0.125 2024-09-25 23:00:17,881 INFO [train.py:1198] (0/4) Epoch 47, batch 3000, loss[loss=0.1605, ctc_loss=0.1011, cr_loss=0.2968, over 16934.00 frames. ], tot_loss[loss=0.1886, ctc_loss=0.1208, cr_loss=0.3393, over 3354979.18 frames. ], batch size: 42, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 23:00:17,882 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 23:00:26,586 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5101, 5.3884, 4.9542, 5.2751], device='cuda:0') 2024-09-25 23:00:29,812 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.9598, 2.0972, 2.3006, 2.1559, 2.4719, 2.2977, 2.5285, 2.0160], device='cuda:0') 2024-09-25 23:00:33,704 INFO [train.py:1230] (0/4) Epoch 47, validation: loss=0.0348, ctc_loss=0.0348, cr_loss=1.036e-14, over 944034.00 frames. 2024-09-25 23:00:33,704 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 23:01:02,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=850392.6666666666, ans=0.2 2024-09-25 23:01:29,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=850486.0, ans=0.0 2024-09-25 23:01:46,972 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2024-09-25 23:01:52,183 INFO [train.py:1198] (0/4) Epoch 47, batch 3050, loss[loss=0.1931, ctc_loss=0.1248, cr_loss=0.3418, over 17109.00 frames. ], tot_loss[loss=0.1888, ctc_loss=0.121, cr_loss=0.3393, over 3350309.41 frames. 
2024-09-25 23:02:20,305 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.194e+02 1.305e+02 1.413e+02 1.506e+02 4.087e+02, threshold=2.825e+02, percent-clipped=1.0 2024-09-25 23:02:41,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=850719.3333333334, ans=0.125 2024-09-25 23:02:54,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=850719.3333333334, ans=0.125 2024-09-25 23:03:05,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=850766.0, ans=0.125 2024-09-25 23:03:05,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=850766.0, ans=0.2 2024-09-25 23:03:13,093 INFO [train.py:1198] (0/4) Epoch 47, batch 3100, loss[loss=0.177, ctc_loss=0.1104, cr_loss=0.3332, over 17252.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1209, cr_loss=0.3389, over 3353413.50 frames. ], batch size: 44, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 23:03:19,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=850812.6666666666, ans=0.07 2024-09-25 23:03:22,709 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=850812.6666666666, ans=0.1 2024-09-25 23:03:57,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=850906.0, ans=0.125 2024-09-25 23:04:01,282 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=12.0 2024-09-25 23:04:11,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=850952.6666666666, ans=0.2 2024-09-25 23:04:13,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=850952.6666666666, ans=0.1 2024-09-25 23:04:29,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=850999.3333333334, ans=0.125 2024-09-25 23:04:33,494 INFO [train.py:1198] (0/4) Epoch 47, batch 3150, loss[loss=0.1866, ctc_loss=0.1168, cr_loss=0.3489, over 17361.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1202, cr_loss=0.3385, over 3355770.82 frames. ], batch size: 48, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 23:04:42,986 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=851046.0, ans=0.2 2024-09-25 23:05:01,410 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.168e+02 1.375e+02 1.455e+02 1.638e+02 2.123e+02, threshold=2.910e+02, percent-clipped=0.0 2024-09-25 23:05:04,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=851139.3333333334, ans=0.125 2024-09-25 23:05:25,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=851186.0, ans=0.0 2024-09-25 23:05:45,391 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs.
limit=6.0 2024-09-25 23:05:49,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=851232.6666666666, ans=0.0 2024-09-25 23:05:54,052 INFO [train.py:1198] (0/4) Epoch 47, batch 3200, loss[loss=0.2068, ctc_loss=0.1322, cr_loss=0.3729, over 16998.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3382, over 3360059.97 frames. ], batch size: 56, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:06:08,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=851326.0, ans=0.09899494936611666 2024-09-25 23:06:16,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=851326.0, ans=0.125 2024-09-25 23:06:53,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=851419.3333333334, ans=0.125 2024-09-25 23:07:10,835 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=851512.6666666666, ans=0.0 2024-09-25 23:07:12,129 INFO [train.py:1198] (0/4) Epoch 47, batch 3250, loss[loss=0.1717, ctc_loss=0.11, cr_loss=0.3082, over 17089.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3362, over 3354708.26 frames. ], batch size: 43, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:07:40,362 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.291e+02 1.374e+02 1.460e+02 1.837e+02, threshold=2.749e+02, percent-clipped=0.0 2024-09-25 23:07:45,994 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=22.5 2024-09-25 23:08:05,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=851652.6666666666, ans=0.125 2024-09-25 23:08:10,370 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-25 23:08:13,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=851699.3333333334, ans=0.0 2024-09-25 23:08:28,015 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0 2024-09-25 23:08:30,209 INFO [train.py:1198] (0/4) Epoch 47, batch 3300, loss[loss=0.192, ctc_loss=0.1212, cr_loss=0.3539, over 17044.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3365, over 3359519.53 frames. ], batch size: 52, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:08:33,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=851746.0, ans=0.2 2024-09-25 23:08:54,773 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.27 vs. 
limit=5.0 2024-09-25 23:09:04,853 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=851839.3333333334, ans=0.125 2024-09-25 23:09:08,067 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:09:48,392 INFO [train.py:1198] (0/4) Epoch 47, batch 3350, loss[loss=0.2327, ctc_loss=0.155, cr_loss=0.3886, over 11979.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1193, cr_loss=0.3364, over 3348786.85 frames. ], batch size: 123, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 23:10:15,935 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=22.5 2024-09-25 23:10:17,609 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.42 vs. limit=10.0 2024-09-25 23:10:18,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=852026.0, ans=0.125 2024-09-25 23:10:20,045 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.314e+02 1.397e+02 1.497e+02 2.232e+02, threshold=2.793e+02, percent-clipped=0.0 2024-09-25 23:10:42,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=852119.3333333334, ans=0.125 2024-09-25 23:10:42,844 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.29 vs. limit=15.0 2024-09-25 23:10:47,081 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=852119.3333333334, ans=0.125 2024-09-25 23:10:59,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852166.0, ans=0.1 2024-09-25 23:11:08,898 INFO [train.py:1198] (0/4) Epoch 47, batch 3400, loss[loss=0.2142, ctc_loss=0.1381, cr_loss=0.3807, over 17361.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1199, cr_loss=0.3372, over 3350861.02 frames. ], batch size: 48, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 23:11:26,424 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=852259.3333333334, ans=0.0 2024-09-25 23:11:46,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852306.0, ans=0.1 2024-09-25 23:12:27,290 INFO [train.py:1198] (0/4) Epoch 47, batch 3450, loss[loss=0.198, ctc_loss=0.1272, cr_loss=0.3538, over 17316.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1199, cr_loss=0.337, over 3356910.37 frames. ], batch size: 51, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 23:12:37,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. limit=6.0 2024-09-25 23:12:39,235 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.72 vs. 
limit=22.5 2024-09-25 23:12:47,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=852492.6666666666, ans=0.125 2024-09-25 23:12:56,970 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.319e+02 1.407e+02 1.479e+02 2.069e+02, threshold=2.813e+02, percent-clipped=0.0 2024-09-25 23:13:12,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=852586.0, ans=0.1 2024-09-25 23:13:47,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=852679.3333333334, ans=0.2 2024-09-25 23:13:49,280 INFO [train.py:1198] (0/4) Epoch 47, batch 3500, loss[loss=0.2, ctc_loss=0.1291, cr_loss=0.3541, over 17343.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1201, cr_loss=0.3379, over 3360698.46 frames. ], batch size: 48, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 23:14:16,810 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.19 vs. limit=15.0 2024-09-25 23:14:42,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=852819.3333333334, ans=0.0 2024-09-25 23:14:47,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=852819.3333333334, ans=0.125 2024-09-25 23:14:52,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=852866.0, ans=0.125 2024-09-25 23:15:07,130 INFO [train.py:1198] (0/4) Epoch 47, batch 3550, loss[loss=0.1953, ctc_loss=0.1248, cr_loss=0.3524, over 17158.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1199, cr_loss=0.3371, over 3351922.63 frames. ], batch size: 45, lr: 2.51e-03, grad_scale: 8.0 2024-09-25 23:15:24,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2024-09-25 23:15:34,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=852959.3333333334, ans=0.0 2024-09-25 23:15:38,869 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.284e+02 1.338e+02 1.473e+02 3.313e+02, threshold=2.677e+02, percent-clipped=1.0 2024-09-25 23:15:40,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=853006.0, ans=0.125 2024-09-25 23:16:09,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=853052.6666666666, ans=0.0 2024-09-25 23:16:23,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=853099.3333333334, ans=0.04949747468305833 2024-09-25 23:16:27,457 INFO [train.py:1198] (0/4) Epoch 47, batch 3600, loss[loss=0.2162, ctc_loss=0.1418, cr_loss=0.3716, over 17033.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1198, cr_loss=0.3362, over 3341572.16 frames. 
], batch size: 53, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:16:49,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=853192.6666666666, ans=0.2 2024-09-25 23:16:59,843 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=853239.3333333334, ans=0.125 2024-09-25 23:17:16,136 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.75 vs. limit=10.0 2024-09-25 23:17:30,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=853332.6666666666, ans=0.125 2024-09-25 23:17:40,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853332.6666666666, ans=0.1 2024-09-25 23:17:45,127 INFO [train.py:1198] (0/4) Epoch 47, batch 3650, loss[loss=0.1799, ctc_loss=0.1153, cr_loss=0.3226, over 17022.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1195, cr_loss=0.3358, over 3346741.42 frames. ], batch size: 44, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:17:45,358 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=853379.3333333334, ans=0.2 2024-09-25 23:17:54,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=853379.3333333334, ans=10.0 2024-09-25 23:17:54,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853379.3333333334, ans=0.1 2024-09-25 23:18:08,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=853426.0, ans=0.125 2024-09-25 23:18:11,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=853426.0, ans=15.0 2024-09-25 23:18:14,971 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.311e+02 1.426e+02 1.502e+02 2.686e+02, threshold=2.852e+02, percent-clipped=1.0 2024-09-25 23:18:20,385 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=22.5 2024-09-25 23:18:20,843 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=22.5 2024-09-25 23:18:37,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=853519.3333333334, ans=0.0 2024-09-25 23:18:57,149 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2024-09-25 23:19:04,232 INFO [train.py:1198] (0/4) Epoch 47, batch 3700, loss[loss=0.1903, ctc_loss=0.1234, cr_loss=0.3347, over 17145.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1201, cr_loss=0.3368, over 3340564.12 frames. ], batch size: 48, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:19:04,770 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. 
limit=15.0 2024-09-25 23:19:46,393 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=22.5 2024-09-25 23:19:50,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=853752.6666666666, ans=0.125 2024-09-25 23:20:06,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=853799.3333333334, ans=0.2 2024-09-25 23:20:16,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=853799.3333333334, ans=0.125 2024-09-25 23:20:23,780 INFO [train.py:1198] (0/4) Epoch 47, batch 3750, loss[loss=0.1824, ctc_loss=0.1181, cr_loss=0.3214, over 17030.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1203, cr_loss=0.3375, over 3337501.71 frames. ], batch size: 51, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:20:25,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=853846.0, ans=0.0 2024-09-25 23:20:53,580 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.156e+02 1.332e+02 1.393e+02 1.534e+02 1.821e+02, threshold=2.786e+02, percent-clipped=0.0 2024-09-25 23:21:10,080 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2024-09-25 23:21:34,689 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=22.5 2024-09-25 23:21:37,325 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=15.0 2024-09-25 23:21:43,181 INFO [train.py:1198] (0/4) Epoch 47, batch 3800, loss[loss=0.2178, ctc_loss=0.138, cr_loss=0.3988, over 17014.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1206, cr_loss=0.338, over 3320802.50 frames. ], batch size: 44, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:22:19,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=854172.6666666666, ans=0.125 2024-09-25 23:22:24,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=854172.6666666666, ans=0.0 2024-09-25 23:22:36,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=854219.3333333334, ans=0.2 2024-09-25 23:22:39,359 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=854219.3333333334, ans=0.1 2024-09-25 23:22:41,047 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=854219.3333333334, ans=0.04949747468305833 2024-09-25 23:22:49,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=854266.0, ans=0.0 2024-09-25 23:23:03,555 INFO [train.py:1198] (0/4) Epoch 47, batch 3850, loss[loss=0.2235, ctc_loss=0.144, cr_loss=0.3971, over 17012.00 frames. ], tot_loss[loss=0.1893, ctc_loss=0.1215, cr_loss=0.3394, over 3301270.05 frames. 
], batch size: 53, lr: 2.51e-03, grad_scale: 16.0 2024-09-25 23:23:25,574 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=854359.3333333334, ans=0.0 2024-09-25 23:23:32,701 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.186e+02 1.338e+02 1.449e+02 1.615e+02 3.869e+02, threshold=2.899e+02, percent-clipped=2.0 2024-09-25 23:23:34,541 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=854406.0, ans=0.125 2024-09-25 23:23:42,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=854406.0, ans=0.125 2024-09-25 23:24:13,230 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-47.pt 2024-09-25 23:25:01,184 INFO [train.py:1198] (0/4) Epoch 48, batch 0, loss[loss=0.1897, ctc_loss=0.1231, cr_loss=0.3329, over 17313.00 frames. ], tot_loss[loss=0.1897, ctc_loss=0.1231, cr_loss=0.3329, over 17313.00 frames. ], batch size: 51, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:25:01,185 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-25 23:25:16,469 INFO [train.py:1230] (0/4) Epoch 48, validation: loss=0.0347, ctc_loss=0.0347, cr_loss=1.045e-14, over 944034.00 frames. 2024-09-25 23:25:16,470 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-25 23:25:38,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2024-09-25 23:26:38,951 INFO [train.py:1198] (0/4) Epoch 48, batch 50, loss[loss=0.1682, ctc_loss=0.1059, cr_loss=0.3111, over 16950.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.118, cr_loss=0.3336, over 757008.05 frames. ], batch size: 42, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:26:55,235 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=854807.3333333334, ans=0.125 2024-09-25 23:26:55,718 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.22 vs. 
2024-09-25 23:26:56,893 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=854807.3333333334, ans=0.125 2024-09-25 23:27:09,290 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=854854.0, ans=0.125 2024-09-25 23:27:14,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=854854.0, ans=0.1 2024-09-25 23:27:14,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=854854.0, ans=0.125 2024-09-25 23:27:16,936 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.326e+02 1.422e+02 1.657e+02 2.406e+02, threshold=2.844e+02, percent-clipped=0.0 2024-09-25 23:27:44,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=854947.3333333334, ans=0.125 2024-09-25 23:27:51,748 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=854947.3333333334, ans=0.2 2024-09-25 23:27:53,352 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854947.3333333334, ans=0.1 2024-09-25 23:27:57,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=854947.3333333334, ans=0.125 2024-09-25 23:28:00,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=854994.0, ans=10.0 2024-09-25 23:28:02,162 INFO [train.py:1198] (0/4) Epoch 48, batch 100, loss[loss=0.2021, ctc_loss=0.1311, cr_loss=0.3552, over 17349.00 frames. ], tot_loss[loss=0.1839, ctc_loss=0.1173, cr_loss=0.3327, over 1329553.85 frames. ], batch size: 48, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:28:16,773 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=855040.6666666666, ans=0.125 2024-09-25 23:28:37,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=855087.3333333334, ans=0.125 2024-09-25 23:28:49,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=855134.0, ans=0.07 2024-09-25 23:28:55,763 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=855134.0, ans=12.0 2024-09-25 23:29:16,097 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:29:16,110 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=855180.6666666666, ans=0.2 2024-09-25 23:29:20,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=855180.6666666666, ans=0.125 2024-09-25 23:29:25,177 INFO [train.py:1198] (0/4) Epoch 48, batch 150, loss[loss=0.213, ctc_loss=0.1393, cr_loss=0.3682, over 16577.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1178, cr_loss=0.3344, over 1790250.61 frames.
], batch size: 66, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:29:53,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=855274.0, ans=0.2 2024-09-25 23:30:00,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=855320.6666666666, ans=0.2 2024-09-25 23:30:05,999 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.318e+02 1.406e+02 1.496e+02 3.141e+02, threshold=2.812e+02, percent-clipped=1.0 2024-09-25 23:30:14,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855367.3333333334, ans=0.1 2024-09-25 23:30:47,887 INFO [train.py:1198] (0/4) Epoch 48, batch 200, loss[loss=0.1721, ctc_loss=0.1106, cr_loss=0.3077, over 17292.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1181, cr_loss=0.3352, over 2139713.01 frames. ], batch size: 51, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:30:50,522 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=15.0 2024-09-25 23:31:16,482 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=855507.3333333334, ans=0.125 2024-09-25 23:31:21,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0 2024-09-25 23:31:29,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=855554.0, ans=0.0 2024-09-25 23:31:29,335 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=855554.0, ans=0.125 2024-09-25 23:31:29,371 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=855554.0, ans=0.125 2024-09-25 23:32:06,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=855647.3333333334, ans=0.125 2024-09-25 23:32:10,689 INFO [train.py:1198] (0/4) Epoch 48, batch 250, loss[loss=0.1436, ctc_loss=0.09152, cr_loss=0.2603, over 17074.00 frames. ], tot_loss[loss=0.1835, ctc_loss=0.1169, cr_loss=0.3327, over 2407798.70 frames. ], batch size: 39, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:32:28,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=855740.6666666666, ans=0.125 2024-09-25 23:32:44,554 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:32:49,017 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.111e+02 1.307e+02 1.374e+02 1.472e+02 2.103e+02, threshold=2.747e+02, percent-clipped=0.0 2024-09-25 23:32:49,273 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=855787.3333333334, ans=0.0 2024-09-25 23:33:23,510 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0 2024-09-25 23:33:33,797 INFO [train.py:1198] (0/4) Epoch 48, batch 300, loss[loss=0.2043, ctc_loss=0.1311, cr_loss=0.366, over 17236.00 frames. 
], tot_loss[loss=0.1841, ctc_loss=0.1174, cr_loss=0.3334, over 2624458.62 frames. ], batch size: 55, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:33:54,916 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=12.0 2024-09-25 23:34:46,919 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=856114.0, ans=0.125 2024-09-25 23:34:59,422 INFO [train.py:1198] (0/4) Epoch 48, batch 350, loss[loss=0.2094, ctc_loss=0.1336, cr_loss=0.3788, over 16991.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.1189, cr_loss=0.3363, over 2781109.98 frames. ], batch size: 53, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:35:14,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=856207.3333333334, ans=0.2 2024-09-25 23:35:28,316 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=856207.3333333334, ans=0.125 2024-09-25 23:35:37,610 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.298e+02 1.363e+02 1.438e+02 2.006e+02, threshold=2.725e+02, percent-clipped=0.0 2024-09-25 23:35:47,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856300.6666666666, ans=0.1 2024-09-25 23:35:49,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=856300.6666666666, ans=0.0 2024-09-25 23:35:58,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=856300.6666666666, ans=0.0 2024-09-25 23:36:10,327 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0 2024-09-25 23:36:14,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=856347.3333333334, ans=0.2 2024-09-25 23:36:22,267 INFO [train.py:1198] (0/4) Epoch 48, batch 400, loss[loss=0.194, ctc_loss=0.1253, cr_loss=0.3431, over 17235.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1186, cr_loss=0.3362, over 2912565.66 frames. ], batch size: 50, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:36:27,504 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=856394.0, ans=0.125 2024-09-25 23:36:49,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2024-09-25 23:36:57,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=856487.3333333334, ans=0.0 2024-09-25 23:37:02,939 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.74 vs. 
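limit=15.0

A note on the scaling.py:214 ScheduledFloat entries: each records a hyperparameter (a dropout probability, skip rate, minimum bypass scale, and so on) whose value `ans` is a function of `batch_count`. A rough sketch of a piecewise-linear schedule of this kind, with hypothetical breakpoints; the real scaling.py class has considerably more machinery:

    class ScheduledFloat:
        """Piecewise-linear in batch_count; held constant outside the endpoints."""

        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # Hypothetical schedule: by batch_count ~ 856000 (this point in the log)
    # the value has long since settled at its final endpoint.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(856487.0) == 0.1  # cf. the ans=0.1 entries nearby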
2024-09-25 23:37:04,346 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=856487.3333333334, ans=0.025 2024-09-25 23:37:23,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=856534.0, ans=0.125 2024-09-25 23:37:23,375 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=856534.0, ans=0.025 2024-09-25 23:37:31,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=856580.6666666666, ans=0.125 2024-09-25 23:37:33,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=856580.6666666666, ans=0.125 2024-09-25 23:37:42,219 INFO [train.py:1198] (0/4) Epoch 48, batch 450, loss[loss=0.1672, ctc_loss=0.1065, cr_loss=0.3034, over 17156.00 frames. ], tot_loss[loss=0.1867, ctc_loss=0.1192, cr_loss=0.3373, over 3017803.99 frames. ], batch size: 45, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:37:53,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856627.3333333334, ans=0.1 2024-09-25 23:37:56,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856674.0, ans=0.1 2024-09-25 23:38:04,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=856674.0, ans=0.0 2024-09-25 23:38:23,492 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.150e+02 1.314e+02 1.437e+02 1.529e+02 1.990e+02, threshold=2.873e+02, percent-clipped=0.0 2024-09-25 23:38:24,365 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0 2024-09-25 23:38:27,286 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856720.6666666666, ans=0.1 2024-09-25 23:38:33,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=856767.3333333334, ans=0.125 2024-09-25 23:39:07,876 INFO [train.py:1198] (0/4) Epoch 48, batch 500, loss[loss=0.1566, ctc_loss=0.09506, cr_loss=0.3078, over 17041.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.1199, cr_loss=0.3388, over 3091010.14 frames. ], batch size: 39, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:39:16,898 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=22.5 2024-09-25 23:39:24,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs.
limit=15.0 2024-09-25 23:39:35,459 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=856907.3333333334, ans=0.0 2024-09-25 23:39:47,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=856954.0, ans=0.125 2024-09-25 23:39:49,397 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=856954.0, ans=0.125 2024-09-25 23:40:10,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=857000.6666666666, ans=0.1 2024-09-25 23:40:26,847 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-25 23:40:30,798 INFO [train.py:1198] (0/4) Epoch 48, batch 550, loss[loss=0.1956, ctc_loss=0.1252, cr_loss=0.352, over 17165.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.12, cr_loss=0.3387, over 3144573.37 frames. ], batch size: 45, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:40:31,397 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2024-09-25 23:40:35,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=857094.0, ans=0.125 2024-09-25 23:41:11,883 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.319e+02 1.428e+02 1.533e+02 1.901e+02, threshold=2.856e+02, percent-clipped=0.0 2024-09-25 23:41:33,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=857234.0, ans=0.0 2024-09-25 23:41:53,735 INFO [train.py:1198] (0/4) Epoch 48, batch 600, loss[loss=0.2113, ctc_loss=0.1406, cr_loss=0.3534, over 12292.00 frames. ], tot_loss[loss=0.1885, ctc_loss=0.1205, cr_loss=0.3399, over 3186314.19 frames. ], batch size: 124, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:42:08,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=857374.0, ans=0.0 2024-09-25 23:42:34,804 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=15.0 2024-09-25 23:42:40,635 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=857467.3333333334, ans=0.125 2024-09-25 23:42:45,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=857467.3333333334, ans=0.125 2024-09-25 23:43:03,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=857514.0, ans=0.125 2024-09-25 23:43:04,128 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:43:16,492 INFO [train.py:1198] (0/4) Epoch 48, batch 650, loss[loss=0.1832, ctc_loss=0.1172, cr_loss=0.3296, over 17022.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3378, over 3213705.97 frames. 
], batch size: 51, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:43:42,801 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.46 vs. limit=15.0 2024-09-25 23:43:51,852 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:43:59,395 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.322e+02 1.414e+02 1.494e+02 2.236e+02, threshold=2.829e+02, percent-clipped=0.0 2024-09-25 23:44:04,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=857654.0, ans=0.125 2024-09-25 23:44:06,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=857700.6666666666, ans=0.025 2024-09-25 23:44:14,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857700.6666666666, ans=0.1 2024-09-25 23:44:14,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=857700.6666666666, ans=0.0 2024-09-25 23:44:16,598 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2024-09-25 23:44:40,140 INFO [train.py:1198] (0/4) Epoch 48, batch 700, loss[loss=0.2081, ctc_loss=0.1348, cr_loss=0.3664, over 17219.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1191, cr_loss=0.3367, over 3246672.01 frames. ], batch size: 55, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:44:42,565 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2024-09-25 23:44:50,995 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=857794.0, ans=0.2 2024-09-25 23:45:06,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=857840.6666666666, ans=0.2 2024-09-25 23:45:16,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=857887.3333333334, ans=0.0 2024-09-25 23:45:40,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=857934.0, ans=0.2 2024-09-25 23:46:05,008 INFO [train.py:1198] (0/4) Epoch 48, batch 750, loss[loss=0.1683, ctc_loss=0.1036, cr_loss=0.3237, over 17168.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3346, over 3271427.12 frames. 
], batch size: 45, lr: 2.48e-03, grad_scale: 16.0 2024-09-25 23:46:05,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858027.3333333334, ans=0.1 2024-09-25 23:46:10,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=858027.3333333334, ans=0.2 2024-09-25 23:46:21,450 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=858074.0, ans=0.125 2024-09-25 23:46:21,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=858074.0, ans=0.125 2024-09-25 23:46:26,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=858074.0, ans=0.07 2024-09-25 23:46:29,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=858074.0, ans=0.125 2024-09-25 23:46:39,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=858120.6666666666, ans=0.125 2024-09-25 23:46:44,963 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.152e+02 1.340e+02 1.448e+02 1.538e+02 3.559e+02, threshold=2.895e+02, percent-clipped=1.0 2024-09-25 23:47:10,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=858214.0, ans=0.0 2024-09-25 23:47:25,137 INFO [train.py:1198] (0/4) Epoch 48, batch 800, loss[loss=0.185, ctc_loss=0.1172, cr_loss=0.339, over 17072.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3365, over 3291996.68 frames. ], batch size: 43, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:47:27,020 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=858260.6666666666, ans=0.125 2024-09-25 23:47:27,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=858260.6666666666, ans=0.0 2024-09-25 23:47:43,449 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=858307.3333333334, ans=0.04949747468305833 2024-09-25 23:47:43,909 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-09-25 23:48:10,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=858354.0, ans=0.125 2024-09-25 23:48:11,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=858354.0, ans=0.0 2024-09-25 23:48:13,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=858354.0, ans=0.125 2024-09-25 23:48:25,821 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=858400.6666666666, ans=0.125 2024-09-25 23:48:32,959 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2024-09-25 23:48:47,866 INFO [train.py:1198] (0/4) Epoch 48, batch 850, loss[loss=0.1874, ctc_loss=0.1203, cr_loss=0.3353, over 17368.00 frames. 
], tot_loss[loss=0.1869, ctc_loss=0.1194, cr_loss=0.3372, over 3308683.42 frames. ], batch size: 48, lr: 2.48e-03, grad_scale: 32.0 2024-09-25 23:49:13,445 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858540.6666666666, ans=0.1 2024-09-25 23:49:16,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858540.6666666666, ans=0.1 2024-09-25 23:49:19,967 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=858540.6666666666, ans=0.2 2024-09-25 23:49:24,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=858587.3333333334, ans=0.125 2024-09-25 23:49:30,828 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.140e+02 1.305e+02 1.378e+02 1.479e+02 2.667e+02, threshold=2.756e+02, percent-clipped=0.0 2024-09-25 23:49:32,623 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=858587.3333333334, ans=0.125 2024-09-25 23:49:32,838 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:49:34,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=858587.3333333334, ans=0.125 2024-09-25 23:49:51,291 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-184000.pt 2024-09-25 23:50:06,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=858680.6666666666, ans=0.5 2024-09-25 23:50:15,699 INFO [train.py:1198] (0/4) Epoch 48, batch 900, loss[loss=0.1822, ctc_loss=0.117, cr_loss=0.3262, over 17140.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1195, cr_loss=0.3378, over 3322587.24 frames. ], batch size: 48, lr: 2.47e-03, grad_scale: 32.0 2024-09-25 23:50:36,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=858774.0, ans=0.0 2024-09-25 23:50:44,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=858774.0, ans=0.0 2024-09-25 23:50:59,384 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.40 vs. limit=22.5 2024-09-25 23:51:10,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=858867.3333333334, ans=0.2 2024-09-25 23:51:20,617 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=858914.0, ans=0.2 2024-09-25 23:51:28,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=858914.0, ans=0.025 2024-09-25 23:51:37,650 INFO [train.py:1198] (0/4) Epoch 48, batch 950, loss[loss=0.1648, ctc_loss=0.1036, cr_loss=0.306, over 17082.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1191, cr_loss=0.3367, over 3336184.32 frames. ], batch size: 39, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:51:44,739 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.28 vs. 
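limit=12.0

A note on the checkpoint.py:75 lines: the run writes per-epoch checkpoints (epoch-47.pt above) and periodic batch-count checkpoints (checkpoint-184000.pt just above); the round batch number suggests a fixed save-every-N-batches cadence. A minimal hedged sketch of such dual checkpointing; the field names and the cadence constant are illustrative, not icefall's exact schema:

    import torch
    from pathlib import Path

    SAVE_EVERY_N = 4000  # assumed cadence; 184000 is a multiple of 4000

    def save_checkpoint(path: Path, model, optimizer, batch_idx: int) -> None:
        # Persist enough state to resume training from this point.
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx,
            },
            path,
        )

    def maybe_save(exp_dir: Path, model, optimizer, batch_idx: int) -> None:
        if batch_idx > 0 and batch_idx % SAVE_EVERY_N == 0:
            save_checkpoint(exp_dir / f"checkpoint-{batch_idx}.pt",
                            model, optimizer, batch_idx)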
2024-09-25 23:51:55,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=859007.3333333334, ans=0.0 2024-09-25 23:52:02,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859007.3333333334, ans=0.1 2024-09-25 23:52:06,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=859007.3333333334, ans=0.0 2024-09-25 23:52:08,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2024-09-25 23:52:14,043 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2024-09-25 23:52:19,389 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.296e+02 1.387e+02 1.508e+02 2.071e+02, threshold=2.775e+02, percent-clipped=0.0 2024-09-25 23:52:43,575 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=859147.3333333334, ans=0.0 2024-09-25 23:53:00,141 INFO [train.py:1198] (0/4) Epoch 48, batch 1000, loss[loss=0.217, ctc_loss=0.1443, cr_loss=0.3631, over 17036.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1192, cr_loss=0.3367, over 3347728.74 frames. ], batch size: 52, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:53:06,952 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=4.942e-02 2024-09-25 23:54:10,146 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=859380.6666666666, ans=0.125 2024-09-25 23:54:19,813 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=859380.6666666666, ans=0.125 2024-09-25 23:54:22,551 INFO [train.py:1198] (0/4) Epoch 48, batch 1050, loss[loss=0.1818, ctc_loss=0.1134, cr_loss=0.342, over 17212.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1192, cr_loss=0.3368, over 3350983.17 frames.
], batch size: 47, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:54:29,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=859427.3333333334, ans=0.0 2024-09-25 23:54:32,540 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=859427.3333333334, ans=0.1 2024-09-25 23:54:34,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=859427.3333333334, ans=0.0 2024-09-25 23:55:06,816 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.127e+02 1.301e+02 1.364e+02 1.474e+02 5.159e+02, threshold=2.728e+02, percent-clipped=1.0 2024-09-25 23:55:23,612 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-25 23:55:31,185 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=859614.0, ans=0.2 2024-09-25 23:55:32,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=859614.0, ans=0.125 2024-09-25 23:55:34,230 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=859614.0, ans=0.2 2024-09-25 23:55:45,301 INFO [train.py:1198] (0/4) Epoch 48, batch 1100, loss[loss=0.2003, ctc_loss=0.1287, cr_loss=0.3575, over 17303.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3379, over 3348433.57 frames. ], batch size: 49, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:55:47,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=859660.6666666666, ans=0.125 2024-09-25 23:55:51,920 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=859660.6666666666, ans=0.0 2024-09-25 23:56:13,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=859707.3333333334, ans=0.125 2024-09-25 23:56:31,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859754.0, ans=0.1 2024-09-25 23:57:07,817 INFO [train.py:1198] (0/4) Epoch 48, batch 1150, loss[loss=0.211, ctc_loss=0.137, cr_loss=0.3699, over 17021.00 frames. ], tot_loss[loss=0.1877, ctc_loss=0.1201, cr_loss=0.338, over 3345332.14 frames. ], batch size: 53, lr: 2.47e-03, grad_scale: 16.0 2024-09-25 23:57:52,001 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.124e+02 1.298e+02 1.394e+02 1.477e+02 1.999e+02, threshold=2.788e+02, percent-clipped=0.0 2024-09-25 23:58:19,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=860080.6666666666, ans=0.125 2024-09-25 23:58:19,481 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860080.6666666666, ans=0.1 2024-09-25 23:58:30,255 INFO [train.py:1198] (0/4) Epoch 48, batch 1200, loss[loss=0.183, ctc_loss=0.1182, cr_loss=0.3242, over 17309.00 frames. ], tot_loss[loss=0.1882, ctc_loss=0.1204, cr_loss=0.3387, over 3347966.02 frames. 
], batch size: 49, lr: 2.47e-03, grad_scale: 32.0 2024-09-25 23:58:36,974 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860127.3333333334, ans=0.1 2024-09-25 23:58:43,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=860127.3333333334, ans=0.125 2024-09-25 23:59:01,322 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-09-25 23:59:06,977 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=860220.6666666666, ans=0.0 2024-09-25 23:59:06,993 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=860220.6666666666, ans=0.125 2024-09-25 23:59:12,310 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.18 vs. limit=10.0 2024-09-25 23:59:32,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=860267.3333333334, ans=0.125 2024-09-25 23:59:41,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=860314.0, ans=0.0 2024-09-25 23:59:56,070 INFO [train.py:1198] (0/4) Epoch 48, batch 1250, loss[loss=0.2183, ctc_loss=0.1407, cr_loss=0.3879, over 16644.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1201, cr_loss=0.3383, over 3357054.00 frames. ], batch size: 66, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:00:39,301 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.302e+02 1.385e+02 1.504e+02 2.588e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-26 00:01:05,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=860547.3333333334, ans=0.2 2024-09-26 00:01:16,760 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:01:19,901 INFO [train.py:1198] (0/4) Epoch 48, batch 1300, loss[loss=0.1957, ctc_loss=0.1268, cr_loss=0.3445, over 17026.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3379, over 3359788.69 frames. ], batch size: 51, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:01:34,994 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860640.6666666666, ans=0.1 2024-09-26 00:01:41,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=860640.6666666666, ans=0.0 2024-09-26 00:02:01,076 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=860687.3333333334, ans=0.125 2024-09-26 00:02:17,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=860734.0, ans=0.2 2024-09-26 00:02:31,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=860780.6666666666, ans=0.125 2024-09-26 00:02:40,988 INFO [train.py:1198] (0/4) Epoch 48, batch 1350, loss[loss=0.179, ctc_loss=0.1141, cr_loss=0.3244, over 16754.00 frames. 
], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.3381, over 3354913.23 frames. ], batch size: 37, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:02:41,280 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=860827.3333333334, ans=0.05 2024-09-26 00:02:44,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=860827.3333333334, ans=0.0 2024-09-26 00:03:06,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=860874.0, ans=0.125 2024-09-26 00:03:17,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=860920.6666666666, ans=0.1 2024-09-26 00:03:26,902 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.284e+02 1.347e+02 1.442e+02 2.015e+02, threshold=2.695e+02, percent-clipped=0.0 2024-09-26 00:03:30,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=860967.3333333334, ans=0.05 2024-09-26 00:03:45,440 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=860967.3333333334, ans=0.0 2024-09-26 00:04:07,154 INFO [train.py:1198] (0/4) Epoch 48, batch 1400, loss[loss=0.1519, ctc_loss=0.09672, cr_loss=0.276, over 16960.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1198, cr_loss=0.3372, over 3362317.60 frames. ], batch size: 42, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:04:10,527 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=861060.6666666666, ans=0.125 2024-09-26 00:04:20,940 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2024-09-26 00:04:54,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=861154.0, ans=0.0 2024-09-26 00:05:00,719 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2024-09-26 00:05:11,542 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=22.5 2024-09-26 00:05:25,819 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=861247.3333333334, ans=0.1 2024-09-26 00:05:30,163 INFO [train.py:1198] (0/4) Epoch 48, batch 1450, loss[loss=0.1816, ctc_loss=0.1182, cr_loss=0.3169, over 16734.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1195, cr_loss=0.3372, over 3371174.74 frames. 
], batch size: 61, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:05:40,275 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=861294.0, ans=0.125 2024-09-26 00:05:53,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=861340.6666666666, ans=0.0 2024-09-26 00:06:15,971 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.169e+02 1.346e+02 1.409e+02 1.541e+02 2.437e+02, threshold=2.817e+02, percent-clipped=0.0 2024-09-26 00:06:24,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=12.0 2024-09-26 00:06:34,329 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:06:52,975 INFO [train.py:1198] (0/4) Epoch 48, batch 1500, loss[loss=0.2093, ctc_loss=0.1353, cr_loss=0.3701, over 17357.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1198, cr_loss=0.3376, over 3352876.18 frames. ], batch size: 48, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:07:10,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=861574.0, ans=0.125 2024-09-26 00:07:44,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=861667.3333333334, ans=0.0 2024-09-26 00:08:15,405 INFO [train.py:1198] (0/4) Epoch 48, batch 1550, loss[loss=0.1648, ctc_loss=0.1044, cr_loss=0.3019, over 17011.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.338, over 3342102.40 frames. ], batch size: 44, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:08:31,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=861807.3333333334, ans=0.0 2024-09-26 00:08:34,641 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=861807.3333333334, ans=0.0 2024-09-26 00:09:00,539 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.301e+02 1.350e+02 1.480e+02 2.232e+02, threshold=2.701e+02, percent-clipped=0.0 2024-09-26 00:09:08,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=861900.6666666666, ans=0.125 2024-09-26 00:09:37,075 INFO [train.py:1198] (0/4) Epoch 48, batch 1600, loss[loss=0.1914, ctc_loss=0.1226, cr_loss=0.3442, over 17246.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3378, over 3348655.76 frames. ], batch size: 44, lr: 2.47e-03, grad_scale: 32.0 2024-09-26 00:10:11,249 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.26 vs. limit=10.0 2024-09-26 00:10:42,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0 2024-09-26 00:11:01,994 INFO [train.py:1198] (0/4) Epoch 48, batch 1650, loss[loss=0.162, ctc_loss=0.103, cr_loss=0.295, over 16936.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3378, over 3352070.85 frames. 
], batch size: 42, lr: 2.47e-03, grad_scale: 32.0 2024-09-26 00:11:11,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=862227.3333333334, ans=0.1 2024-09-26 00:11:13,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=862227.3333333334, ans=0.0 2024-09-26 00:11:17,409 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.36 vs. limit=15.0 2024-09-26 00:11:23,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=862274.0, ans=0.125 2024-09-26 00:11:26,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=862274.0, ans=0.025 2024-09-26 00:11:31,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=862274.0, ans=0.125 2024-09-26 00:11:31,946 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.33 vs. limit=10.0 2024-09-26 00:11:42,905 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=862320.6666666666, ans=0.125 2024-09-26 00:11:47,278 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.306e+02 1.368e+02 1.453e+02 1.769e+02, threshold=2.736e+02, percent-clipped=0.0 2024-09-26 00:11:52,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=862367.3333333334, ans=0.125 2024-09-26 00:12:22,677 INFO [train.py:1198] (0/4) Epoch 48, batch 1700, loss[loss=0.2102, ctc_loss=0.1372, cr_loss=0.3647, over 15888.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.336, over 3358760.48 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:12:45,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=862507.3333333334, ans=0.125 2024-09-26 00:12:49,348 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=862507.3333333334, ans=0.0 2024-09-26 00:13:18,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=862600.6666666666, ans=0.0 2024-09-26 00:13:18,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=862600.6666666666, ans=0.125 2024-09-26 00:13:23,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=862600.6666666666, ans=0.05 2024-09-26 00:13:28,090 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=862647.3333333334, ans=0.0 2024-09-26 00:13:37,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=862647.3333333334, ans=0.05 2024-09-26 00:13:45,482 INFO [train.py:1198] (0/4) Epoch 48, batch 1750, loss[loss=0.1808, ctc_loss=0.1122, cr_loss=0.3432, over 17134.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1197, cr_loss=0.3375, over 3353294.27 frames. 
], batch size: 48, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:13:56,442 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=862694.0, ans=0.0 2024-09-26 00:14:32,513 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.295e+02 1.382e+02 1.510e+02 2.646e+02, threshold=2.763e+02, percent-clipped=0.0 2024-09-26 00:14:44,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=862834.0, ans=0.09899494936611666 2024-09-26 00:14:50,451 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-26 00:14:51,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=862834.0, ans=0.125 2024-09-26 00:15:09,852 INFO [train.py:1198] (0/4) Epoch 48, batch 1800, loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3351, over 17211.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1208, cr_loss=0.3398, over 3345134.06 frames. ], batch size: 47, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:15:18,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=862927.3333333334, ans=0.125 2024-09-26 00:15:19,644 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=862927.3333333334, ans=0.125 2024-09-26 00:15:22,972 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=862927.3333333334, ans=0.0 2024-09-26 00:15:33,142 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.40 vs. limit=10.0 2024-09-26 00:15:35,673 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=862974.0, ans=0.2 2024-09-26 00:15:46,857 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=863020.6666666666, ans=0.125 2024-09-26 00:15:48,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=863020.6666666666, ans=0.125 2024-09-26 00:15:48,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=863020.6666666666, ans=0.125 2024-09-26 00:16:25,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=863114.0, ans=0.125 2024-09-26 00:16:30,843 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:16:31,981 INFO [train.py:1198] (0/4) Epoch 48, batch 1850, loss[loss=0.1872, ctc_loss=0.1191, cr_loss=0.3403, over 17361.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1202, cr_loss=0.3386, over 3339127.04 frames. ], batch size: 48, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:16:50,231 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=22.5 2024-09-26 00:16:50,626 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.15 vs. 
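limit=15.0

A note on the scaling.py:1024 Whitening lines: each compares a per-module `metric` against a `limit` (the whitening_limit values that also appear as ScheduledFloat entries). One plausible reading is that the metric measures how far the channel covariance of a module's activations is from isotropic, with 1.0 meaning perfectly white features. The proxy below is an assumption-laden illustration of such a diagnostic, not icefall's actual implementation:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """Anisotropy of the channel covariance of x: (num_frames, num_channels).

        Equals 1.0 when all covariance eigenvalues are equal (white features)
        and grows as a few directions dominate. Illustrative proxy only.
        """
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]       # (C, C) channel covariance
        eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    # A monitoring hook in this spirit would log "metric=... vs. limit=..."
    # and apply a corrective penalty only when the metric exceeds the limit.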
2024-09-26 00:17:09,311 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=863254.0, ans=0.0 2024-09-26 00:17:10,832 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863254.0, ans=0.1 2024-09-26 00:17:16,790 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.322e+02 1.421e+02 1.543e+02 2.119e+02, threshold=2.843e+02, percent-clipped=0.0 2024-09-26 00:17:54,933 INFO [train.py:1198] (0/4) Epoch 48, batch 1900, loss[loss=0.1499, ctc_loss=0.09321, cr_loss=0.2835, over 17091.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1191, cr_loss=0.337, over 3350732.55 frames. ], batch size: 43, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:18:55,558 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:18:57,051 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:19:06,833 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=863580.6666666666, ans=0.2 2024-09-26 00:19:17,622 INFO [train.py:1198] (0/4) Epoch 48, batch 1950, loss[loss=0.164, ctc_loss=0.1038, cr_loss=0.3013, over 17058.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3351, over 3348908.55 frames. ], batch size: 46, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:19:29,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=863627.3333333334, ans=0.125 2024-09-26 00:19:36,406 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=863674.0, ans=0.2 2024-09-26 00:20:05,198 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.164e+02 1.335e+02 1.400e+02 1.522e+02 2.815e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-26 00:20:08,701 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=863767.3333333334, ans=0.2 2024-09-26 00:20:18,398 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=863767.3333333334, ans=0.0 2024-09-26 00:20:31,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=863814.0, ans=0.0 2024-09-26 00:20:37,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=863814.0, ans=0.125 2024-09-26 00:20:40,169 INFO [train.py:1198] (0/4) Epoch 48, batch 2000, loss[loss=0.1885, ctc_loss=0.1199, cr_loss=0.3429, over 16739.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3342, over 3350032.79 frames.
], batch size: 61, lr: 2.47e-03, grad_scale: 32.0 2024-09-26 00:20:57,751 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=863907.3333333334, ans=0.025 2024-09-26 00:21:07,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=863907.3333333334, ans=0.125 2024-09-26 00:21:10,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=863907.3333333334, ans=0.125 2024-09-26 00:21:13,582 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=863954.0, ans=0.2 2024-09-26 00:22:02,511 INFO [train.py:1198] (0/4) Epoch 48, batch 2050, loss[loss=0.185, ctc_loss=0.1178, cr_loss=0.3362, over 17025.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3353, over 3350692.01 frames. ], batch size: 52, lr: 2.47e-03, grad_scale: 32.0 2024-09-26 00:22:09,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=864094.0, ans=0.025 2024-09-26 00:22:23,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=864140.6666666666, ans=0.05 2024-09-26 00:22:28,849 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.52 vs. limit=10.0 2024-09-26 00:22:51,183 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.098e+02 1.300e+02 1.355e+02 1.459e+02 2.186e+02, threshold=2.709e+02, percent-clipped=0.0 2024-09-26 00:22:55,053 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2024-09-26 00:23:02,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.67 vs. limit=5.0 2024-09-26 00:23:25,069 INFO [train.py:1198] (0/4) Epoch 48, batch 2100, loss[loss=0.1975, ctc_loss=0.1287, cr_loss=0.3437, over 17196.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.119, cr_loss=0.3352, over 3344416.86 frames. ], batch size: 55, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:23:32,722 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=15.0 2024-09-26 00:24:31,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=864467.3333333334, ans=0.0 2024-09-26 00:24:46,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=864514.0, ans=0.0 2024-09-26 00:24:48,988 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2024-09-26 00:24:50,879 INFO [train.py:1198] (0/4) Epoch 48, batch 2150, loss[loss=0.1691, ctc_loss=0.1063, cr_loss=0.3141, over 17009.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1184, cr_loss=0.3344, over 3352572.91 frames. ], batch size: 44, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:25:20,895 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.08 vs. 
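The scaling.py:214 ScheduledFloat lines report module hyper-parameters (skip rates, balancer probabilities, dropout p, attention-rate floors) whose value depends on the global batch_count; by this point in training most have settled at their final values (ans=0.0, 0.025, 0.1, 0.125, ...). A minimal piecewise-linear re-implementation under that assumption; the breakpoints below are invented for illustration and are not the ones in scaling.py:

```python
class ScheduledFloat:
    """Piecewise-linear schedule of a float hyper-parameter vs. batch_count."""
    def __init__(self, *points):
        self.points = sorted(points)            # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0))   # invented breakpoints
print(conv_skip_rate.value(862927.0))   # 0.0: long past the ramp, as in 'ans=0.0'
```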
limit=12.0 2024-09-26 00:25:37,467 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.112e+02 1.295e+02 1.362e+02 1.467e+02 1.863e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-26 00:26:09,307 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=864747.3333333334, ans=0.125 2024-09-26 00:26:13,772 INFO [train.py:1198] (0/4) Epoch 48, batch 2200, loss[loss=0.191, ctc_loss=0.1229, cr_loss=0.3405, over 17008.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3351, over 3350828.88 frames. ], batch size: 51, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:26:20,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864794.0, ans=0.1 2024-09-26 00:26:37,188 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0 2024-09-26 00:26:54,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=864887.3333333334, ans=0.09899494936611666 2024-09-26 00:27:36,724 INFO [train.py:1198] (0/4) Epoch 48, batch 2250, loss[loss=0.1796, ctc_loss=0.112, cr_loss=0.3379, over 17269.00 frames. ], tot_loss[loss=0.1878, ctc_loss=0.1202, cr_loss=0.338, over 3336153.89 frames. ], batch size: 44, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:27:41,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=865027.3333333334, ans=0.0 2024-09-26 00:27:45,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=865027.3333333334, ans=0.125 2024-09-26 00:28:23,489 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 1.300e+02 1.376e+02 1.452e+02 2.028e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-26 00:29:00,163 INFO [train.py:1198] (0/4) Epoch 48, batch 2300, loss[loss=0.186, ctc_loss=0.1236, cr_loss=0.3117, over 17013.00 frames. ], tot_loss[loss=0.1884, ctc_loss=0.1207, cr_loss=0.3387, over 3343153.85 frames. 
], batch size: 44, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:29:00,530 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=865260.6666666666, ans=0.0 2024-09-26 00:29:26,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=865307.3333333334, ans=0.04949747468305833 2024-09-26 00:29:44,838 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=865354.0, ans=0.2 2024-09-26 00:29:52,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=865400.6666666666, ans=0.125 2024-09-26 00:30:07,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=865447.3333333334, ans=0.125 2024-09-26 00:30:10,588 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=865447.3333333334, ans=0.125 2024-09-26 00:30:15,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=865447.3333333334, ans=0.125 2024-09-26 00:30:22,870 INFO [train.py:1198] (0/4) Epoch 48, batch 2350, loss[loss=0.2045, ctc_loss=0.1313, cr_loss=0.3662, over 17173.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1204, cr_loss=0.3378, over 3332673.90 frames. ], batch size: 45, lr: 2.47e-03, grad_scale: 16.0 2024-09-26 00:30:34,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865494.0, ans=0.125 2024-09-26 00:30:46,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865540.6666666666, ans=0.1 2024-09-26 00:31:01,702 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.42 vs. limit=15.0 2024-09-26 00:31:11,656 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.301e+02 1.386e+02 1.481e+02 2.528e+02, threshold=2.772e+02, percent-clipped=0.0 2024-09-26 00:31:21,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865634.0, ans=0.1 2024-09-26 00:31:22,218 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2024-09-26 00:31:23,094 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=865634.0, ans=0.1 2024-09-26 00:31:26,629 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=22.5 2024-09-26 00:31:30,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865680.6666666666, ans=0.1 2024-09-26 00:31:44,988 INFO [train.py:1198] (0/4) Epoch 48, batch 2400, loss[loss=0.2263, ctc_loss=0.1505, cr_loss=0.3792, over 11566.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1209, cr_loss=0.339, over 3334592.39 frames. 
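Across these batch lines the utterance count swings widely (44, 45, then 123 just below) while the total frames per batch stay bounded: the signature of duration-capped batching, where each batch is filled up to a fixed total duration instead of a fixed number of utterances. A hedged sketch using lhotse's DynamicBucketingSampler; the manifest path is hypothetical:

```python
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

cuts = CutSet.from_file("data/fbank/train_cuts.jsonl.gz")   # hypothetical path
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=700,   # seconds of audio per batch; utterance count then varies
    num_buckets=30,     # group cuts of similar duration to reduce padding
    shuffle=True,
    drop_last=True,
)
```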
], batch size: 123, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:31:45,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=865727.3333333334, ans=0.2 2024-09-26 00:31:58,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=865727.3333333334, ans=0.0 2024-09-26 00:32:14,878 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.11 vs. limit=22.5 2024-09-26 00:32:22,039 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=865820.6666666666, ans=0.0 2024-09-26 00:33:07,656 INFO [train.py:1198] (0/4) Epoch 48, batch 2450, loss[loss=0.1652, ctc_loss=0.1023, cr_loss=0.3149, over 17238.00 frames. ], tot_loss[loss=0.1876, ctc_loss=0.12, cr_loss=0.3377, over 3346296.76 frames. ], batch size: 42, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:33:47,416 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=866054.0, ans=0.0 2024-09-26 00:33:58,228 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.118e+02 1.328e+02 1.396e+02 1.507e+02 2.681e+02, threshold=2.792e+02, percent-clipped=0.0 2024-09-26 00:34:32,989 INFO [train.py:1198] (0/4) Epoch 48, batch 2500, loss[loss=0.1998, ctc_loss=0.13, cr_loss=0.3486, over 17215.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1203, cr_loss=0.3381, over 3352645.57 frames. ], batch size: 50, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:34:47,661 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:35:21,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=866334.0, ans=0.125 2024-09-26 00:35:29,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=866334.0, ans=0.07 2024-09-26 00:35:34,647 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=866334.0, ans=0.025 2024-09-26 00:35:36,291 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=866380.6666666666, ans=0.125 2024-09-26 00:35:56,124 INFO [train.py:1198] (0/4) Epoch 48, batch 2550, loss[loss=0.1837, ctc_loss=0.116, cr_loss=0.3383, over 16740.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1191, cr_loss=0.3358, over 3367276.67 frames. ], batch size: 61, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:36:06,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=866427.3333333334, ans=0.0 2024-09-26 00:36:41,600 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. 
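grad_scale in the batch lines oscillates between 16.0 and 32.0. That is standard dynamic loss scaling for float16 AMP: the scale doubles after a run of overflow-free steps and is halved whenever an inf/nan gradient is detected. A sketch with the stock PyTorch scaler; the training script's actual wiring may differ, and apart from init_scale the hyper-parameters shown are the PyTorch defaults:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, batch, criterion, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["supervisions"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)    # silently skipped if gradients overflowed
    scaler.update()           # 16.0 -> 32.0 after growth_interval clean steps,
    return scaler.get_scale() # halved back to 16.0 on overflow
```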
limit=15.0 2024-09-26 00:36:43,908 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.101e+02 1.316e+02 1.412e+02 1.521e+02 2.388e+02, threshold=2.824e+02, percent-clipped=0.0 2024-09-26 00:36:47,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=866567.3333333334, ans=0.0 2024-09-26 00:36:49,731 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.62 vs. limit=6.0 2024-09-26 00:37:16,417 INFO [train.py:1198] (0/4) Epoch 48, batch 2600, loss[loss=0.1667, ctc_loss=0.1052, cr_loss=0.3073, over 17014.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1192, cr_loss=0.3362, over 3370878.54 frames. ], batch size: 39, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:37:21,806 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:37:30,664 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=866660.6666666666, ans=0.0 2024-09-26 00:37:32,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=866660.6666666666, ans=0.125 2024-09-26 00:37:44,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=866707.3333333334, ans=0.0 2024-09-26 00:37:54,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=866754.0, ans=0.125 2024-09-26 00:38:15,256 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=866800.6666666666, ans=0.1 2024-09-26 00:38:16,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=866800.6666666666, ans=0.1 2024-09-26 00:38:20,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=866800.6666666666, ans=0.125 2024-09-26 00:38:20,050 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=866800.6666666666, ans=0.025 2024-09-26 00:38:40,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=866894.0, ans=0.1 2024-09-26 00:38:41,120 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-09-26 00:38:41,801 INFO [train.py:1198] (0/4) Epoch 48, batch 2650, loss[loss=0.1895, ctc_loss=0.1219, cr_loss=0.338, over 17029.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.12, cr_loss=0.3378, over 3358839.36 frames. 
], batch size: 51, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:38:59,770 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:39:01,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=866940.6666666666, ans=0.125 2024-09-26 00:39:22,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=866987.3333333334, ans=0.125 2024-09-26 00:39:29,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=866987.3333333334, ans=0.2 2024-09-26 00:39:32,300 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.306e+02 1.381e+02 1.471e+02 2.187e+02, threshold=2.762e+02, percent-clipped=0.0 2024-09-26 00:39:37,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867034.0, ans=0.1 2024-09-26 00:39:44,638 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2024-09-26 00:40:04,503 INFO [train.py:1198] (0/4) Epoch 48, batch 2700, loss[loss=0.191, ctc_loss=0.1204, cr_loss=0.3528, over 17256.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.119, cr_loss=0.336, over 3359213.26 frames. ], batch size: 44, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:40:08,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=867127.3333333334, ans=0.95 2024-09-26 00:40:34,833 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2024-09-26 00:40:37,747 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=867220.6666666666, ans=0.0 2024-09-26 00:41:14,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=867314.0, ans=0.2 2024-09-26 00:41:19,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=867314.0, ans=0.025 2024-09-26 00:41:21,433 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=867314.0, ans=0.0 2024-09-26 00:41:27,333 INFO [train.py:1198] (0/4) Epoch 48, batch 2750, loss[loss=0.1714, ctc_loss=0.1075, cr_loss=0.3199, over 17355.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3367, over 3340610.48 frames. ], batch size: 48, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:41:42,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=867407.3333333334, ans=0.05 2024-09-26 00:41:46,848 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.00 vs. 
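The scaling.py:1120 WithLoss lines attach a named auxiliary loss to specific submodules (here various self_attn_weights) and log its running sum; loss-sum=0.000e+00 indicates the penalty is currently inactive. A generic version of that pattern, purely illustrative, since the real scaling.py mechanism differs:

```python
import torch.nn as nn

class WithLoss(nn.Module):
    """Wrap a module, compute an auxiliary penalty on its output,
    and stash it so the trainer can log 'loss-sum=...'."""
    def __init__(self, module: nn.Module, limit: float = 0.95):
        super().__init__()
        self.module = module
        self.limit = limit
        self.loss_sum = 0.0

    def forward(self, *args, **kwargs):
        out = self.module(*args, **kwargs)
        penalty = (out - self.limit).clamp(min=0.0).sum()  # example penalty only
        self.loss_sum = float(penalty.detach())            # 0.0 when within limits
        return out   # the penalty would be folded into the training loss
```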
limit=22.5 2024-09-26 00:42:13,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=867500.6666666666, ans=0.0 2024-09-26 00:42:13,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=867500.6666666666, ans=0.0 2024-09-26 00:42:15,064 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.121e+02 1.302e+02 1.400e+02 1.492e+02 2.005e+02, threshold=2.800e+02, percent-clipped=0.0 2024-09-26 00:42:34,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=867547.3333333334, ans=0.125 2024-09-26 00:42:49,904 INFO [train.py:1198] (0/4) Epoch 48, batch 2800, loss[loss=0.2054, ctc_loss=0.1326, cr_loss=0.3642, over 15901.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1195, cr_loss=0.3368, over 3346225.93 frames. ], batch size: 74, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:42:53,431 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=867594.0, ans=0.125 2024-09-26 00:44:12,431 INFO [train.py:1198] (0/4) Epoch 48, batch 2850, loss[loss=0.2108, ctc_loss=0.1372, cr_loss=0.368, over 16650.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1192, cr_loss=0.337, over 3359765.36 frames. ], batch size: 66, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:44:14,470 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:44:26,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=867827.3333333334, ans=0.125 2024-09-26 00:44:47,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=867920.6666666666, ans=0.125 2024-09-26 00:45:02,970 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.151e+02 1.286e+02 1.342e+02 1.424e+02 1.734e+02, threshold=2.685e+02, percent-clipped=0.0 2024-09-26 00:45:09,793 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=867967.3333333334, ans=0.125 2024-09-26 00:45:22,936 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=868014.0, ans=0.125 2024-09-26 00:45:30,224 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=868014.0, ans=0.125 2024-09-26 00:45:31,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=868014.0, ans=0.0 2024-09-26 00:45:37,912 INFO [train.py:1198] (0/4) Epoch 48, batch 2900, loss[loss=0.1899, ctc_loss=0.1187, cr_loss=0.356, over 17234.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.119, cr_loss=0.3373, over 3362805.04 frames. ], batch size: 50, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:45:53,069 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. 
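Each Whitening line compares a per-module statistic against a limit (metric=12.58 vs. limit=22.5 just above); when the metric exceeds the limit, a corrective gradient nudges the module's activations toward a flatter covariance spectrum. One plausible formulation of such a metric is the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the feature covariance, which is exactly 1.0 for a white spectrum; this is offered as an illustration, not as the actual scaling.py code:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations from one module."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2   # >= 1.0; equality iff white

feats = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated channels
print(whitening_metric(feats))   # large value, to be compared against a limit
```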
limit=22.5 2024-09-26 00:46:26,101 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=868200.6666666666, ans=0.0 2024-09-26 00:46:41,707 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:46:53,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=868247.3333333334, ans=0.0 2024-09-26 00:46:57,808 INFO [train.py:1198] (0/4) Epoch 48, batch 2950, loss[loss=0.1792, ctc_loss=0.1153, cr_loss=0.3196, over 16941.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1185, cr_loss=0.3361, over 3366619.98 frames. ], batch size: 42, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:47:07,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=868294.0, ans=0.05 2024-09-26 00:47:19,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=868340.6666666666, ans=0.025 2024-09-26 00:47:34,581 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868387.3333333334, ans=0.1 2024-09-26 00:47:50,008 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.284e+02 1.365e+02 1.462e+02 2.440e+02, threshold=2.731e+02, percent-clipped=0.0 2024-09-26 00:48:01,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=868434.0, ans=0.1 2024-09-26 00:48:12,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=868480.6666666666, ans=0.2 2024-09-26 00:48:20,402 INFO [train.py:1198] (0/4) Epoch 48, batch 3000, loss[loss=0.1606, ctc_loss=0.1006, cr_loss=0.2999, over 17195.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1194, cr_loss=0.3376, over 3355844.15 frames. ], batch size: 41, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:48:20,403 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-26 00:48:38,779 INFO [train.py:1230] (0/4) Epoch 48, validation: loss=0.03527, ctc_loss=0.03527, cr_loss=1.067e-14, over 944034.00 frames. 2024-09-26 00:48:38,780 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-26 00:48:44,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0 2024-09-26 00:48:50,570 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0 2024-09-26 00:48:57,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=868574.0, ans=0.125 2024-09-26 00:49:08,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=868620.6666666666, ans=0.07 2024-09-26 00:49:25,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=868667.3333333334, ans=0.0 2024-09-26 00:49:57,147 INFO [train.py:1198] (0/4) Epoch 48, batch 3050, loss[loss=0.1622, ctc_loss=0.101, cr_loss=0.3063, over 17171.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1196, cr_loss=0.3378, over 3360394.04 frames. 
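The validation pass at batch 3000 reports cr_loss=1.067e-14, i.e. numerically zero, while training batches show cr_loss around 0.34. That is what one would expect if the consistency-regularization term compares two augmented views of each utterance: with augmentation disabled at validation time the two views coincide and the term vanishes (an inference from the numbers, not from the script). A sketch of the frame-weighted averaging that yields 'over 944034.00 frames'; the names below are hypothetical:

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, criterion, device="cuda:0"):
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    for batch in valid_loader:
        feats = batch["inputs"].to(device)
        loss, num_frames = criterion(model(feats), batch["supervisions"])
        loss_sum += loss.item() * num_frames     # weight each batch by its frames
        frame_sum += num_frames
    model.train()
    return loss_sum / frame_sum                  # 'loss=... over <frame_sum> frames'
```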
], batch size: 41, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:50:06,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=868760.6666666666, ans=0.2 2024-09-26 00:50:17,600 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=868807.3333333334, ans=0.0 2024-09-26 00:50:29,292 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=12.0 2024-09-26 00:50:48,354 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.116e+02 1.292e+02 1.394e+02 1.474e+02 1.856e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-26 00:50:58,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=868900.6666666666, ans=0.125 2024-09-26 00:51:12,289 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=868947.3333333334, ans=0.125 2024-09-26 00:51:12,614 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=22.5 2024-09-26 00:51:18,423 INFO [train.py:1198] (0/4) Epoch 48, batch 3100, loss[loss=0.1737, ctc_loss=0.1102, cr_loss=0.3177, over 17017.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1199, cr_loss=0.3381, over 3365670.95 frames. ], batch size: 44, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:51:20,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=868994.0, ans=0.0 2024-09-26 00:51:24,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=868994.0, ans=0.2 2024-09-26 00:51:39,023 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2024-09-26 00:51:46,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=869040.6666666666, ans=0.04949747468305833 2024-09-26 00:51:51,084 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=869087.3333333334, ans=0.2 2024-09-26 00:51:58,999 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=869087.3333333334, ans=0.025 2024-09-26 00:52:20,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=869180.6666666666, ans=0.0 2024-09-26 00:52:26,349 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.06 vs. limit=22.5 2024-09-26 00:52:31,553 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=869180.6666666666, ans=0.0 2024-09-26 00:52:34,561 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=869227.3333333334, ans=0.125 2024-09-26 00:52:35,872 INFO [train.py:1198] (0/4) Epoch 48, batch 3150, loss[loss=0.2022, ctc_loss=0.1296, cr_loss=0.3631, over 17002.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1199, cr_loss=0.3379, over 3361724.63 frames. 
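In every batch line, loss[...] covers the current batch and tot_loss[...] a running aggregate; the 'over ~3.3M frames' figure stays roughly constant, so the aggregate behaves like a frame-weighted average over a recent window rather than over the whole epoch. A sketch of that weighting under a simple exponential-decay assumption; icefall's actual bookkeeping is more detailed:

```python
class RunningLoss:
    """Frame-weighted running average with exponential forgetting."""
    def __init__(self, decay: float = 0.99):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frame_sum = self.decay * self.frame_sum + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frame_sum, 1.0)
```

At steady state frame_sum approaches batch_frames / (1 - decay), which with ~17k frames per batch is the right order of magnitude for the logged window.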
], batch size: 53, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 00:52:46,098 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 00:52:55,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=869274.0, ans=0.05 2024-09-26 00:53:07,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=869320.6666666666, ans=0.1 2024-09-26 00:53:26,182 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.144e+02 1.286e+02 1.377e+02 1.489e+02 2.573e+02, threshold=2.753e+02, percent-clipped=0.0 2024-09-26 00:53:27,965 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=869367.3333333334, ans=0.125 2024-09-26 00:53:50,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2024-09-26 00:53:54,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=869460.6666666666, ans=0.0 2024-09-26 00:53:55,919 INFO [train.py:1198] (0/4) Epoch 48, batch 3200, loss[loss=0.2054, ctc_loss=0.1296, cr_loss=0.379, over 17042.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1191, cr_loss=0.3369, over 3358837.62 frames. ], batch size: 53, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:54:15,013 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=869507.3333333334, ans=0.1 2024-09-26 00:54:27,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=869554.0, ans=0.1 2024-09-26 00:54:27,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=869554.0, ans=0.125 2024-09-26 00:54:38,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=869554.0, ans=0.025 2024-09-26 00:54:40,396 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=869554.0, ans=0.125 2024-09-26 00:54:47,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.91 vs. limit=5.0 2024-09-26 00:54:51,906 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.21 vs. limit=15.0 2024-09-26 00:54:57,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=869647.3333333334, ans=0.1 2024-09-26 00:55:11,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=869647.3333333334, ans=0.0 2024-09-26 00:55:13,765 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.20 vs. limit=10.0 2024-09-26 00:55:14,464 INFO [train.py:1198] (0/4) Epoch 48, batch 3250, loss[loss=0.1714, ctc_loss=0.107, cr_loss=0.3218, over 17289.00 frames. ], tot_loss[loss=0.1875, ctc_loss=0.1198, cr_loss=0.3387, over 3360158.28 frames. 
], batch size: 46, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:55:17,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=869694.0, ans=0.125 2024-09-26 00:55:21,059 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=869694.0, ans=0.0 2024-09-26 00:55:30,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=869740.6666666666, ans=0.0 2024-09-26 00:55:33,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=869740.6666666666, ans=0.125 2024-09-26 00:55:44,425 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=869787.3333333334, ans=0.1 2024-09-26 00:55:52,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=869787.3333333334, ans=15.0 2024-09-26 00:56:03,200 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.301e+02 1.439e+02 1.548e+02 2.033e+02, threshold=2.879e+02, percent-clipped=0.0 2024-09-26 00:56:14,443 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=869834.0, ans=0.125 2024-09-26 00:56:25,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=869880.6666666666, ans=0.2 2024-09-26 00:56:33,154 INFO [train.py:1198] (0/4) Epoch 48, batch 3300, loss[loss=0.1619, ctc_loss=0.1023, cr_loss=0.2981, over 17084.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1189, cr_loss=0.3368, over 3354010.35 frames. ], batch size: 40, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:56:38,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=869927.3333333334, ans=0.125 2024-09-26 00:56:39,759 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=869927.3333333334, ans=0.0 2024-09-26 00:57:14,644 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-09-26 00:57:54,016 INFO [train.py:1198] (0/4) Epoch 48, batch 3350, loss[loss=0.175, ctc_loss=0.1087, cr_loss=0.3315, over 17254.00 frames. ], tot_loss[loss=0.1869, ctc_loss=0.1194, cr_loss=0.3375, over 3353355.47 frames. ], batch size: 44, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:57:54,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=870160.6666666666, ans=0.0 2024-09-26 00:58:00,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=870160.6666666666, ans=0.125 2024-09-26 00:58:30,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=870254.0, ans=0.125 2024-09-26 00:58:42,522 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.313e+02 1.389e+02 1.485e+02 2.642e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-26 00:59:12,294 INFO [train.py:1198] (0/4) Epoch 48, batch 3400, loss[loss=0.1929, ctc_loss=0.1254, cr_loss=0.3376, over 16972.00 frames. 
], tot_loss[loss=0.1871, ctc_loss=0.1196, cr_loss=0.3373, over 3343809.81 frames. ], batch size: 58, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 00:59:47,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=870487.3333333334, ans=0.0 2024-09-26 00:59:48,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=12.0 2024-09-26 00:59:50,571 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=870487.3333333334, ans=0.125 2024-09-26 00:59:55,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=870487.3333333334, ans=0.0 2024-09-26 01:00:07,856 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=870534.0, ans=0.125 2024-09-26 01:00:14,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=870534.0, ans=0.125 2024-09-26 01:00:26,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=870580.6666666666, ans=0.0 2024-09-26 01:00:26,552 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=870580.6666666666, ans=0.125 2024-09-26 01:00:32,485 INFO [train.py:1198] (0/4) Epoch 48, batch 3450, loss[loss=0.162, ctc_loss=0.099, cr_loss=0.3153, over 16687.00 frames. ], tot_loss[loss=0.1864, ctc_loss=0.1192, cr_loss=0.3363, over 3348225.47 frames. ], batch size: 37, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 01:00:35,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870627.3333333334, ans=0.1 2024-09-26 01:00:41,022 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2024-09-26 01:00:45,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=870627.3333333334, ans=0.025 2024-09-26 01:00:47,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=870627.3333333334, ans=0.0 2024-09-26 01:00:58,219 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=870674.0, ans=0.035 2024-09-26 01:01:24,344 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.161e+02 1.301e+02 1.395e+02 1.467e+02 2.192e+02, threshold=2.789e+02, percent-clipped=0.0 2024-09-26 01:01:33,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2024-09-26 01:01:33,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=870767.3333333334, ans=6.0 2024-09-26 01:01:52,536 INFO [train.py:1198] (0/4) Epoch 48, batch 3500, loss[loss=0.1735, ctc_loss=0.108, cr_loss=0.3276, over 17152.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1184, cr_loss=0.3357, over 3353975.51 frames. 
], batch size: 48, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 01:02:05,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=870860.6666666666, ans=0.09899494936611666 2024-09-26 01:02:11,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=870907.3333333334, ans=0.95 2024-09-26 01:02:13,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870907.3333333334, ans=0.1 2024-09-26 01:02:25,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=870954.0, ans=12.0 2024-09-26 01:02:27,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=870954.0, ans=0.125 2024-09-26 01:02:42,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=871000.6666666666, ans=0.125 2024-09-26 01:02:50,758 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.88 vs. limit=10.0 2024-09-26 01:03:02,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=871047.3333333334, ans=0.2 2024-09-26 01:03:12,762 INFO [train.py:1198] (0/4) Epoch 48, batch 3550, loss[loss=0.234, ctc_loss=0.153, cr_loss=0.4052, over 16905.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1187, cr_loss=0.3352, over 3348705.40 frames. ], batch size: 58, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 01:03:14,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=871094.0, ans=0.125 2024-09-26 01:03:41,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=871140.6666666666, ans=0.0 2024-09-26 01:03:48,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=8.0 2024-09-26 01:03:52,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=871187.3333333334, ans=0.125 2024-09-26 01:04:02,352 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.323e+02 1.392e+02 1.491e+02 3.535e+02, threshold=2.785e+02, percent-clipped=1.0 2024-09-26 01:04:16,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=871280.6666666666, ans=0.0 2024-09-26 01:04:16,515 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=871280.6666666666, ans=0.0 2024-09-26 01:04:19,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871280.6666666666, ans=0.1 2024-09-26 01:04:20,093 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. 
limit=6.0 2024-09-26 01:04:24,430 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=871280.6666666666, ans=0.0 2024-09-26 01:04:27,560 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=871280.6666666666, ans=0.0 2024-09-26 01:04:30,424 INFO [train.py:1198] (0/4) Epoch 48, batch 3600, loss[loss=0.1483, ctc_loss=0.09226, cr_loss=0.2802, over 17107.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3344, over 3350695.43 frames. ], batch size: 40, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 01:04:33,806 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=871327.3333333334, ans=0.0 2024-09-26 01:04:41,656 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=871327.3333333334, ans=0.025 2024-09-26 01:05:09,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=871420.6666666666, ans=0.125 2024-09-26 01:05:37,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=871514.0, ans=0.125 2024-09-26 01:05:48,461 INFO [train.py:1198] (0/4) Epoch 48, batch 3650, loss[loss=0.2225, ctc_loss=0.1445, cr_loss=0.3902, over 16453.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1183, cr_loss=0.3344, over 3346728.38 frames. ], batch size: 66, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 01:05:48,808 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=871560.6666666666, ans=0.1 2024-09-26 01:05:53,384 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=871560.6666666666, ans=0.125 2024-09-26 01:06:09,568 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=871607.3333333334, ans=0.0 2024-09-26 01:06:14,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=871607.3333333334, ans=0.0 2024-09-26 01:06:15,699 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=871607.3333333334, ans=0.0 2024-09-26 01:06:26,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=871654.0, ans=0.125 2024-09-26 01:06:29,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=871654.0, ans=0.07 2024-09-26 01:06:40,145 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.328e+02 1.397e+02 1.530e+02 1.851e+02, threshold=2.795e+02, percent-clipped=0.0 2024-09-26 01:06:40,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=871700.6666666666, ans=0.125 2024-09-26 01:06:50,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=871700.6666666666, ans=0.125 2024-09-26 01:07:09,263 INFO [train.py:1198] (0/4) Epoch 48, batch 3700, loss[loss=0.2129, ctc_loss=0.1355, cr_loss=0.3871, over 17026.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1188, cr_loss=0.3361, over 3352060.77 frames. 
], batch size: 52, lr: 2.46e-03, grad_scale: 32.0 2024-09-26 01:07:42,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=871887.3333333334, ans=0.125 2024-09-26 01:07:52,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=871887.3333333334, ans=0.125 2024-09-26 01:08:04,874 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=871934.0, ans=0.0 2024-09-26 01:08:09,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=871934.0, ans=0.1 2024-09-26 01:08:28,605 INFO [train.py:1198] (0/4) Epoch 48, batch 3750, loss[loss=0.1774, ctc_loss=0.1144, cr_loss=0.3153, over 17294.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3349, over 3349575.79 frames. ], batch size: 51, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 01:08:45,175 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0 2024-09-26 01:08:53,416 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.91 vs. limit=15.0 2024-09-26 01:08:54,349 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=872074.0, ans=0.125 2024-09-26 01:08:54,363 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=872074.0, ans=0.1 2024-09-26 01:09:05,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=872120.6666666666, ans=0.125 2024-09-26 01:09:18,007 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=872167.3333333334, ans=0.125 2024-09-26 01:09:20,720 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.319e+02 1.400e+02 1.543e+02 5.735e+02, threshold=2.801e+02, percent-clipped=1.0 2024-09-26 01:09:23,306 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=872167.3333333334, ans=0.125 2024-09-26 01:09:48,875 INFO [train.py:1198] (0/4) Epoch 48, batch 3800, loss[loss=0.1794, ctc_loss=0.1135, cr_loss=0.3297, over 16965.00 frames. ], tot_loss[loss=0.1868, ctc_loss=0.1195, cr_loss=0.3362, over 3329916.75 frames. ], batch size: 42, lr: 2.46e-03, grad_scale: 16.0 2024-09-26 01:09:50,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=872260.6666666666, ans=0.125 2024-09-26 01:10:19,405 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=22.5 2024-09-26 01:10:23,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872354.0, ans=0.1 2024-09-26 01:10:24,208 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.11 vs. 
limit=15.0 2024-09-26 01:10:45,223 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=872400.6666666666, ans=0.2 2024-09-26 01:10:53,375 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=12.0 2024-09-26 01:10:59,708 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=12.0 2024-09-26 01:11:06,799 INFO [train.py:1198] (0/4) Epoch 48, batch 3850, loss[loss=0.1912, ctc_loss=0.1228, cr_loss=0.3422, over 17179.00 frames. ], tot_loss[loss=0.1879, ctc_loss=0.1205, cr_loss=0.3371, over 3301494.08 frames. ], batch size: 45, lr: 2.46e-03, grad_scale: 8.0 2024-09-26 01:11:54,971 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:11:59,143 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.219e+02 1.389e+02 1.529e+02 1.705e+02 2.274e+02, threshold=3.058e+02, percent-clipped=0.0 2024-09-26 01:12:02,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=872634.0, ans=0.05 2024-09-26 01:12:17,348 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-48.pt 2024-09-26 01:13:04,118 INFO [train.py:1198] (0/4) Epoch 49, batch 0, loss[loss=0.1641, ctc_loss=0.1017, cr_loss=0.312, over 17197.00 frames. ], tot_loss[loss=0.1641, ctc_loss=0.1017, cr_loss=0.312, over 17197.00 frames. ], batch size: 47, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:13:04,119 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-26 01:13:19,513 INFO [train.py:1230] (0/4) Epoch 49, validation: loss=0.03487, ctc_loss=0.03487, cr_loss=1.087e-14, over 944034.00 frames. 2024-09-26 01:13:19,513 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-26 01:14:03,453 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=872802.0, ans=0.0 2024-09-26 01:14:44,164 INFO [train.py:1198] (0/4) Epoch 49, batch 50, loss[loss=0.1835, ctc_loss=0.115, cr_loss=0.3422, over 17210.00 frames. ], tot_loss[loss=0.1821, ctc_loss=0.1158, cr_loss=0.3317, over 750349.56 frames. ], batch size: 50, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:14:44,408 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=872942.0, ans=0.125 2024-09-26 01:14:56,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=872942.0, ans=0.1 2024-09-26 01:14:56,680 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=872942.0, ans=0.025 2024-09-26 01:15:13,074 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0 2024-09-26 01:15:17,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.19 vs. 
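At the epoch boundary above, checkpoint.py saves epoch-48.pt before epoch 49 begins (the ~47-second gap between the save line at 01:12:17 and the first epoch-49 batch at 01:13:04 covers the write plus the new epoch's setup). A minimal sketch of an epoch-end checkpoint; the field names are hypothetical and not necessarily what icefall stores:

```python
import torch

def save_epoch_checkpoint(filename, model, optimizer, scheduler, scaler, epoch):
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "grad_scaler": scaler.state_dict(),   # so the dynamic loss scale resumes
            "epoch": epoch,
        },
        filename,
    )

# e.g. save_epoch_checkpoint("zipformer/exp-.../epoch-48.pt", model,
#                            optimizer, scheduler, scaler, 48)
```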
limit=22.5 2024-09-26 01:15:41,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=873082.0, ans=0.125 2024-09-26 01:15:47,636 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.099e+02 1.321e+02 1.394e+02 1.552e+02 2.223e+02, threshold=2.788e+02, percent-clipped=0.0 2024-09-26 01:15:48,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873082.0, ans=0.1 2024-09-26 01:15:51,470 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2024-09-26 01:16:07,027 INFO [train.py:1198] (0/4) Epoch 49, batch 100, loss[loss=0.186, ctc_loss=0.1185, cr_loss=0.3377, over 17020.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1184, cr_loss=0.3368, over 1338612.81 frames. ], batch size: 44, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:16:16,065 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-09-26 01:16:18,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=873175.3333333334, ans=0.125 2024-09-26 01:16:22,101 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.78 vs. limit=10.0 2024-09-26 01:16:48,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=873268.6666666666, ans=0.125 2024-09-26 01:16:52,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=22.5 2024-09-26 01:17:01,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=873315.3333333334, ans=0.0 2024-09-26 01:17:29,483 INFO [train.py:1198] (0/4) Epoch 49, batch 150, loss[loss=0.1995, ctc_loss=0.1267, cr_loss=0.3638, over 17008.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1184, cr_loss=0.3368, over 1783655.06 frames. ], batch size: 53, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:17:58,168 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=873455.3333333334, ans=0.125 2024-09-26 01:18:32,853 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.108e+02 1.298e+02 1.390e+02 1.504e+02 2.457e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-26 01:18:52,322 INFO [train.py:1198] (0/4) Epoch 49, batch 200, loss[loss=0.1908, ctc_loss=0.1224, cr_loss=0.3419, over 17346.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1193, cr_loss=0.3389, over 2130476.10 frames. 
], batch size: 48, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:18:59,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=873642.0, ans=0.0 2024-09-26 01:19:00,628 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:19:43,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=873782.0, ans=0.0 2024-09-26 01:19:52,952 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:20:05,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=873828.6666666666, ans=0.2 2024-09-26 01:20:17,865 INFO [train.py:1198] (0/4) Epoch 49, batch 250, loss[loss=0.1719, ctc_loss=0.1082, cr_loss=0.3186, over 17065.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1175, cr_loss=0.3342, over 2409403.38 frames. ], batch size: 46, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:21:10,811 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=874015.3333333334, ans=0.125 2024-09-26 01:21:10,987 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=12.0 2024-09-26 01:21:18,234 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.091e+02 1.257e+02 1.339e+02 1.417e+02 1.603e+02, threshold=2.678e+02, percent-clipped=0.0 2024-09-26 01:21:37,798 INFO [train.py:1198] (0/4) Epoch 49, batch 300, loss[loss=0.1682, ctc_loss=0.1068, cr_loss=0.3069, over 17223.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1173, cr_loss=0.3341, over 2618868.33 frames. ], batch size: 50, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:22:24,294 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=874202.0, ans=0.125 2024-09-26 01:22:45,404 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-09-26 01:22:59,528 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=874342.0, ans=0.0 2024-09-26 01:23:00,852 INFO [train.py:1198] (0/4) Epoch 49, batch 350, loss[loss=0.1974, ctc_loss=0.1304, cr_loss=0.3354, over 17012.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1177, cr_loss=0.3344, over 2779780.97 frames. ], batch size: 53, lr: 2.43e-03, grad_scale: 16.0 2024-09-26 01:23:06,402 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2024-09-26 01:23:07,531 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=874342.0, ans=0.0 2024-09-26 01:23:14,386 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. 
limit=15.0 2024-09-26 01:23:51,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=874482.0, ans=0.125 2024-09-26 01:23:52,138 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0 2024-09-26 01:24:04,153 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.315e+02 1.403e+02 1.517e+02 7.992e+02, threshold=2.807e+02, percent-clipped=1.0 2024-09-26 01:24:04,409 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=874482.0, ans=0.0 2024-09-26 01:24:23,275 INFO [train.py:1198] (0/4) Epoch 49, batch 400, loss[loss=0.1775, ctc_loss=0.1118, cr_loss=0.3285, over 17323.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.1181, cr_loss=0.3349, over 2908792.65 frames. ], batch size: 51, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:24:24,251 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2024-09-26 01:25:06,348 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.50 vs. limit=10.0 2024-09-26 01:25:32,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=874762.0, ans=0.0 2024-09-26 01:25:40,666 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=874762.0, ans=0.07 2024-09-26 01:25:48,238 INFO [train.py:1198] (0/4) Epoch 49, batch 450, loss[loss=0.1823, ctc_loss=0.1176, cr_loss=0.3236, over 17305.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1193, cr_loss=0.3366, over 3011315.33 frames. ], batch size: 51, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:25:48,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=874808.6666666666, ans=0.125 2024-09-26 01:26:10,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=874855.3333333334, ans=0.0 2024-09-26 01:26:20,538 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=874902.0, ans=0.0 2024-09-26 01:26:49,069 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.319e+02 1.417e+02 1.516e+02 2.635e+02, threshold=2.833e+02, percent-clipped=0.0 2024-09-26 01:27:07,314 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0 2024-09-26 01:27:08,165 INFO [train.py:1198] (0/4) Epoch 49, batch 500, loss[loss=0.1693, ctc_loss=0.105, cr_loss=0.3213, over 16952.00 frames. ], tot_loss[loss=0.187, ctc_loss=0.1195, cr_loss=0.3374, over 3093858.78 frames. 
], batch size: 42, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:27:20,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875042.0, ans=0.1 2024-09-26 01:27:52,510 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=875135.3333333334, ans=0.025 2024-09-26 01:27:54,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=875135.3333333334, ans=0.125 2024-09-26 01:27:54,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=875135.3333333334, ans=0.2 2024-09-26 01:28:02,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875182.0, ans=0.1 2024-09-26 01:28:06,881 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=875182.0, ans=0.125 2024-09-26 01:28:11,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=875182.0, ans=0.125 2024-09-26 01:28:13,534 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=22.5 2024-09-26 01:28:32,677 INFO [train.py:1198] (0/4) Epoch 49, batch 550, loss[loss=0.1861, ctc_loss=0.1161, cr_loss=0.3498, over 17253.00 frames. ], tot_loss[loss=0.1873, ctc_loss=0.1198, cr_loss=0.3377, over 3158003.24 frames. ], batch size: 44, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:28:45,655 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=875275.3333333334, ans=0.125 2024-09-26 01:28:53,519 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=875322.0, ans=0.125 2024-09-26 01:28:55,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.24 vs. limit=6.0 2024-09-26 01:29:09,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875368.6666666666, ans=0.1 2024-09-26 01:29:27,526 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875415.3333333334, ans=0.1 2024-09-26 01:29:33,607 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.295e+02 1.361e+02 1.476e+02 2.484e+02, threshold=2.722e+02, percent-clipped=0.0 2024-09-26 01:29:47,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=875462.0, ans=0.125 2024-09-26 01:29:58,451 INFO [train.py:1198] (0/4) Epoch 49, batch 600, loss[loss=0.1928, ctc_loss=0.1232, cr_loss=0.3479, over 17070.00 frames. ], tot_loss[loss=0.1872, ctc_loss=0.1196, cr_loss=0.3376, over 3200180.30 frames. ], batch size: 46, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:30:05,133 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:30:21,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.04 vs. 
limit=15.0 2024-09-26 01:30:26,619 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.09 vs. limit=15.0 2024-09-26 01:31:18,111 INFO [train.py:1198] (0/4) Epoch 49, batch 650, loss[loss=0.1489, ctc_loss=0.09232, cr_loss=0.2828, over 17104.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1192, cr_loss=0.3364, over 3241936.94 frames. ], batch size: 43, lr: 2.43e-03, grad_scale: 32.0 2024-09-26 01:31:28,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=875742.0, ans=0.025 2024-09-26 01:31:52,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=875835.3333333334, ans=0.07 2024-09-26 01:32:00,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.46 vs. limit=12.0 2024-09-26 01:32:21,520 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.285e+02 1.382e+02 1.503e+02 2.355e+02, threshold=2.765e+02, percent-clipped=0.0 2024-09-26 01:32:40,442 INFO [train.py:1198] (0/4) Epoch 49, batch 700, loss[loss=0.1863, ctc_loss=0.12, cr_loss=0.3316, over 17235.00 frames. ], tot_loss[loss=0.1871, ctc_loss=0.1196, cr_loss=0.3371, over 3256248.69 frames. ], batch size: 50, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:32:42,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=875975.3333333334, ans=0.125 2024-09-26 01:33:31,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876115.3333333334, ans=0.1 2024-09-26 01:33:44,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876115.3333333334, ans=0.1 2024-09-26 01:33:44,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=876115.3333333334, ans=0.125 2024-09-26 01:33:47,730 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=876162.0, ans=0.125 2024-09-26 01:34:03,488 INFO [train.py:1198] (0/4) Epoch 49, batch 750, loss[loss=0.1896, ctc_loss=0.1234, cr_loss=0.3308, over 16754.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1184, cr_loss=0.3352, over 3282516.48 frames. ], batch size: 61, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:34:26,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=876255.3333333334, ans=0.125 2024-09-26 01:34:39,622 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2024-09-26 01:35:05,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=876348.6666666666, ans=0.125 2024-09-26 01:35:09,764 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.321e+02 1.400e+02 1.492e+02 1.805e+02, threshold=2.801e+02, percent-clipped=0.0 2024-09-26 01:35:14,190 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. 
limit=15.0 2024-09-26 01:35:27,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=876442.0, ans=0.125 2024-09-26 01:35:28,994 INFO [train.py:1198] (0/4) Epoch 49, batch 800, loss[loss=0.2539, ctc_loss=0.1649, cr_loss=0.4452, over 15071.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1182, cr_loss=0.3352, over 3293797.17 frames. ], batch size: 89, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:35:37,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=876442.0, ans=0.2 2024-09-26 01:36:06,338 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-09-26 01:36:20,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=876582.0, ans=0.0 2024-09-26 01:36:25,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=876582.0, ans=0.2 2024-09-26 01:36:33,025 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876628.6666666666, ans=0.1 2024-09-26 01:36:43,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=876628.6666666666, ans=0.125 2024-09-26 01:36:49,021 INFO [train.py:1198] (0/4) Epoch 49, batch 850, loss[loss=0.2018, ctc_loss=0.1271, cr_loss=0.3736, over 17217.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3347, over 3316936.98 frames. ], batch size: 47, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:37:05,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=876722.0, ans=0.05 2024-09-26 01:37:51,966 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.324e+02 1.412e+02 1.511e+02 2.737e+02, threshold=2.823e+02, percent-clipped=0.0 2024-09-26 01:38:01,878 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=876862.0, ans=0.125 2024-09-26 01:38:11,392 INFO [train.py:1198] (0/4) Epoch 49, batch 900, loss[loss=0.1514, ctc_loss=0.09396, cr_loss=0.2873, over 17120.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.1181, cr_loss=0.3347, over 3326972.08 frames. ], batch size: 40, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:38:18,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=876908.6666666666, ans=0.0 2024-09-26 01:38:18,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876908.6666666666, ans=0.1 2024-09-26 01:38:30,069 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=876955.3333333334, ans=0.125 2024-09-26 01:38:48,430 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.48 vs. limit=10.0 2024-09-26 01:39:03,012 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.75 vs. 
limit=15.0 2024-09-26 01:39:13,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=877048.6666666666, ans=0.1 2024-09-26 01:39:16,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=877095.3333333334, ans=0.0 2024-09-26 01:39:21,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877095.3333333334, ans=0.1 2024-09-26 01:39:33,927 INFO [train.py:1198] (0/4) Epoch 49, batch 950, loss[loss=0.1569, ctc_loss=0.0977, cr_loss=0.2961, over 17180.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1185, cr_loss=0.3356, over 3333290.66 frames. ], batch size: 41, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:39:43,850 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=877142.0, ans=0.125 2024-09-26 01:40:26,062 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-09-26 01:40:41,474 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.100e+02 1.300e+02 1.395e+02 1.483e+02 3.362e+02, threshold=2.790e+02, percent-clipped=1.0 2024-09-26 01:40:43,375 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-188000.pt 2024-09-26 01:41:01,614 INFO [train.py:1198] (0/4) Epoch 49, batch 1000, loss[loss=0.1912, ctc_loss=0.1231, cr_loss=0.3405, over 17137.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3346, over 3339708.06 frames. ], batch size: 48, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:41:28,889 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=877422.0, ans=0.125 2024-09-26 01:41:28,957 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=877422.0, ans=0.125 2024-09-26 01:41:32,217 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=877468.6666666666, ans=0.1 2024-09-26 01:41:38,958 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=22.5 2024-09-26 01:42:21,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877562.0, ans=0.1 2024-09-26 01:42:23,942 INFO [train.py:1198] (0/4) Epoch 49, batch 1050, loss[loss=0.1793, ctc_loss=0.1137, cr_loss=0.3283, over 17077.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1185, cr_loss=0.3352, over 3336303.67 frames. ], batch size: 49, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:42:29,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=877608.6666666666, ans=0.125 2024-09-26 01:42:39,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.90 vs. 
limit=15.0 2024-09-26 01:42:48,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=877655.3333333334, ans=0.2 2024-09-26 01:43:07,552 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.70 vs. limit=6.0 2024-09-26 01:43:10,820 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2024-09-26 01:43:28,469 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.276e+02 1.380e+02 1.490e+02 2.320e+02, threshold=2.759e+02, percent-clipped=0.0 2024-09-26 01:43:46,130 INFO [train.py:1198] (0/4) Epoch 49, batch 1100, loss[loss=0.1819, ctc_loss=0.1129, cr_loss=0.3447, over 16965.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1183, cr_loss=0.3351, over 3334141.30 frames. ], batch size: 42, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:43:51,160 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=877842.0, ans=10.0 2024-09-26 01:44:04,006 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=877888.6666666666, ans=0.2 2024-09-26 01:44:10,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=877888.6666666666, ans=0.025 2024-09-26 01:44:53,823 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=878028.6666666666, ans=0.1 2024-09-26 01:45:09,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=878075.3333333334, ans=0.0 2024-09-26 01:45:10,652 INFO [train.py:1198] (0/4) Epoch 49, batch 1150, loss[loss=0.1673, ctc_loss=0.1048, cr_loss=0.3126, over 17081.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1186, cr_loss=0.3351, over 3334861.77 frames. 
], batch size: 46, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:45:22,226 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878075.3333333334, ans=0.1 2024-09-26 01:45:33,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=878122.0, ans=0.0 2024-09-26 01:45:45,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=878168.6666666666, ans=0.09899494936611666 2024-09-26 01:45:49,613 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=878168.6666666666, ans=0.125 2024-09-26 01:46:05,675 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:46:06,036 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=878215.3333333334, ans=15.0 2024-09-26 01:46:13,008 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.291e+02 1.352e+02 1.450e+02 2.079e+02, threshold=2.704e+02, percent-clipped=0.0 2024-09-26 01:46:13,299 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=878262.0, ans=0.5 2024-09-26 01:46:25,095 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=12.0 2024-09-26 01:46:27,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=878262.0, ans=0.125 2024-09-26 01:46:30,812 INFO [train.py:1198] (0/4) Epoch 49, batch 1200, loss[loss=0.204, ctc_loss=0.1295, cr_loss=0.3728, over 17015.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1187, cr_loss=0.3361, over 3349454.84 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:46:35,769 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=878308.6666666666, ans=0.0 2024-09-26 01:46:41,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=878308.6666666666, ans=0.2 2024-09-26 01:46:48,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=878355.3333333334, ans=0.125 2024-09-26 01:46:56,148 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=878355.3333333334, ans=0.125 2024-09-26 01:47:52,403 INFO [train.py:1198] (0/4) Epoch 49, batch 1250, loss[loss=0.1893, ctc_loss=0.1207, cr_loss=0.3431, over 17064.00 frames. ], tot_loss[loss=0.1845, ctc_loss=0.1177, cr_loss=0.3342, over 3360552.66 frames. ], batch size: 46, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:48:03,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=878542.0, ans=0.0 2024-09-26 01:48:05,762 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=22.5 2024-09-26 01:48:17,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.68 vs. 
limit=10.0 2024-09-26 01:48:42,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=878682.0, ans=0.07 2024-09-26 01:48:44,653 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2024-09-26 01:48:49,017 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=878682.0, ans=0.125 2024-09-26 01:48:56,555 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.071e+02 1.287e+02 1.357e+02 1.444e+02 1.832e+02, threshold=2.714e+02, percent-clipped=0.0 2024-09-26 01:49:11,905 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2024-09-26 01:49:14,201 INFO [train.py:1198] (0/4) Epoch 49, batch 1300, loss[loss=0.1813, ctc_loss=0.1146, cr_loss=0.3335, over 17049.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1174, cr_loss=0.3333, over 3351822.66 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:49:20,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=878775.3333333334, ans=0.125 2024-09-26 01:49:39,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=878822.0, ans=0.0 2024-09-26 01:50:05,784 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=878915.3333333334, ans=0.2 2024-09-26 01:50:05,980 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=878915.3333333334, ans=0.2 2024-09-26 01:50:07,393 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=878915.3333333334, ans=0.05 2024-09-26 01:50:18,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=878915.3333333334, ans=0.125 2024-09-26 01:50:24,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=878962.0, ans=0.0 2024-09-26 01:50:39,184 INFO [train.py:1198] (0/4) Epoch 49, batch 1350, loss[loss=0.1776, ctc_loss=0.1129, cr_loss=0.3237, over 17296.00 frames. ], tot_loss[loss=0.1835, ctc_loss=0.1169, cr_loss=0.3327, over 3363338.24 frames. ], batch size: 46, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:50:43,201 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2024-09-26 01:51:06,838 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. 
limit=6.0 2024-09-26 01:51:30,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=879148.6666666666, ans=0.07 2024-09-26 01:51:37,155 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=879148.6666666666, ans=0.015 2024-09-26 01:51:40,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=879148.6666666666, ans=0.0 2024-09-26 01:51:41,923 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.182e+02 1.323e+02 1.398e+02 1.529e+02 2.878e+02, threshold=2.797e+02, percent-clipped=1.0 2024-09-26 01:51:50,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=879195.3333333334, ans=0.035 2024-09-26 01:51:57,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=15.0 2024-09-26 01:51:59,738 INFO [train.py:1198] (0/4) Epoch 49, batch 1400, loss[loss=0.1933, ctc_loss=0.1232, cr_loss=0.3505, over 15867.00 frames. ], tot_loss[loss=0.1824, ctc_loss=0.1162, cr_loss=0.3309, over 3361428.45 frames. ], batch size: 74, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:53:02,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=879382.0, ans=0.1 2024-09-26 01:53:24,001 INFO [train.py:1198] (0/4) Epoch 49, batch 1450, loss[loss=0.1985, ctc_loss=0.1275, cr_loss=0.3551, over 16778.00 frames. ], tot_loss[loss=0.1837, ctc_loss=0.1173, cr_loss=0.3322, over 3363637.17 frames. ], batch size: 61, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:53:25,892 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=879475.3333333334, ans=0.0 2024-09-26 01:53:51,326 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=879522.0, ans=0.0 2024-09-26 01:53:59,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=879568.6666666666, ans=0.04949747468305833 2024-09-26 01:53:59,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=879568.6666666666, ans=0.125 2024-09-26 01:54:05,885 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:54:09,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=879568.6666666666, ans=0.05 2024-09-26 01:54:28,634 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.139e+02 1.317e+02 1.389e+02 1.499e+02 2.448e+02, threshold=2.778e+02, percent-clipped=0.0 2024-09-26 01:54:40,168 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:54:47,226 INFO [train.py:1198] (0/4) Epoch 49, batch 1500, loss[loss=0.1667, ctc_loss=0.1064, cr_loss=0.3014, over 17143.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1179, cr_loss=0.3335, over 3364920.92 frames. ], batch size: 45, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:54:52,821 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.10 vs. 
limit=15.0 2024-09-26 01:55:03,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=879755.3333333334, ans=0.125 2024-09-26 01:55:03,805 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2024-09-26 01:55:08,236 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=879755.3333333334, ans=0.0 2024-09-26 01:55:09,225 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=22.5 2024-09-26 01:55:22,912 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=879802.0, ans=0.025 2024-09-26 01:55:37,067 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=879848.6666666666, ans=0.125 2024-09-26 01:55:59,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=879895.3333333334, ans=0.1 2024-09-26 01:56:07,499 INFO [train.py:1198] (0/4) Epoch 49, batch 1550, loss[loss=0.1963, ctc_loss=0.127, cr_loss=0.3461, over 16754.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1184, cr_loss=0.3346, over 3370237.81 frames. ], batch size: 61, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 01:56:10,915 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=879942.0, ans=0.0 2024-09-26 01:56:15,662 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=879942.0, ans=0.125 2024-09-26 01:56:35,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=879988.6666666666, ans=0.0 2024-09-26 01:57:13,898 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.158e+02 1.314e+02 1.386e+02 1.460e+02 1.935e+02, threshold=2.771e+02, percent-clipped=0.0 2024-09-26 01:57:16,325 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=880128.6666666666, ans=0.05 2024-09-26 01:57:30,181 INFO [train.py:1198] (0/4) Epoch 49, batch 1600, loss[loss=0.1881, ctc_loss=0.1184, cr_loss=0.3485, over 17306.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1184, cr_loss=0.3345, over 3368627.69 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:57:51,812 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.72 vs. 
limit=15.0 2024-09-26 01:58:00,855 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=880268.6666666666, ans=0.025 2024-09-26 01:58:10,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880268.6666666666, ans=0.1 2024-09-26 01:58:11,935 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880268.6666666666, ans=0.1 2024-09-26 01:58:11,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=880268.6666666666, ans=0.125 2024-09-26 01:58:29,040 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=880315.3333333334, ans=0.0 2024-09-26 01:58:30,609 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=880315.3333333334, ans=0.125 2024-09-26 01:58:37,657 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2024-09-26 01:58:40,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=880362.0, ans=0.125 2024-09-26 01:58:52,676 INFO [train.py:1198] (0/4) Epoch 49, batch 1650, loss[loss=0.2075, ctc_loss=0.1345, cr_loss=0.3646, over 16966.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1184, cr_loss=0.3349, over 3376846.05 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 01:58:52,913 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=880408.6666666666, ans=0.025 2024-09-26 01:59:04,334 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 01:59:21,720 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880455.3333333334, ans=0.1 2024-09-26 01:59:38,327 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=880502.0, ans=6.0 2024-09-26 01:59:52,535 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=880548.6666666666, ans=0.0 2024-09-26 02:00:01,889 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.295e+02 1.382e+02 1.560e+02 2.405e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-26 02:00:15,486 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2024-09-26 02:00:17,670 INFO [train.py:1198] (0/4) Epoch 49, batch 1700, loss[loss=0.1543, ctc_loss=0.09787, cr_loss=0.2819, over 17277.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1184, cr_loss=0.3351, over 3374255.05 frames. 
], batch size: 44, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 02:00:30,891 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=880642.0, ans=0.125 2024-09-26 02:00:59,659 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=880735.3333333334, ans=0.2 2024-09-26 02:01:26,703 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=880828.6666666666, ans=0.125 2024-09-26 02:01:37,518 INFO [train.py:1198] (0/4) Epoch 49, batch 1750, loss[loss=0.1743, ctc_loss=0.1121, cr_loss=0.3109, over 17151.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1182, cr_loss=0.335, over 3372001.58 frames. ], batch size: 48, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:01:47,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=880875.3333333334, ans=0.025 2024-09-26 02:02:12,262 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880968.6666666666, ans=0.1 2024-09-26 02:02:16,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=880968.6666666666, ans=0.5 2024-09-26 02:02:36,978 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.95 vs. limit=15.0 2024-09-26 02:02:37,155 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.46 vs. limit=5.0 2024-09-26 02:02:45,570 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.131e+02 1.296e+02 1.405e+02 1.536e+02 1.900e+02, threshold=2.809e+02, percent-clipped=0.0 2024-09-26 02:02:58,357 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=881108.6666666666, ans=0.125 2024-09-26 02:02:59,570 INFO [train.py:1198] (0/4) Epoch 49, batch 1800, loss[loss=0.1707, ctc_loss=0.1054, cr_loss=0.3264, over 17011.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1187, cr_loss=0.3358, over 3370741.62 frames. ], batch size: 56, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:03:01,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=881108.6666666666, ans=0.125 2024-09-26 02:03:56,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=881248.6666666666, ans=0.025 2024-09-26 02:04:01,500 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=881248.6666666666, ans=0.0 2024-09-26 02:04:20,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=881342.0, ans=0.125 2024-09-26 02:04:21,866 INFO [train.py:1198] (0/4) Epoch 49, batch 1850, loss[loss=0.17, ctc_loss=0.1059, cr_loss=0.3206, over 17100.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1188, cr_loss=0.3357, over 3361630.32 frames. 
], batch size: 43, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:04:28,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=881342.0, ans=0.125 2024-09-26 02:04:31,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=881342.0, ans=0.125 2024-09-26 02:04:56,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=881388.6666666666, ans=0.125 2024-09-26 02:04:56,901 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2024-09-26 02:05:22,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=881482.0, ans=0.2 2024-09-26 02:05:26,767 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=881482.0, ans=0.0 2024-09-26 02:05:33,165 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.195e+02 1.348e+02 1.416e+02 1.489e+02 2.449e+02, threshold=2.831e+02, percent-clipped=0.0 2024-09-26 02:05:47,592 INFO [train.py:1198] (0/4) Epoch 49, batch 1900, loss[loss=0.1407, ctc_loss=0.08499, cr_loss=0.2784, over 16723.00 frames. ], tot_loss[loss=0.1845, ctc_loss=0.1177, cr_loss=0.334, over 3370977.83 frames. ], batch size: 37, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:05:47,885 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=881575.3333333334, ans=0.05 2024-09-26 02:05:52,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=881575.3333333334, ans=0.125 2024-09-26 02:06:06,228 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5 2024-09-26 02:06:16,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=881622.0, ans=0.125 2024-09-26 02:07:10,386 INFO [train.py:1198] (0/4) Epoch 49, batch 1950, loss[loss=0.1384, ctc_loss=0.08555, cr_loss=0.2643, over 16966.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.3349, over 3368608.78 frames. ], batch size: 42, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:07:10,636 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=881808.6666666666, ans=0.125 2024-09-26 02:07:11,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2024-09-26 02:07:17,624 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.43 vs. limit=15.0 2024-09-26 02:07:20,764 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=12.0 2024-09-26 02:07:29,006 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.44 vs. 
limit=15.0 2024-09-26 02:07:33,150 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=881855.3333333334, ans=0.125 2024-09-26 02:07:36,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=881855.3333333334, ans=0.125 2024-09-26 02:07:57,594 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0 2024-09-26 02:08:12,736 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=881995.3333333334, ans=0.0 2024-09-26 02:08:18,104 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.175e+02 1.309e+02 1.411e+02 1.494e+02 2.442e+02, threshold=2.822e+02, percent-clipped=0.0 2024-09-26 02:08:32,535 INFO [train.py:1198] (0/4) Epoch 49, batch 2000, loss[loss=0.2007, ctc_loss=0.1263, cr_loss=0.3722, over 17032.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1182, cr_loss=0.3349, over 3368019.51 frames. ], batch size: 51, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 02:08:35,983 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=882042.0, ans=0.125 2024-09-26 02:08:39,189 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=882042.0, ans=0.0 2024-09-26 02:08:50,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=882088.6666666666, ans=0.025 2024-09-26 02:08:50,498 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=882088.6666666666, ans=0.1 2024-09-26 02:09:08,039 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=15.0 2024-09-26 02:09:46,392 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=882228.6666666666, ans=0.5 2024-09-26 02:09:56,107 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=882275.3333333334, ans=0.125 2024-09-26 02:09:57,473 INFO [train.py:1198] (0/4) Epoch 49, batch 2050, loss[loss=0.2024, ctc_loss=0.1301, cr_loss=0.3617, over 16698.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1178, cr_loss=0.3347, over 3373778.84 frames. ], batch size: 61, lr: 2.42e-03, grad_scale: 32.0 2024-09-26 02:10:00,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=882275.3333333334, ans=0.125 2024-09-26 02:10:13,698 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=882322.0, ans=0.1 2024-09-26 02:10:30,013 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.17 vs. 
limit=15.0 2024-09-26 02:10:54,003 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=882415.3333333334, ans=0.125 2024-09-26 02:11:01,936 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 02:11:04,656 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.134e+02 1.334e+02 1.409e+02 1.527e+02 2.352e+02, threshold=2.819e+02, percent-clipped=0.0 2024-09-26 02:11:17,572 INFO [train.py:1198] (0/4) Epoch 49, batch 2100, loss[loss=0.187, ctc_loss=0.1242, cr_loss=0.314, over 11973.00 frames. ], tot_loss[loss=0.1852, ctc_loss=0.1182, cr_loss=0.3352, over 3359228.03 frames. ], batch size: 123, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:11:38,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=882555.3333333334, ans=0.0 2024-09-26 02:12:00,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=882602.0, ans=0.125 2024-09-26 02:12:01,110 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=22.5 2024-09-26 02:12:27,497 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=882695.3333333334, ans=0.0 2024-09-26 02:12:29,296 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 02:12:39,853 INFO [train.py:1198] (0/4) Epoch 49, batch 2150, loss[loss=0.2243, ctc_loss=0.1533, cr_loss=0.3552, over 11561.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1185, cr_loss=0.3359, over 3361557.48 frames. ], batch size: 124, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:12:50,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=12.0 2024-09-26 02:12:51,807 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-09-26 02:13:34,530 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.80 vs. limit=10.0 2024-09-26 02:13:43,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=882882.0, ans=0.0 2024-09-26 02:13:49,690 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.094e+02 1.355e+02 1.416e+02 1.495e+02 3.199e+02, threshold=2.832e+02, percent-clipped=1.0 2024-09-26 02:13:50,320 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2024-09-26 02:13:58,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=882928.6666666666, ans=0.025 2024-09-26 02:14:02,464 INFO [train.py:1198] (0/4) Epoch 49, batch 2200, loss[loss=0.1742, ctc_loss=0.11, cr_loss=0.3207, over 17087.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.118, cr_loss=0.3354, over 3367013.32 frames. 
], batch size: 43, lr: 2.42e-03, grad_scale: 16.0 2024-09-26 02:15:20,113 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=883162.0, ans=0.0 2024-09-26 02:15:27,837 INFO [train.py:1198] (0/4) Epoch 49, batch 2250, loss[loss=0.1544, ctc_loss=0.09807, cr_loss=0.2816, over 17084.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1178, cr_loss=0.3348, over 3364735.80 frames. ], batch size: 43, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:15:45,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=883255.3333333334, ans=0.125 2024-09-26 02:15:45,776 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=883255.3333333334, ans=0.025 2024-09-26 02:15:56,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=883255.3333333334, ans=0.0 2024-09-26 02:16:14,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=883348.6666666666, ans=0.125 2024-09-26 02:16:34,926 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.163e+02 1.298e+02 1.376e+02 1.462e+02 1.948e+02, threshold=2.752e+02, percent-clipped=0.0 2024-09-26 02:16:47,763 INFO [train.py:1198] (0/4) Epoch 49, batch 2300, loss[loss=0.1682, ctc_loss=0.1048, cr_loss=0.3171, over 17087.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3346, over 3363967.86 frames. ], batch size: 40, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:17:07,645 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.61 vs. limit=15.0 2024-09-26 02:17:26,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=883535.3333333334, ans=0.125 2024-09-26 02:18:08,474 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=883628.6666666666, ans=0.125 2024-09-26 02:18:13,002 INFO [train.py:1198] (0/4) Epoch 49, batch 2350, loss[loss=0.1958, ctc_loss=0.123, cr_loss=0.3643, over 17155.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1179, cr_loss=0.3346, over 3364720.60 frames. ], batch size: 48, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:18:23,740 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.00 vs. 
limit=22.5 2024-09-26 02:18:28,008 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=883722.0, ans=0.125 2024-09-26 02:18:36,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=883722.0, ans=0.125 2024-09-26 02:18:59,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=883815.3333333334, ans=0.125 2024-09-26 02:19:01,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=883815.3333333334, ans=0.125 2024-09-26 02:19:03,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=883815.3333333334, ans=0.0 2024-09-26 02:19:06,257 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=883815.3333333334, ans=0.125 2024-09-26 02:19:20,303 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.282e+02 1.377e+02 1.471e+02 1.818e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-26 02:19:23,852 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=883862.0, ans=0.125 2024-09-26 02:19:25,602 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 02:19:32,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=883862.0, ans=0.125 2024-09-26 02:19:35,628 INFO [train.py:1198] (0/4) Epoch 49, batch 2400, loss[loss=0.1756, ctc_loss=0.1138, cr_loss=0.3088, over 17027.00 frames. ], tot_loss[loss=0.1844, ctc_loss=0.1176, cr_loss=0.3339, over 3359777.71 frames. ], batch size: 44, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:19:48,195 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=883908.6666666666, ans=0.125 2024-09-26 02:19:48,643 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=12.0 2024-09-26 02:20:34,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=884048.6666666666, ans=0.0 2024-09-26 02:20:37,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=884048.6666666666, ans=0.0 2024-09-26 02:20:37,948 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.19 vs. limit=15.0 2024-09-26 02:20:55,314 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=884095.3333333334, ans=0.125 2024-09-26 02:20:58,139 INFO [train.py:1198] (0/4) Epoch 49, batch 2450, loss[loss=0.2073, ctc_loss=0.1378, cr_loss=0.3475, over 12158.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1177, cr_loss=0.3343, over 3340739.55 frames. 
], batch size: 123, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:22:06,727 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=884328.6666666666, ans=0.125 2024-09-26 02:22:09,462 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.309e+02 1.374e+02 1.463e+02 2.759e+02, threshold=2.748e+02, percent-clipped=1.0 2024-09-26 02:22:20,947 INFO [train.py:1198] (0/4) Epoch 49, batch 2500, loss[loss=0.1508, ctc_loss=0.09464, cr_loss=0.2807, over 17183.00 frames. ], tot_loss[loss=0.1838, ctc_loss=0.1172, cr_loss=0.333, over 3346157.32 frames. ], batch size: 41, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:22:23,659 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2024-09-26 02:22:25,374 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.04 vs. limit=15.0 2024-09-26 02:22:31,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=884375.3333333334, ans=0.2 2024-09-26 02:22:50,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=884422.0, ans=0.0 2024-09-26 02:22:52,096 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=884468.6666666666, ans=10.0 2024-09-26 02:23:20,319 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=884515.3333333334, ans=0.2 2024-09-26 02:23:21,951 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=884515.3333333334, ans=0.0 2024-09-26 02:23:25,340 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=884515.3333333334, ans=0.1 2024-09-26 02:23:44,124 INFO [train.py:1198] (0/4) Epoch 49, batch 2550, loss[loss=0.1906, ctc_loss=0.1234, cr_loss=0.3359, over 17139.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1173, cr_loss=0.3331, over 3353383.52 frames. ], batch size: 48, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:23:44,379 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=884608.6666666666, ans=0.0 2024-09-26 02:24:31,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=884702.0, ans=0.0 2024-09-26 02:24:55,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=884795.3333333334, ans=0.125 2024-09-26 02:24:58,343 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.313e+02 1.367e+02 1.472e+02 2.257e+02, threshold=2.734e+02, percent-clipped=0.0 2024-09-26 02:25:09,545 INFO [train.py:1198] (0/4) Epoch 49, batch 2600, loss[loss=0.1626, ctc_loss=0.1029, cr_loss=0.2985, over 17265.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1177, cr_loss=0.3343, over 3362718.56 frames. 
], batch size: 42, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:25:45,009 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=884935.3333333334, ans=0.2 2024-09-26 02:25:48,401 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=884935.3333333334, ans=0.1 2024-09-26 02:26:07,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=884982.0, ans=0.125 2024-09-26 02:26:29,482 INFO [train.py:1198] (0/4) Epoch 49, batch 2650, loss[loss=0.1633, ctc_loss=0.1034, cr_loss=0.2994, over 17290.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3347, over 3372958.35 frames. ], batch size: 46, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:26:34,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885075.3333333334, ans=0.1 2024-09-26 02:26:42,707 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=885075.3333333334, ans=0.125 2024-09-26 02:27:12,332 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885168.6666666666, ans=0.1 2024-09-26 02:27:22,014 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=885215.3333333334, ans=0.125 2024-09-26 02:27:33,830 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=885215.3333333334, ans=22.5 2024-09-26 02:27:36,911 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0 2024-09-26 02:27:40,642 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.183e+02 1.322e+02 1.427e+02 1.515e+02 2.814e+02, threshold=2.853e+02, percent-clipped=1.0 2024-09-26 02:27:41,706 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2024-09-26 02:27:50,559 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=885308.6666666666, ans=0.125 2024-09-26 02:27:51,960 INFO [train.py:1198] (0/4) Epoch 49, batch 2700, loss[loss=0.1828, ctc_loss=0.1178, cr_loss=0.3247, over 16710.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1184, cr_loss=0.335, over 3361486.20 frames. ], batch size: 61, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:28:07,846 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.82 vs. limit=10.0 2024-09-26 02:28:15,583 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=885355.3333333334, ans=0.125 2024-09-26 02:28:16,187 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.78 vs. 
limit=15.0 2024-09-26 02:28:17,124 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=885355.3333333334, ans=0.07 2024-09-26 02:28:39,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885402.0, ans=0.1 2024-09-26 02:29:14,280 INFO [train.py:1198] (0/4) Epoch 49, batch 2750, loss[loss=0.1983, ctc_loss=0.1264, cr_loss=0.3594, over 17064.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.118, cr_loss=0.3342, over 3357754.20 frames. ], batch size: 52, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:29:45,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=885588.6666666666, ans=0.1 2024-09-26 02:29:54,858 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885635.3333333334, ans=0.1 2024-09-26 02:30:18,629 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=885682.0, ans=0.125 2024-09-26 02:30:28,233 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.217e+02 1.341e+02 1.441e+02 1.583e+02 2.930e+02, threshold=2.882e+02, percent-clipped=1.0 2024-09-26 02:30:34,781 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=885728.6666666666, ans=0.0 2024-09-26 02:30:34,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=885728.6666666666, ans=0.125 2024-09-26 02:30:39,198 INFO [train.py:1198] (0/4) Epoch 49, batch 2800, loss[loss=0.1934, ctc_loss=0.1277, cr_loss=0.3284, over 16506.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1178, cr_loss=0.3343, over 3361448.29 frames. ], batch size: 66, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:30:39,447 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=885775.3333333334, ans=0.0 2024-09-26 02:30:46,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0 2024-09-26 02:30:50,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=885775.3333333334, ans=0.125 2024-09-26 02:31:31,528 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.58 vs. limit=15.0 2024-09-26 02:32:01,787 INFO [train.py:1198] (0/4) Epoch 49, batch 2850, loss[loss=0.1982, ctc_loss=0.1267, cr_loss=0.3576, over 17004.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1175, cr_loss=0.3333, over 3355240.53 frames. 
], batch size: 53, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:32:19,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=886055.3333333334, ans=0.125 2024-09-26 02:32:41,101 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 02:32:50,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=886148.6666666666, ans=0.0 2024-09-26 02:33:13,930 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.143e+02 1.301e+02 1.414e+02 1.534e+02 2.528e+02, threshold=2.827e+02, percent-clipped=0.0 2024-09-26 02:33:20,644 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 02:33:25,146 INFO [train.py:1198] (0/4) Epoch 49, batch 2900, loss[loss=0.1875, ctc_loss=0.1184, cr_loss=0.3457, over 17157.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1175, cr_loss=0.3339, over 3342800.71 frames. ], batch size: 45, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:33:32,018 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=22.5 2024-09-26 02:33:33,356 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=886242.0, ans=0.125 2024-09-26 02:33:36,382 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=886242.0, ans=0.0 2024-09-26 02:34:49,795 INFO [train.py:1198] (0/4) Epoch 49, batch 2950, loss[loss=0.1478, ctc_loss=0.09498, cr_loss=0.264, over 16294.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3346, over 3344669.04 frames. ], batch size: 36, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:35:05,079 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.50 vs. limit=10.0 2024-09-26 02:35:11,023 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=886522.0, ans=0.125 2024-09-26 02:35:25,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=886568.6666666666, ans=0.0 2024-09-26 02:35:33,378 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=886568.6666666666, ans=0.125 2024-09-26 02:35:38,333 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2024-09-26 02:35:49,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=886615.3333333334, ans=0.125 2024-09-26 02:35:58,783 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.303e+02 1.409e+02 1.487e+02 2.405e+02, threshold=2.817e+02, percent-clipped=0.0 2024-09-26 02:36:09,978 INFO [train.py:1198] (0/4) Epoch 49, batch 3000, loss[loss=0.1891, ctc_loss=0.1205, cr_loss=0.343, over 17209.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1179, cr_loss=0.3342, over 3342617.23 frames. 
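The Whitening lines compare a per-module statistic against a fixed limit; the metric is small when the module's output covariance is close to isotropic and grows as a few directions dominate, so "metric=13.29 vs. limit=22.5" means the activations are still inside the allowed anisotropy. One plausible form of such a metric is E[lambda^2] / E[lambda]^2 over the covariance eigenvalues, which is exactly 1 for perfectly white features; this is an editorial reconstruction, not necessarily the formula scaling.py uses.

    import torch

    def whitening_metric(x):
        # x: (num_frames, num_channels) activations from one module
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]                  # channel covariance
        mean_eig = torch.diagonal(cov).mean()           # E[lambda] = trace/C
        mean_eig_sq = (cov * cov).sum() / cov.shape[0]  # E[lambda^2] = trace(cov^2)/C
        return mean_eig_sq / (mean_eig ** 2 + 1e-20)

    white = torch.randn(2000, 384)
    print(float(whitening_metric(white)))  # ~1.2 for near-white noise
    print(float(whitening_metric(white * torch.linspace(0.1, 3.0, 384))))  # much larger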
], batch size: 55, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:36:09,979 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-26 02:36:24,601 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.4856, 2.8028, 3.2272, 2.9971, 3.6706, 3.4288, 3.5772, 2.6493], device='cuda:0') 2024-09-26 02:36:25,651 INFO [train.py:1230] (0/4) Epoch 49, validation: loss=0.03501, ctc_loss=0.03501, cr_loss=1.043e-14, over 944034.00 frames. 2024-09-26 02:36:25,652 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-26 02:36:40,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=886755.3333333334, ans=0.125 2024-09-26 02:37:06,389 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=886802.0, ans=0.125 2024-09-26 02:37:09,608 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=886802.0, ans=0.2 2024-09-26 02:37:39,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2024-09-26 02:37:47,487 INFO [train.py:1198] (0/4) Epoch 49, batch 3050, loss[loss=0.1597, ctc_loss=0.1009, cr_loss=0.2939, over 17172.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1176, cr_loss=0.3334, over 3355688.33 frames. ], batch size: 41, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:38:00,839 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.27 vs. limit=10.0 2024-09-26 02:38:17,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=887035.3333333334, ans=0.125 2024-09-26 02:38:31,595 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=887035.3333333334, ans=0.125 2024-09-26 02:38:39,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=887082.0, ans=0.0 2024-09-26 02:38:54,743 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.193e+02 1.323e+02 1.385e+02 1.464e+02 2.781e+02, threshold=2.770e+02, percent-clipped=0.0 2024-09-26 02:38:58,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=887128.6666666666, ans=0.0 2024-09-26 02:39:05,809 INFO [train.py:1198] (0/4) Epoch 49, batch 3100, loss[loss=0.1566, ctc_loss=0.09737, cr_loss=0.2963, over 16968.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1175, cr_loss=0.3339, over 3362372.54 frames. 
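Two details of the bookkeeping in the validation block above are worth noting. First, the validation cr_loss is numerically zero (1.043e-14), as expected if the consistency term compares two augmented views that coincide once augmentation is off. Second, tot_loss is reported "over" roughly 3.36M frames even though far more frames have been seen by batch 3000, which fits a decayed, frame-weighted running average rather than a plain epoch mean. A sketch of that running average; the decay constant is an assumption, not a value taken from train.py.

    class FrameWeightedAverage:
        """Decayed running average of per-batch losses, weighted by frames."""
        def __init__(self, decay=0.999):  # assumed smoothing factor
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self):
            return self.weighted_loss / max(self.frames, 1.0)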
], batch size: 42, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:39:06,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=887175.3333333334, ans=0.0 2024-09-26 02:39:23,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=887222.0, ans=0.125 2024-09-26 02:39:44,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=887268.6666666666, ans=0.025 2024-09-26 02:39:59,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=887315.3333333334, ans=0.2 2024-09-26 02:40:01,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=887315.3333333334, ans=0.125 2024-09-26 02:40:05,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=887315.3333333334, ans=0.2 2024-09-26 02:40:26,987 INFO [train.py:1198] (0/4) Epoch 49, batch 3150, loss[loss=0.2092, ctc_loss=0.136, cr_loss=0.3658, over 17027.00 frames. ], tot_loss[loss=0.1832, ctc_loss=0.1169, cr_loss=0.3317, over 3353022.82 frames. ], batch size: 52, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:40:30,437 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=887408.6666666666, ans=0.125 2024-09-26 02:40:48,096 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-26 02:41:01,746 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 02:41:16,190 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=887548.6666666666, ans=15.0 2024-09-26 02:41:25,015 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=887548.6666666666, ans=0.0 2024-09-26 02:41:33,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=887595.3333333334, ans=0.025 2024-09-26 02:41:35,931 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.172e+02 1.299e+02 1.369e+02 1.479e+02 2.318e+02, threshold=2.737e+02, percent-clipped=0.0 2024-09-26 02:41:45,298 INFO [train.py:1198] (0/4) Epoch 49, batch 3200, loss[loss=0.1773, ctc_loss=0.1123, cr_loss=0.3251, over 17038.00 frames. ], tot_loss[loss=0.1824, ctc_loss=0.1162, cr_loss=0.3306, over 3362872.28 frames. ], batch size: 39, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:42:27,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=887735.3333333334, ans=6.0 2024-09-26 02:42:47,145 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=887782.0, ans=0.125 2024-09-26 02:42:56,478 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=887828.6666666666, ans=0.125 2024-09-26 02:43:05,708 INFO [train.py:1198] (0/4) Epoch 49, batch 3250, loss[loss=0.1816, ctc_loss=0.1127, cr_loss=0.3446, over 16801.00 frames. 
], tot_loss[loss=0.1825, ctc_loss=0.1164, cr_loss=0.3309, over 3372373.70 frames. ], batch size: 61, lr: 2.41e-03, grad_scale: 32.0 2024-09-26 02:43:12,177 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=887875.3333333334, ans=0.125 2024-09-26 02:44:07,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=888015.3333333334, ans=0.2 2024-09-26 02:44:18,279 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.305e+02 1.378e+02 1.462e+02 1.990e+02, threshold=2.757e+02, percent-clipped=0.0 2024-09-26 02:44:26,133 INFO [train.py:1198] (0/4) Epoch 49, batch 3300, loss[loss=0.2336, ctc_loss=0.1528, cr_loss=0.4039, over 16523.00 frames. ], tot_loss[loss=0.1836, ctc_loss=0.1171, cr_loss=0.3325, over 3370852.36 frames. ], batch size: 66, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:44:34,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=888108.6666666666, ans=0.125 2024-09-26 02:44:38,854 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=888108.6666666666, ans=0.0 2024-09-26 02:45:05,592 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=888202.0, ans=0.025 2024-09-26 02:45:35,671 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.77 vs. limit=10.0 2024-09-26 02:45:43,989 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2024-09-26 02:45:44,480 INFO [train.py:1198] (0/4) Epoch 49, batch 3350, loss[loss=0.1931, ctc_loss=0.1212, cr_loss=0.3597, over 17202.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1174, cr_loss=0.3331, over 3372610.18 frames. ], batch size: 55, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:46:05,389 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0 2024-09-26 02:46:08,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=888388.6666666666, ans=0.0 2024-09-26 02:46:12,798 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=888388.6666666666, ans=0.125 2024-09-26 02:46:23,847 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=888435.3333333334, ans=0.0 2024-09-26 02:46:27,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=888435.3333333334, ans=0.125 2024-09-26 02:46:54,936 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.181e+02 1.320e+02 1.389e+02 1.476e+02 1.760e+02, threshold=2.779e+02, percent-clipped=0.0 2024-09-26 02:47:02,762 INFO [train.py:1198] (0/4) Epoch 49, batch 3400, loss[loss=0.2124, ctc_loss=0.1353, cr_loss=0.3855, over 16988.00 frames. ], tot_loss[loss=0.1845, ctc_loss=0.1178, cr_loss=0.3335, over 3358497.75 frames. 
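Each Clipping_scale WARNING above is a five-number summary (min, Q1, median, Q3, max) of recent gradient norms, and in every instance the threshold tracks Clipping_scale times the running median (here 2.0 * 1.378e+02 ≈ 2.757e+02), with percent-clipped counting how often a norm exceeded it. A small re-creation of that bookkeeping; the window length and class name are illustrative, not the optim.py implementation.

    from collections import deque
    import statistics

    class GradNormMonitor:
        def __init__(self, clipping_scale=2.0, window=1000):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.clipped = 0

        def step(self, grad_norm):
            """Record a norm; return the factor to multiply gradients by."""
            self.norms.append(grad_norm)
            threshold = self.scale * statistics.median(self.norms)
            if grad_norm > threshold:
                self.clipped += 1
                return threshold / grad_norm  # clip back to the threshold
            return 1.0

        def summary(self):
            q1, med, q3 = statistics.quantiles(self.norms, n=4)
            return (min(self.norms), q1, med, q3, max(self.norms),
                    self.scale * med, 100.0 * self.clipped / len(self.norms))

Because the threshold follows the median of recent norms, percent-clipped stays near zero in a healthy run and only isolated spikes (the occasional percent-clipped=1.0 above) get clipped.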
], batch size: 53, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:47:20,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=888622.0, ans=0.125 2024-09-26 02:47:31,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=888622.0, ans=0.125 2024-09-26 02:47:46,108 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=888668.6666666666, ans=0.0 2024-09-26 02:47:58,649 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=888715.3333333334, ans=0.07 2024-09-26 02:48:23,487 INFO [train.py:1198] (0/4) Epoch 49, batch 3450, loss[loss=0.1786, ctc_loss=0.1122, cr_loss=0.3319, over 17024.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1176, cr_loss=0.3331, over 3361417.99 frames. ], batch size: 52, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:48:31,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=888808.6666666666, ans=0.2 2024-09-26 02:48:43,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=888855.3333333334, ans=0.1 2024-09-26 02:49:20,679 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=888948.6666666666, ans=0.1 2024-09-26 02:49:34,581 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.147e+02 1.304e+02 1.377e+02 1.518e+02 2.125e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-26 02:49:40,817 INFO [train.py:1198] (0/4) Epoch 49, batch 3500, loss[loss=0.1837, ctc_loss=0.1154, cr_loss=0.3413, over 17019.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1179, cr_loss=0.3346, over 3355983.56 frames. 
], batch size: 44, lr: 2.41e-03, grad_scale: 8.0 2024-09-26 02:49:54,061 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-26 02:49:55,626 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=889042.0, ans=0.0 2024-09-26 02:50:05,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=889088.6666666666, ans=0.125 2024-09-26 02:50:06,654 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=889088.6666666666, ans=0.0 2024-09-26 02:50:09,795 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=889088.6666666666, ans=0.2 2024-09-26 02:50:25,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=889135.3333333334, ans=0.0 2024-09-26 02:50:35,042 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=889182.0, ans=0.0 2024-09-26 02:50:35,128 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=889182.0, ans=0.125 2024-09-26 02:50:55,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=889228.6666666666, ans=0.0 2024-09-26 02:51:01,306 INFO [train.py:1198] (0/4) Epoch 49, batch 3550, loss[loss=0.1829, ctc_loss=0.1168, cr_loss=0.3305, over 17282.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.1181, cr_loss=0.3352, over 3366906.34 frames. ], batch size: 46, lr: 2.41e-03, grad_scale: 8.0 2024-09-26 02:51:14,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=889275.3333333334, ans=0.0 2024-09-26 02:51:31,885 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0 2024-09-26 02:51:51,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=889415.3333333334, ans=0.1 2024-09-26 02:51:58,256 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2024-09-26 02:52:13,582 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.119e+02 1.292e+02 1.382e+02 1.482e+02 3.513e+02, threshold=2.765e+02, percent-clipped=1.0 2024-09-26 02:52:20,001 INFO [train.py:1198] (0/4) Epoch 49, batch 3600, loss[loss=0.1568, ctc_loss=0.0987, cr_loss=0.2903, over 17287.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1181, cr_loss=0.3348, over 3371877.24 frames. ], batch size: 42, lr: 2.41e-03, grad_scale: 16.0 2024-09-26 02:52:47,388 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.18 vs. limit=10.0 2024-09-26 02:52:47,480 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-09-26 02:53:42,724 INFO [train.py:1198] (0/4) Epoch 49, batch 3650, loss[loss=0.2388, ctc_loss=0.159, cr_loss=0.399, over 15285.00 frames. 
], tot_loss[loss=0.1851, ctc_loss=0.1181, cr_loss=0.335, over 3363441.61 frames. ], batch size: 89, lr: 2.41e-03, grad_scale: 8.0 2024-09-26 02:53:54,038 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=889742.0, ans=0.125 2024-09-26 02:54:23,692 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=889835.3333333334, ans=0.0 2024-09-26 02:54:26,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=889835.3333333334, ans=0.0 2024-09-26 02:54:28,387 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=889882.0, ans=0.125 2024-09-26 02:54:31,632 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=889882.0, ans=0.2 2024-09-26 02:54:56,317 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.170e+02 1.314e+02 1.382e+02 1.463e+02 2.217e+02, threshold=2.764e+02, percent-clipped=0.0 2024-09-26 02:55:01,084 INFO [train.py:1198] (0/4) Epoch 49, batch 3700, loss[loss=0.1929, ctc_loss=0.1264, cr_loss=0.3326, over 17016.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1181, cr_loss=0.3345, over 3361373.61 frames. ], batch size: 53, lr: 2.41e-03, grad_scale: 8.0 2024-09-26 02:55:19,122 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=890022.0, ans=0.125 2024-09-26 02:55:42,521 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=890068.6666666666, ans=0.5 2024-09-26 02:56:00,283 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=890115.3333333334, ans=0.035 2024-09-26 02:56:20,584 INFO [train.py:1198] (0/4) Epoch 49, batch 3750, loss[loss=0.2137, ctc_loss=0.1387, cr_loss=0.3749, over 14971.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3366, over 3364873.11 frames. ], batch size: 89, lr: 2.41e-03, grad_scale: 8.0 2024-09-26 02:56:35,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890255.3333333334, ans=0.1 2024-09-26 02:57:35,557 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.349e+02 1.442e+02 1.558e+02 6.975e+02, threshold=2.884e+02, percent-clipped=1.0 2024-09-26 02:57:40,295 INFO [train.py:1198] (0/4) Epoch 49, batch 3800, loss[loss=0.1855, ctc_loss=0.1175, cr_loss=0.34, over 17361.00 frames. ], tot_loss[loss=0.1887, ctc_loss=0.1208, cr_loss=0.3398, over 3330888.42 frames. ], batch size: 48, lr: 2.40e-03, grad_scale: 8.0 2024-09-26 02:57:59,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=890488.6666666666, ans=0.125 2024-09-26 02:58:58,408 INFO [train.py:1198] (0/4) Epoch 49, batch 3850, loss[loss=0.198, ctc_loss=0.1256, cr_loss=0.3619, over 15063.00 frames. ], tot_loss[loss=0.191, ctc_loss=0.1226, cr_loss=0.3422, over 3284944.49 frames. 
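grad_scale in these lines moves in powers of two (32 -> 16 -> 8 -> 16 across batches 2400-3600), which is the signature of dynamic loss scaling under mixed precision: the scale is halved after a step whose fp16 gradients overflow and doubled again after a long enough run of clean steps. A standard torch.cuda.amp loop showing that mechanism; the growth/backoff settings are PyTorch's defaults, not necessarily the ones train.py passes.

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def train_step(model, batch, target, optimizer, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():           # forward in fp16
            loss = loss_fn(model(batch), target)
        scaler.scale(loss).backward()             # scale up so fp16 grads don't underflow
        scaler.step(optimizer)                    # unscales; skips the update on inf/nan
        scaler.update()                           # halve on overflow, grow after clean runs
        return loss.detach(), scaler.get_scale()  # get_scale() is the logged grad_scale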
], batch size: 89, lr: 2.40e-03, grad_scale: 8.0 2024-09-26 02:59:11,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=890675.3333333334, ans=0.125 2024-09-26 02:59:17,726 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-26 02:59:28,005 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=890768.6666666666, ans=0.125 2024-09-26 02:59:44,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=890815.3333333334, ans=0.0 2024-09-26 02:59:50,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=890815.3333333334, ans=0.0 2024-09-26 03:00:08,451 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-49.pt 2024-09-26 03:00:56,112 INFO [train.py:1198] (0/4) Epoch 50, batch 0, loss[loss=0.2241, ctc_loss=0.1458, cr_loss=0.3916, over 15982.00 frames. ], tot_loss[loss=0.2241, ctc_loss=0.1458, cr_loss=0.3916, over 15982.00 frames. ], batch size: 74, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:00:56,113 INFO [train.py:1221] (0/4) Computing validation loss 2024-09-26 03:01:04,760 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3671, 4.1890, 3.8033, 4.6169], device='cuda:0') 2024-09-26 03:01:12,097 INFO [train.py:1230] (0/4) Epoch 50, validation: loss=0.03452, ctc_loss=0.03452, cr_loss=1.145e-14, over 944034.00 frames. 2024-09-26 03:01:12,098 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB 2024-09-26 03:01:13,604 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.288e+02 1.424e+02 1.584e+02 1.731e+02 2.410e+02, threshold=3.169e+02, percent-clipped=0.0 2024-09-26 03:01:18,794 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=890890.0, ans=0.07 2024-09-26 03:01:32,373 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.43 vs. limit=22.5 2024-09-26 03:02:34,254 INFO [train.py:1198] (0/4) Epoch 50, batch 50, loss[loss=0.2082, ctc_loss=0.1312, cr_loss=0.3848, over 17238.00 frames. ], tot_loss[loss=0.1834, ctc_loss=0.1166, cr_loss=0.3339, over 749923.94 frames. ], batch size: 55, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:02:56,444 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. limit=10.0 2024-09-26 03:03:19,588 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2024-09-26 03:03:31,099 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.09 vs. limit=22.5 2024-09-26 03:03:35,161 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=891263.3333333334, ans=0.0 2024-09-26 03:03:57,322 INFO [train.py:1198] (0/4) Epoch 50, batch 100, loss[loss=0.2084, ctc_loss=0.1337, cr_loss=0.3732, over 17296.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1175, cr_loss=0.3338, over 1325059.82 frames. 
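The attn_weights_entropy tensors printed during validation give one number per attention head: the entropy of that head's attention distribution, averaged over query positions, so low values mean sharply peaked heads and values near log(num_keys) mean nearly uniform ones. A hedged re-creation of the diagnostic; the exact reduction zipformer.py applies may differ.

    import torch

    def attn_weights_entropy(attn):
        # attn: (num_heads, num_queries, num_keys); each row sums to 1
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy per (head, query)
        return ent.mean(dim=-1)                           # average over queries

    attn = torch.softmax(torch.randn(4, 50, 100), dim=-1)
    print(attn_weights_entropy(attn))  # 4 values, each <= log(100) ~ 4.61

The four values logged for the 4-head modules above (e.g. 2.3671 to 4.6169) sit in exactly this range, spanning one fairly peaked head and several diffuse ones.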
], batch size: 49, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:03:58,964 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.109e+02 1.291e+02 1.362e+02 1.431e+02 2.417e+02, threshold=2.724e+02, percent-clipped=0.0 2024-09-26 03:04:21,879 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=891403.3333333334, ans=15.0 2024-09-26 03:04:24,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=891403.3333333334, ans=0.125 2024-09-26 03:04:26,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=891403.3333333334, ans=0.0 2024-09-26 03:04:36,956 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=891450.0, ans=0.025 2024-09-26 03:04:44,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=891450.0, ans=0.125 2024-09-26 03:04:46,590 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=891496.6666666666, ans=0.2 2024-09-26 03:04:49,542 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=891496.6666666666, ans=0.0 2024-09-26 03:05:17,559 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=15.0 2024-09-26 03:05:22,552 INFO [train.py:1198] (0/4) Epoch 50, batch 150, loss[loss=0.1881, ctc_loss=0.119, cr_loss=0.3454, over 17293.00 frames. ], tot_loss[loss=0.1856, ctc_loss=0.1187, cr_loss=0.3347, over 1772225.20 frames. ], batch size: 51, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:06:22,417 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=891730.0, ans=0.125 2024-09-26 03:06:22,512 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=891730.0, ans=0.2 2024-09-26 03:06:45,782 INFO [train.py:1198] (0/4) Epoch 50, batch 200, loss[loss=0.1935, ctc_loss=0.1216, cr_loss=0.3596, over 17223.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1178, cr_loss=0.3337, over 2125583.12 frames. ], batch size: 47, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:06:47,297 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.325e+02 1.401e+02 1.516e+02 2.050e+02, threshold=2.802e+02, percent-clipped=0.0 2024-09-26 03:06:52,458 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=891823.3333333334, ans=0.125 2024-09-26 03:06:55,725 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=891823.3333333334, ans=0.0 2024-09-26 03:07:16,188 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=891916.6666666666, ans=0.125 2024-09-26 03:07:24,747 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. 
limit=6.0 2024-09-26 03:07:26,335 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0 2024-09-26 03:07:38,650 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=891963.3333333334, ans=0.05 2024-09-26 03:07:38,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=891963.3333333334, ans=0.125 2024-09-26 03:07:51,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=892010.0, ans=0.025 2024-09-26 03:07:52,042 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2024-09-26 03:07:56,244 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892010.0, ans=0.1 2024-09-26 03:08:02,518 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=892010.0, ans=0.125 2024-09-26 03:08:05,419 INFO [train.py:1198] (0/4) Epoch 50, batch 250, loss[loss=0.1772, ctc_loss=0.1153, cr_loss=0.3093, over 17223.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1189, cr_loss=0.3355, over 2393148.01 frames. ], batch size: 47, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:08:11,905 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-26 03:08:19,690 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=892056.6666666666, ans=0.025 2024-09-26 03:08:21,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=892056.6666666666, ans=0.125 2024-09-26 03:08:32,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=892103.3333333334, ans=0.125 2024-09-26 03:08:40,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=892150.0, ans=0.125 2024-09-26 03:08:40,760 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=892150.0, ans=0.125 2024-09-26 03:08:53,421 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2024-09-26 03:09:07,899 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2024-09-26 03:09:28,276 INFO [train.py:1198] (0/4) Epoch 50, batch 300, loss[loss=0.1656, ctc_loss=0.1061, cr_loss=0.2976, over 17146.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1179, cr_loss=0.3336, over 2611818.36 frames. 
], batch size: 48, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:09:29,766 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.113e+02 1.289e+02 1.359e+02 1.478e+02 2.731e+02, threshold=2.717e+02, percent-clipped=0.0 2024-09-26 03:10:26,566 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=892430.0, ans=0.125 2024-09-26 03:10:55,340 INFO [train.py:1198] (0/4) Epoch 50, batch 350, loss[loss=0.1954, ctc_loss=0.1272, cr_loss=0.341, over 17116.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.118, cr_loss=0.3339, over 2769450.52 frames. ], batch size: 49, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:11:20,721 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2024-09-26 03:11:31,741 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=15.0 2024-09-26 03:11:37,064 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=892616.6666666666, ans=0.0 2024-09-26 03:11:57,405 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=892663.3333333334, ans=0.5 2024-09-26 03:12:05,465 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=892710.0, ans=0.125 2024-09-26 03:12:17,701 INFO [train.py:1198] (0/4) Epoch 50, batch 400, loss[loss=0.2076, ctc_loss=0.1322, cr_loss=0.3771, over 15988.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1179, cr_loss=0.334, over 2897459.30 frames. ], batch size: 74, lr: 2.38e-03, grad_scale: 32.0 2024-09-26 03:12:19,211 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.189e+02 1.302e+02 1.364e+02 1.437e+02 1.796e+02, threshold=2.729e+02, percent-clipped=0.0 2024-09-26 03:12:54,814 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0 2024-09-26 03:13:35,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=892943.3333333334, ans=0.2 2024-09-26 03:13:40,004 INFO [train.py:1198] (0/4) Epoch 50, batch 450, loss[loss=0.1529, ctc_loss=0.09621, cr_loss=0.2836, over 17259.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1188, cr_loss=0.3353, over 2990393.33 frames. ], batch size: 42, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:13:50,613 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2024-09-26 03:13:54,328 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=893036.6666666666, ans=0.125 2024-09-26 03:14:51,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=893176.6666666666, ans=0.0 2024-09-26 03:15:02,698 INFO [train.py:1198] (0/4) Epoch 50, batch 500, loss[loss=0.1903, ctc_loss=0.1238, cr_loss=0.3326, over 17216.00 frames. ], tot_loss[loss=0.1874, ctc_loss=0.1199, cr_loss=0.3375, over 3068591.69 frames. 
], batch size: 55, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:15:05,803 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.155e+02 1.293e+02 1.377e+02 1.474e+02 1.981e+02, threshold=2.754e+02, percent-clipped=0.0 2024-09-26 03:15:30,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=893270.0, ans=0.025 2024-09-26 03:15:35,900 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2024-09-26 03:15:48,297 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=893316.6666666666, ans=0.0 2024-09-26 03:15:54,737 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=893363.3333333334, ans=0.2 2024-09-26 03:16:23,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=893410.0, ans=0.07 2024-09-26 03:16:24,114 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.77 vs. limit=10.0 2024-09-26 03:16:26,696 INFO [train.py:1198] (0/4) Epoch 50, batch 550, loss[loss=0.1922, ctc_loss=0.1236, cr_loss=0.343, over 17335.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1192, cr_loss=0.3366, over 3143238.43 frames. ], batch size: 51, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:17:17,801 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=893596.6666666666, ans=0.0 2024-09-26 03:17:26,673 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.73 vs. limit=10.0 2024-09-26 03:17:49,350 INFO [train.py:1198] (0/4) Epoch 50, batch 600, loss[loss=0.1515, ctc_loss=0.09264, cr_loss=0.2942, over 17104.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3342, over 3186527.42 frames. ], batch size: 40, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:17:52,579 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.148e+02 1.327e+02 1.372e+02 1.501e+02 3.825e+02, threshold=2.745e+02, percent-clipped=1.0 2024-09-26 03:18:15,491 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-09-26 03:18:21,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=893736.6666666666, ans=0.025 2024-09-26 03:18:27,779 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=893783.3333333334, ans=0.125 2024-09-26 03:18:29,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=893783.3333333334, ans=0.0 2024-09-26 03:18:33,596 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. 
limit=8.0 2024-09-26 03:18:38,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=893830.0, ans=0.125 2024-09-26 03:18:48,366 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=893830.0, ans=0.2 2024-09-26 03:18:53,167 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=893830.0, ans=0.0 2024-09-26 03:18:58,450 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2024-09-26 03:19:12,060 INFO [train.py:1198] (0/4) Epoch 50, batch 650, loss[loss=0.1976, ctc_loss=0.129, cr_loss=0.343, over 17033.00 frames. ], tot_loss[loss=0.1862, ctc_loss=0.119, cr_loss=0.3361, over 3226873.86 frames. ], batch size: 52, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:19:30,208 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=893970.0, ans=0.125 2024-09-26 03:20:09,605 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=894063.3333333334, ans=0.0 2024-09-26 03:20:11,248 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=894063.3333333334, ans=0.0 2024-09-26 03:20:14,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=894063.3333333334, ans=0.0 2024-09-26 03:20:37,744 INFO [train.py:1198] (0/4) Epoch 50, batch 700, loss[loss=0.1735, ctc_loss=0.1107, cr_loss=0.3138, over 17310.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1178, cr_loss=0.3341, over 3241237.74 frames. ], batch size: 49, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:20:40,921 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.133e+02 1.326e+02 1.434e+02 1.550e+02 1.872e+02, threshold=2.869e+02, percent-clipped=0.0 2024-09-26 03:20:55,841 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894203.3333333334, ans=0.1 2024-09-26 03:20:59,324 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2024-09-26 03:21:18,045 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=894250.0, ans=0.0 2024-09-26 03:21:19,620 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=894250.0, ans=0.125 2024-09-26 03:21:28,082 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0 2024-09-26 03:21:29,674 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. 
limit=15.0 2024-09-26 03:21:34,777 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=894296.6666666666, ans=0.125 2024-09-26 03:21:50,824 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=894343.3333333334, ans=0.125 2024-09-26 03:22:00,169 INFO [train.py:1198] (0/4) Epoch 50, batch 750, loss[loss=0.186, ctc_loss=0.1175, cr_loss=0.3425, over 17088.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1173, cr_loss=0.3332, over 3269936.77 frames. ], batch size: 43, lr: 2.38e-03, grad_scale: 16.0 2024-09-26 03:22:13,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=894390.0, ans=0.2 2024-09-26 03:22:22,694 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=894436.6666666666, ans=0.125 2024-09-26 03:22:38,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=894483.3333333334, ans=0.1 2024-09-26 03:22:54,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=894530.0, ans=0.125 2024-09-26 03:23:19,132 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=894576.6666666666, ans=0.2 2024-09-26 03:23:22,024 INFO [train.py:1198] (0/4) Epoch 50, batch 800, loss[loss=0.2079, ctc_loss=0.1386, cr_loss=0.3464, over 11998.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1175, cr_loss=0.3335, over 3288862.72 frames. ], batch size: 123, lr: 2.37e-03, grad_scale: 32.0 2024-09-26 03:23:25,275 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.307e+02 1.380e+02 1.476e+02 1.772e+02, threshold=2.760e+02, percent-clipped=0.0 2024-09-26 03:23:32,170 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=894623.3333333334, ans=0.2 2024-09-26 03:23:45,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=22.5 2024-09-26 03:24:28,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=894810.0, ans=0.2 2024-09-26 03:24:34,756 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2024-09-26 03:24:45,378 INFO [train.py:1198] (0/4) Epoch 50, batch 850, loss[loss=0.1931, ctc_loss=0.1264, cr_loss=0.3335, over 16750.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1174, cr_loss=0.3331, over 3293044.05 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 32.0 2024-09-26 03:25:49,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894996.6666666666, ans=0.1 2024-09-26 03:25:56,049 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=895043.3333333334, ans=0.125 2024-09-26 03:26:08,944 INFO [train.py:1198] (0/4) Epoch 50, batch 900, loss[loss=0.1702, ctc_loss=0.1073, cr_loss=0.3143, over 17112.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1176, cr_loss=0.3337, over 3309785.21 frames. 
], batch size: 49, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:26:09,301 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=895090.0, ans=0.0
2024-09-26 03:26:13,764 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.310e+02 1.400e+02 1.511e+02 3.836e+02, threshold=2.800e+02, percent-clipped=1.0
2024-09-26 03:26:17,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=895090.0, ans=0.125
2024-09-26 03:27:01,469 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=895230.0, ans=0.0
2024-09-26 03:27:04,713 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-26 03:27:04,764 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=895230.0, ans=0.0
2024-09-26 03:27:07,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=895230.0, ans=0.125
2024-09-26 03:27:24,790 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=15.0
2024-09-26 03:27:31,735 INFO [train.py:1198] (0/4) Epoch 50, batch 950, loss[loss=0.1764, ctc_loss=0.1127, cr_loss=0.3183, over 17147.00 frames. ], tot_loss[loss=0.1845, ctc_loss=0.1179, cr_loss=0.3332, over 3305731.06 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:27:38,491 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=895323.3333333334, ans=0.125
2024-09-26 03:27:38,640 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=895323.3333333334, ans=0.0
2024-09-26 03:27:43,913 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=15.0
2024-09-26 03:27:47,739 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=895370.0, ans=0.0
2024-09-26 03:27:49,331 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=895370.0, ans=0.0
2024-09-26 03:27:51,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=895370.0, ans=0.2
2024-09-26 03:27:57,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=895370.0, ans=0.125
2024-09-26 03:27:58,168 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=4.88 vs. limit=10.0
2024-09-26 03:28:17,493 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=895416.6666666666, ans=0.2
2024-09-26 03:28:48,024 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=895510.0, ans=0.2
2024-09-26 03:28:54,019 INFO [train.py:1198] (0/4) Epoch 50, batch 1000, loss[loss=0.2074, ctc_loss=0.1361, cr_loss=0.3564, over 16732.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.1181, cr_loss=0.3342, over 3325798.81 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:28:58,809 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.214e+02 1.309e+02 1.391e+02 1.502e+02 1.865e+02, threshold=2.782e+02, percent-clipped=0.0
2024-09-26 03:29:05,871 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=12.0
2024-09-26 03:29:08,942 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895603.3333333334, ans=0.1
2024-09-26 03:29:09,174 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0
2024-09-26 03:29:13,754 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=895603.3333333334, ans=0.0
2024-09-26 03:29:17,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=895603.3333333334, ans=0.0
2024-09-26 03:29:23,282 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=895603.3333333334, ans=0.125
2024-09-26 03:29:41,554 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=895650.0, ans=0.0
2024-09-26 03:29:43,866 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=22.5
2024-09-26 03:29:51,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=895696.6666666666, ans=0.125
2024-09-26 03:29:52,990 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=895696.6666666666, ans=0.0
2024-09-26 03:29:58,525 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5
2024-09-26 03:30:18,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=895790.0, ans=0.0
2024-09-26 03:30:19,368 INFO [train.py:1198] (0/4) Epoch 50, batch 1050, loss[loss=0.1758, ctc_loss=0.1119, cr_loss=0.3194, over 17302.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1176, cr_loss=0.3337, over 3338910.06 frames. ], batch size: 51, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:30:19,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=895790.0, ans=0.0
2024-09-26 03:30:48,724 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-26 03:30:52,031 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=895883.3333333334, ans=0.1
2024-09-26 03:31:09,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=895930.0, ans=0.1
2024-09-26 03:31:17,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=895930.0, ans=0.125
2024-09-26 03:31:32,846 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/checkpoint-192000.pt
2024-09-26 03:31:43,311 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.33 vs. limit=22.5
2024-09-26 03:31:44,418 INFO [train.py:1198] (0/4) Epoch 50, batch 1100, loss[loss=0.2002, ctc_loss=0.1294, cr_loss=0.3536, over 17150.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3348, over 3350772.42 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:31:49,233 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.142e+02 1.312e+02 1.377e+02 1.461e+02 2.179e+02, threshold=2.754e+02, percent-clipped=0.0
2024-09-26 03:32:16,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=896116.6666666666, ans=0.125
2024-09-26 03:32:20,144 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0
2024-09-26 03:32:42,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=896163.3333333334, ans=0.2
2024-09-26 03:33:06,912 INFO [train.py:1198] (0/4) Epoch 50, batch 1150, loss[loss=0.1905, ctc_loss=0.1187, cr_loss=0.359, over 17293.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.118, cr_loss=0.3344, over 3354455.88 frames. ], batch size: 51, lr: 2.37e-03, grad_scale: 8.0
2024-09-26 03:33:13,628 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=896256.6666666666, ans=0.2
2024-09-26 03:33:28,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.56 vs. limit=15.0
2024-09-26 03:33:47,210 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-26 03:34:14,884 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=896443.3333333334, ans=0.0
2024-09-26 03:34:17,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=896443.3333333334, ans=15.0
2024-09-26 03:34:29,986 INFO [train.py:1198] (0/4) Epoch 50, batch 1200, loss[loss=0.2067, ctc_loss=0.1321, cr_loss=0.3727, over 17353.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.118, cr_loss=0.3346, over 3360295.90 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:34:36,175 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.154e+02 1.309e+02 1.371e+02 1.493e+02 2.008e+02, threshold=2.741e+02, percent-clipped=0.0
2024-09-26 03:34:36,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=896490.0, ans=0.125
2024-09-26 03:34:38,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896490.0, ans=0.1
2024-09-26 03:34:39,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=896490.0, ans=0.0
2024-09-26 03:34:42,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=896490.0, ans=0.125
2024-09-26 03:35:05,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=896583.3333333334, ans=0.2
2024-09-26 03:35:52,908 INFO [train.py:1198] (0/4) Epoch 50, batch 1250, loss[loss=0.1944, ctc_loss=0.1236, cr_loss=0.3537, over 17146.00 frames. ], tot_loss[loss=0.1839, ctc_loss=0.1172, cr_loss=0.3332, over 3368291.64 frames. ], batch size: 48, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:36:12,672 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=896770.0, ans=0.2
2024-09-26 03:36:20,593 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=896770.0, ans=0.025
2024-09-26 03:37:15,375 INFO [train.py:1198] (0/4) Epoch 50, batch 1300, loss[loss=0.1974, ctc_loss=0.1276, cr_loss=0.3489, over 17213.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1173, cr_loss=0.3336, over 3372399.21 frames. ], batch size: 47, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:37:16,257 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0
2024-09-26 03:37:21,701 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.165e+02 1.310e+02 1.392e+02 1.499e+02 2.433e+02, threshold=2.784e+02, percent-clipped=0.0
2024-09-26 03:37:21,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=896956.6666666666, ans=0.125
2024-09-26 03:37:35,312 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=22.5
2024-09-26 03:38:17,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=897096.6666666666, ans=0.125
2024-09-26 03:38:21,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=897143.3333333334, ans=0.125
2024-09-26 03:38:38,312 INFO [train.py:1198] (0/4) Epoch 50, batch 1350, loss[loss=0.1818, ctc_loss=0.1161, cr_loss=0.3285, over 17328.00 frames. ], tot_loss[loss=0.1836, ctc_loss=0.117, cr_loss=0.3333, over 3375495.12 frames. ], batch size: 51, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:38:48,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=897190.0, ans=0.125
2024-09-26 03:38:51,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=897190.0, ans=0.0
2024-09-26 03:39:20,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=897283.3333333334, ans=0.125
2024-09-26 03:39:21,932 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=897283.3333333334, ans=0.025
2024-09-26 03:39:40,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=897330.0, ans=0.0
2024-09-26 03:39:50,268 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-26 03:39:55,736 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0
2024-09-26 03:40:01,286 INFO [train.py:1198] (0/4) Epoch 50, batch 1400, loss[loss=0.1722, ctc_loss=0.1095, cr_loss=0.3131, over 17068.00 frames. ], tot_loss[loss=0.1833, ctc_loss=0.1167, cr_loss=0.3329, over 3368979.78 frames. ], batch size: 46, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:40:07,585 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.145e+02 1.312e+02 1.402e+02 1.516e+02 2.766e+02, threshold=2.805e+02, percent-clipped=0.0
2024-09-26 03:40:19,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=897470.0, ans=0.125
2024-09-26 03:40:52,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=897563.3333333334, ans=0.0
2024-09-26 03:41:03,990 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0
2024-09-26 03:41:24,327 INFO [train.py:1198] (0/4) Epoch 50, batch 1450, loss[loss=0.1501, ctc_loss=0.0924, cr_loss=0.2886, over 17283.00 frames. ], tot_loss[loss=0.1815, ctc_loss=0.1154, cr_loss=0.3304, over 3370237.44 frames. ], batch size: 42, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:41:46,489 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=897703.3333333334, ans=0.125
2024-09-26 03:41:46,703 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0
2024-09-26 03:42:05,517 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=897750.0, ans=0.125
2024-09-26 03:42:26,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=897796.6666666666, ans=0.0
2024-09-26 03:42:40,269 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=15.0
2024-09-26 03:42:46,931 INFO [train.py:1198] (0/4) Epoch 50, batch 1500, loss[loss=0.1934, ctc_loss=0.122, cr_loss=0.3575, over 17249.00 frames. ], tot_loss[loss=0.1823, ctc_loss=0.1161, cr_loss=0.3314, over 3363729.98 frames. ], batch size: 44, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:42:48,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=897890.0, ans=0.125
2024-09-26 03:42:53,353 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.088e+02 1.299e+02 1.380e+02 1.479e+02 2.541e+02, threshold=2.761e+02, percent-clipped=0.0
2024-09-26 03:44:09,794 INFO [train.py:1198] (0/4) Epoch 50, batch 1550, loss[loss=0.1586, ctc_loss=0.09993, cr_loss=0.2934, over 16746.00 frames. ], tot_loss[loss=0.1831, ctc_loss=0.1167, cr_loss=0.3321, over 3364148.20 frames. ], batch size: 37, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:44:11,788 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=898123.3333333334, ans=0.2
2024-09-26 03:44:47,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=898216.6666666666, ans=0.0
2024-09-26 03:44:54,302 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-26 03:44:55,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=898216.6666666666, ans=0.0
2024-09-26 03:45:05,695 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0
2024-09-26 03:45:35,384 INFO [train.py:1198] (0/4) Epoch 50, batch 1600, loss[loss=0.1637, ctc_loss=0.1028, cr_loss=0.3045, over 16685.00 frames. ], tot_loss[loss=0.1833, ctc_loss=0.1169, cr_loss=0.3321, over 3358932.66 frames. ], batch size: 37, lr: 2.37e-03, grad_scale: 32.0
2024-09-26 03:45:41,792 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.345e+02 1.418e+02 1.518e+02 2.144e+02, threshold=2.837e+02, percent-clipped=0.0
2024-09-26 03:46:37,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898496.6666666666, ans=0.1
2024-09-26 03:46:53,353 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=898543.3333333334, ans=0.125
2024-09-26 03:46:57,840 INFO [train.py:1198] (0/4) Epoch 50, batch 1650, loss[loss=0.2255, ctc_loss=0.1458, cr_loss=0.3985, over 17234.00 frames. ], tot_loss[loss=0.1834, ctc_loss=0.117, cr_loss=0.332, over 3355415.07 frames. ], batch size: 55, lr: 2.37e-03, grad_scale: 32.0
2024-09-26 03:47:13,134 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0
2024-09-26 03:47:22,569 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-26 03:47:48,418 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0
2024-09-26 03:47:49,455 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=898730.0, ans=0.2
2024-09-26 03:47:52,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=898730.0, ans=10.0
2024-09-26 03:48:11,579 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=898776.6666666666, ans=0.125
2024-09-26 03:48:20,713 INFO [train.py:1198] (0/4) Epoch 50, batch 1700, loss[loss=0.2177, ctc_loss=0.142, cr_loss=0.3789, over 15254.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1174, cr_loss=0.3336, over 3356822.96 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 32.0
2024-09-26 03:48:27,051 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.137e+02 1.297e+02 1.382e+02 1.469e+02 2.239e+02, threshold=2.764e+02, percent-clipped=0.0
2024-09-26 03:48:36,818 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=898870.0, ans=0.125
2024-09-26 03:48:45,249 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=898870.0, ans=0.95
2024-09-26 03:48:59,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=898916.6666666666, ans=0.125
2024-09-26 03:49:08,911 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=898963.3333333334, ans=10.0
2024-09-26 03:49:41,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=899056.6666666666, ans=0.125
2024-09-26 03:49:42,837 INFO [train.py:1198] (0/4) Epoch 50, batch 1750, loss[loss=0.1469, ctc_loss=0.09105, cr_loss=0.2793, over 17045.00 frames. ], tot_loss[loss=0.185, ctc_loss=0.1179, cr_loss=0.335, over 3360695.41 frames. ], batch size: 39, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:49:56,247 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=22.5
2024-09-26 03:49:56,426 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=22.5
2024-09-26 03:50:07,356 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0
2024-09-26 03:50:10,255 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=899103.3333333334, ans=0.0
2024-09-26 03:50:21,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=899150.0, ans=0.0
2024-09-26 03:50:57,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=899243.3333333334, ans=0.125
2024-09-26 03:51:05,455 INFO [train.py:1198] (0/4) Epoch 50, batch 1800, loss[loss=0.1619, ctc_loss=0.1004, cr_loss=0.3073, over 17207.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1177, cr_loss=0.3345, over 3370192.11 frames. ], batch size: 41, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:51:09,728 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=15.0
2024-09-26 03:51:10,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=899290.0, ans=0.125
2024-09-26 03:51:13,299 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.324e+02 1.397e+02 1.479e+02 2.541e+02, threshold=2.793e+02, percent-clipped=0.0
2024-09-26 03:51:15,182 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=899290.0, ans=0.0
2024-09-26 03:51:20,363 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-26 03:52:28,318 INFO [train.py:1198] (0/4) Epoch 50, batch 1850, loss[loss=0.1791, ctc_loss=0.1109, cr_loss=0.341, over 17234.00 frames. ], tot_loss[loss=0.1836, ctc_loss=0.117, cr_loss=0.3331, over 3367658.81 frames. ], batch size: 44, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:52:49,677 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=899570.0, ans=0.05
2024-09-26 03:52:59,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=899616.6666666666, ans=0.1
2024-09-26 03:53:00,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=899616.6666666666, ans=0.125
2024-09-26 03:53:32,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=899663.3333333334, ans=0.125
2024-09-26 03:53:51,423 INFO [train.py:1198] (0/4) Epoch 50, batch 1900, loss[loss=0.1871, ctc_loss=0.1204, cr_loss=0.3333, over 17015.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1177, cr_loss=0.3348, over 3359591.35 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:53:58,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899756.6666666666, ans=0.1
2024-09-26 03:53:59,444 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.136e+02 1.288e+02 1.384e+02 1.484e+02 2.274e+02, threshold=2.768e+02, percent-clipped=0.0
2024-09-26 03:54:25,058 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=899850.0, ans=0.125
2024-09-26 03:54:32,952 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=899850.0, ans=0.2
2024-09-26 03:54:37,661 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=899850.0, ans=0.0
2024-09-26 03:54:42,448 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=899896.6666666666, ans=0.025
2024-09-26 03:55:14,171 INFO [train.py:1198] (0/4) Epoch 50, batch 1950, loss[loss=0.1766, ctc_loss=0.1101, cr_loss=0.3326, over 17065.00 frames. ], tot_loss[loss=0.1848, ctc_loss=0.1178, cr_loss=0.3348, over 3367293.00 frames. ], batch size: 46, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:55:52,345 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=900083.3333333334, ans=0.5
2024-09-26 03:56:13,313 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=22.5
2024-09-26 03:56:18,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=900130.0, ans=0.0
2024-09-26 03:56:30,277 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=900176.6666666666, ans=0.125
2024-09-26 03:56:30,681 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=22.5
2024-09-26 03:56:31,790 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=900176.6666666666, ans=0.0
2024-09-26 03:56:39,350 INFO [train.py:1198] (0/4) Epoch 50, batch 2000, loss[loss=0.1609, ctc_loss=0.1026, cr_loss=0.2916, over 16665.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3366, over 3350657.59 frames. ], batch size: 37, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:56:47,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=900223.3333333334, ans=0.0
2024-09-26 03:56:48,774 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.198e+02 1.333e+02 1.399e+02 1.506e+02 2.393e+02, threshold=2.799e+02, percent-clipped=0.0
2024-09-26 03:56:52,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900223.3333333334, ans=0.125
2024-09-26 03:57:13,142 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=900316.6666666666, ans=0.0
2024-09-26 03:57:25,625 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=900363.3333333334, ans=0.0
2024-09-26 03:57:58,653 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=900410.0, ans=0.0
2024-09-26 03:58:01,483 INFO [train.py:1198] (0/4) Epoch 50, batch 2050, loss[loss=0.1578, ctc_loss=0.09705, cr_loss=0.3039, over 17028.00 frames. ], tot_loss[loss=0.1847, ctc_loss=0.1179, cr_loss=0.3344, over 3356256.00 frames. ], batch size: 39, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:58:05,581 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0
2024-09-26 03:58:08,715 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2024-09-26 03:58:11,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=900456.6666666666, ans=0.125
2024-09-26 03:58:17,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=900503.3333333334, ans=0.125
2024-09-26 03:58:18,207 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=900503.3333333334, ans=10.0
2024-09-26 03:58:19,097 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=900503.3333333334, ans=0.125
2024-09-26 03:58:30,303 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=900503.3333333334, ans=0.0
2024-09-26 03:58:34,887 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=900550.0, ans=0.1
2024-09-26 03:58:59,143 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0
2024-09-26 03:59:00,604 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=900596.6666666666, ans=0.2
2024-09-26 03:59:06,870 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=900643.3333333334, ans=0.125
2024-09-26 03:59:10,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900643.3333333334, ans=0.1
2024-09-26 03:59:12,093 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=900643.3333333334, ans=0.125
2024-09-26 03:59:12,408 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0
2024-09-26 03:59:24,073 INFO [train.py:1198] (0/4) Epoch 50, batch 2100, loss[loss=0.1586, ctc_loss=0.09999, cr_loss=0.2932, over 17190.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3345, over 3346592.67 frames. ], batch size: 41, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 03:59:30,836 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=900690.0, ans=0.125
2024-09-26 03:59:31,001 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=900690.0, ans=0.125
2024-09-26 03:59:33,715 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.174e+02 1.304e+02 1.389e+02 1.531e+02 2.569e+02, threshold=2.777e+02, percent-clipped=0.0
2024-09-26 03:59:46,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=900736.6666666666, ans=0.125
2024-09-26 04:00:09,438 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=900783.3333333334, ans=0.125
2024-09-26 04:00:45,639 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=900923.3333333334, ans=0.0
2024-09-26 04:00:46,960 INFO [train.py:1198] (0/4) Epoch 50, batch 2150, loss[loss=0.183, ctc_loss=0.1157, cr_loss=0.3366, over 17022.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1178, cr_loss=0.3344, over 3354543.74 frames. ], batch size: 44, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 04:00:55,172 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=900923.3333333334, ans=0.125
2024-09-26 04:00:58,488 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=900923.3333333334, ans=0.0
2024-09-26 04:01:03,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=900970.0, ans=0.125
2024-09-26 04:01:19,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0
2024-09-26 04:01:29,961 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=901016.6666666666, ans=0.2
2024-09-26 04:01:50,363 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.62 vs. limit=10.0
2024-09-26 04:01:57,619 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901110.0, ans=0.1
2024-09-26 04:02:10,066 INFO [train.py:1198] (0/4) Epoch 50, batch 2200, loss[loss=0.1671, ctc_loss=0.1073, cr_loss=0.2991, over 16963.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1184, cr_loss=0.3355, over 3352788.53 frames. ], batch size: 42, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 04:02:19,445 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.179e+02 1.317e+02 1.377e+02 1.488e+02 2.594e+02, threshold=2.754e+02, percent-clipped=0.0
2024-09-26 04:02:25,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=901203.3333333334, ans=0.125
2024-09-26 04:02:37,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=901203.3333333334, ans=0.0
2024-09-26 04:02:42,286 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0
2024-09-26 04:02:44,178 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0
2024-09-26 04:03:12,108 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. limit=10.0
2024-09-26 04:03:22,774 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=901343.3333333334, ans=0.0
2024-09-26 04:03:32,032 INFO [train.py:1198] (0/4) Epoch 50, batch 2250, loss[loss=0.1448, ctc_loss=0.08801, cr_loss=0.2838, over 17116.00 frames. ], tot_loss[loss=0.1859, ctc_loss=0.1187, cr_loss=0.3358, over 3354436.32 frames. ], batch size: 40, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 04:04:31,075 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=901530.0, ans=0.1
2024-09-26 04:04:45,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=901576.6666666666, ans=0.125
2024-09-26 04:04:47,268 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=901576.6666666666, ans=0.0
2024-09-26 04:04:54,691 INFO [train.py:1198] (0/4) Epoch 50, batch 2300, loss[loss=0.174, ctc_loss=0.1094, cr_loss=0.3227, over 17157.00 frames. ], tot_loss[loss=0.1854, ctc_loss=0.1183, cr_loss=0.3357, over 3353416.43 frames. ], batch size: 45, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 04:05:04,460 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.310e+02 1.388e+02 1.508e+02 2.945e+02, threshold=2.775e+02, percent-clipped=1.0
2024-09-26 04:05:18,596 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=901670.0, ans=0.025
2024-09-26 04:05:23,135 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=901670.0, ans=0.125
2024-09-26 04:05:29,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=901716.6666666666, ans=0.0
2024-09-26 04:05:38,945 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=901716.6666666666, ans=0.125
2024-09-26 04:05:42,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=901716.6666666666, ans=0.2
2024-09-26 04:05:51,848 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=901763.3333333334, ans=0.125
2024-09-26 04:06:03,245 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=901810.0, ans=0.5
2024-09-26 04:06:07,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=901810.0, ans=0.0
2024-09-26 04:06:19,857 INFO [train.py:1198] (0/4) Epoch 50, batch 2350, loss[loss=0.1982, ctc_loss=0.1257, cr_loss=0.3624, over 17023.00 frames. ], tot_loss[loss=0.1865, ctc_loss=0.1191, cr_loss=0.3373, over 3352224.82 frames. ], batch size: 52, lr: 2.37e-03, grad_scale: 16.0
2024-09-26 04:06:20,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=901856.6666666666, ans=0.125
2024-09-26 04:06:25,134 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-26 04:06:50,558 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=901950.0, ans=0.125
2024-09-26 04:06:59,163 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0
2024-09-26 04:07:26,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=902043.3333333334, ans=0.125
2024-09-26 04:07:26,963 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=902043.3333333334, ans=0.125
2024-09-26 04:07:39,159 INFO [train.py:1198] (0/4) Epoch 50, batch 2400, loss[loss=0.1937, ctc_loss=0.1242, cr_loss=0.3472, over 17224.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.1189, cr_loss=0.3371, over 3353412.70 frames. ], batch size: 55, lr: 2.37e-03, grad_scale: 32.0
2024-09-26 04:07:51,372 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.173e+02 1.310e+02 1.369e+02 1.448e+02 2.051e+02, threshold=2.738e+02, percent-clipped=0.0
2024-09-26 04:08:09,423 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=902136.6666666666, ans=0.2
2024-09-26 04:08:46,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=902276.6666666666, ans=0.125
2024-09-26 04:09:02,440 INFO [train.py:1198] (0/4) Epoch 50, batch 2450, loss[loss=0.2025, ctc_loss=0.1315, cr_loss=0.3552, over 17288.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1184, cr_loss=0.3362, over 3354554.84 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:09:02,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=902323.3333333334, ans=0.09899494936611666
2024-09-26 04:09:38,075 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.29 vs. limit=10.0
2024-09-26 04:09:54,708 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=902463.3333333334, ans=0.1
2024-09-26 04:10:16,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=902510.0, ans=0.2
2024-09-26 04:10:21,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=902510.0, ans=0.125
2024-09-26 04:10:27,376 INFO [train.py:1198] (0/4) Epoch 50, batch 2500, loss[loss=0.1925, ctc_loss=0.1237, cr_loss=0.3443, over 17351.00 frames. ], tot_loss[loss=0.1863, ctc_loss=0.119, cr_loss=0.3366, over 3355103.11 frames. ], batch size: 48, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:10:38,429 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.191e+02 1.330e+02 1.408e+02 1.529e+02 2.285e+02, threshold=2.816e+02, percent-clipped=0.0
2024-09-26 04:10:43,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=902603.3333333334, ans=0.125
2024-09-26 04:11:49,379 INFO [train.py:1198] (0/4) Epoch 50, batch 2550, loss[loss=0.1896, ctc_loss=0.1227, cr_loss=0.3349, over 17216.00 frames. ], tot_loss[loss=0.1861, ctc_loss=0.1189, cr_loss=0.3363, over 3362678.09 frames. ], batch size: 47, lr: 2.36e-03, grad_scale: 8.0
2024-09-26 04:11:56,061 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=902790.0, ans=0.0
2024-09-26 04:12:22,134 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=902883.3333333334, ans=0.0
2024-09-26 04:12:26,962 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=902883.3333333334, ans=0.0
2024-09-26 04:12:45,663 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=902930.0, ans=0.125
2024-09-26 04:12:58,258 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=902976.6666666666, ans=0.2
2024-09-26 04:12:59,706 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=902976.6666666666, ans=0.0
2024-09-26 04:13:12,205 INFO [train.py:1198] (0/4) Epoch 50, batch 2600, loss[loss=0.1588, ctc_loss=0.1037, cr_loss=0.2753, over 16970.00 frames. ], tot_loss[loss=0.1855, ctc_loss=0.1185, cr_loss=0.3354, over 3363302.96 frames. ], batch size: 42, lr: 2.36e-03, grad_scale: 8.0
2024-09-26 04:13:14,184 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=903023.3333333334, ans=0.125
2024-09-26 04:13:25,160 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.157e+02 1.312e+02 1.402e+02 1.467e+02 1.641e+02, threshold=2.804e+02, percent-clipped=0.0
2024-09-26 04:13:36,781 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-26 04:13:43,746 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=903116.6666666666, ans=15.0
2024-09-26 04:13:46,388 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=903116.6666666666, ans=0.025
2024-09-26 04:13:56,247 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=903116.6666666666, ans=0.125
2024-09-26 04:13:57,686 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=903116.6666666666, ans=0.2
2024-09-26 04:14:11,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=903163.3333333334, ans=0.125
2024-09-26 04:14:34,860 INFO [train.py:1198] (0/4) Epoch 50, batch 2650, loss[loss=0.1857, ctc_loss=0.1193, cr_loss=0.3319, over 17304.00 frames. ], tot_loss[loss=0.186, ctc_loss=0.1188, cr_loss=0.3363, over 3366762.77 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 8.0
2024-09-26 04:14:49,495 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=903303.3333333334, ans=0.2
2024-09-26 04:15:00,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=903303.3333333334, ans=0.0
2024-09-26 04:15:41,316 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-26 04:15:57,315 INFO [train.py:1198] (0/4) Epoch 50, batch 2700, loss[loss=0.1996, ctc_loss=0.1316, cr_loss=0.3403, over 17081.00 frames. ], tot_loss[loss=0.1857, ctc_loss=0.1185, cr_loss=0.3362, over 3366287.40 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 8.0
2024-09-26 04:16:09,702 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=903490.0, ans=0.0
2024-09-26 04:16:12,404 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.141e+02 1.330e+02 1.437e+02 1.530e+02 2.387e+02, threshold=2.875e+02, percent-clipped=0.0
2024-09-26 04:16:30,360 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=903583.3333333334, ans=0.125
2024-09-26 04:16:30,373 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903583.3333333334, ans=0.1
2024-09-26 04:16:30,454 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=903583.3333333334, ans=0.0
2024-09-26 04:16:30,824 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.39 vs. limit=10.0
2024-09-26 04:16:36,805 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=903583.3333333334, ans=0.0
2024-09-26 04:16:48,271 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=903630.0, ans=0.0
2024-09-26 04:16:49,622 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=903630.0, ans=0.125
2024-09-26 04:16:59,446 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=903630.0, ans=0.2
2024-09-26 04:17:20,229 INFO [train.py:1198] (0/4) Epoch 50, batch 2750, loss[loss=0.1762, ctc_loss=0.1101, cr_loss=0.3305, over 17058.00 frames. ], tot_loss[loss=0.1851, ctc_loss=0.1181, cr_loss=0.3349, over 3373821.74 frames. ], batch size: 39, lr: 2.36e-03, grad_scale: 8.0
2024-09-26 04:17:28,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=903723.3333333334, ans=0.09899494936611666
2024-09-26 04:18:09,494 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=903863.3333333334, ans=0.125
2024-09-26 04:18:41,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=903956.6666666666, ans=0.125
2024-09-26 04:18:42,702 INFO [train.py:1198] (0/4) Epoch 50, batch 2800, loss[loss=0.1875, ctc_loss=0.1193, cr_loss=0.3406, over 17072.00 frames. ], tot_loss[loss=0.1843, ctc_loss=0.1176, cr_loss=0.3337, over 3370737.71 frames. ], batch size: 46, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:18:44,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=903956.6666666666, ans=0.125
2024-09-26 04:18:52,415 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=903956.6666666666, ans=0.0
2024-09-26 04:18:53,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903956.6666666666, ans=0.1
2024-09-26 04:18:57,718 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.167e+02 1.318e+02 1.422e+02 1.529e+02 1.768e+02, threshold=2.845e+02, percent-clipped=0.0
2024-09-26 04:19:20,556 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=904050.0, ans=0.0
2024-09-26 04:20:04,786 INFO [train.py:1198] (0/4) Epoch 50, batch 2850, loss[loss=0.1894, ctc_loss=0.1213, cr_loss=0.3408, over 17212.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1174, cr_loss=0.3337, over 3376203.21 frames. ], batch size: 55, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:21:07,228 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=904330.0, ans=0.1
2024-09-26 04:21:23,281 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=904376.6666666666, ans=0.0
2024-09-26 04:21:29,404 INFO [train.py:1198] (0/4) Epoch 50, batch 2900, loss[loss=0.2282, ctc_loss=0.1445, cr_loss=0.4186, over 17042.00 frames. ], tot_loss[loss=0.1832, ctc_loss=0.1167, cr_loss=0.3322, over 3370770.62 frames. ], batch size: 52, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:21:42,285 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.130e+02 1.318e+02 1.386e+02 1.483e+02 2.505e+02, threshold=2.771e+02, percent-clipped=0.0
2024-09-26 04:21:48,765 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=904470.0, ans=0.035
2024-09-26 04:22:03,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=904516.6666666666, ans=0.2
2024-09-26 04:22:05,436 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.36 vs. limit=6.0
2024-09-26 04:22:10,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=904516.6666666666, ans=15.0
2024-09-26 04:22:15,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=904563.3333333334, ans=0.035
2024-09-26 04:22:52,048 INFO [train.py:1198] (0/4) Epoch 50, batch 2950, loss[loss=0.1905, ctc_loss=0.121, cr_loss=0.3475, over 17157.00 frames. ], tot_loss[loss=0.1834, ctc_loss=0.1169, cr_loss=0.3324, over 3371810.27 frames. ], batch size: 45, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:23:34,735 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=22.5
2024-09-26 04:24:14,376 INFO [train.py:1198] (0/4) Epoch 50, batch 3000, loss[loss=0.1676, ctc_loss=0.105, cr_loss=0.3129, over 17107.00 frames. ], tot_loss[loss=0.1835, ctc_loss=0.1169, cr_loss=0.3329, over 3366140.38 frames. ], batch size: 40, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:24:14,377 INFO [train.py:1221] (0/4) Computing validation loss
2024-09-26 04:24:30,367 INFO [train.py:1230] (0/4) Epoch 50, validation: loss=0.03495, ctc_loss=0.03495, cr_loss=1.037e-14, over 944034.00 frames.
2024-09-26 04:24:30,368 INFO [train.py:1231] (0/4) Maximum memory allocated so far is 21211MB
2024-09-26 04:24:42,934 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.332e+02 1.410e+02 1.508e+02 3.404e+02, threshold=2.821e+02, percent-clipped=1.0
2024-09-26 04:24:54,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=904936.6666666666, ans=0.0
2024-09-26 04:24:59,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=904936.6666666666, ans=0.125
2024-09-26 04:25:28,926 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=905030.0, ans=0.125
2024-09-26 04:25:48,679 INFO [train.py:1198] (0/4) Epoch 50, batch 3050, loss[loss=0.1889, ctc_loss=0.1203, cr_loss=0.3434, over 17289.00 frames. ], tot_loss[loss=0.184, ctc_loss=0.1172, cr_loss=0.3337, over 3353700.05 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:25:58,749 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0
2024-09-26 04:26:38,133 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=905263.3333333334, ans=0.125
2024-09-26 04:26:40,219 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=22.5
2024-09-26 04:27:05,340 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0
2024-09-26 04:27:09,373 INFO [train.py:1198] (0/4) Epoch 50, batch 3100, loss[loss=0.176, ctc_loss=0.1109, cr_loss=0.3252, over 17289.00 frames. ], tot_loss[loss=0.1846, ctc_loss=0.1178, cr_loss=0.3341, over 3351770.02 frames. ], batch size: 49, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:27:11,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=905356.6666666666, ans=0.125
2024-09-26 04:27:15,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=905356.6666666666, ans=0.0
2024-09-26 04:27:21,791 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.114e+02 1.333e+02 1.400e+02 1.486e+02 1.966e+02, threshold=2.799e+02, percent-clipped=0.0
2024-09-26 04:27:22,145 INFO [scaling.py:1120] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-26 04:28:03,508 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=905496.6666666666, ans=0.125
2024-09-26 04:28:25,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=905543.3333333334, ans=0.015
2024-09-26 04:28:27,202 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=905543.3333333334, ans=0.125
2024-09-26 04:28:29,961 INFO [train.py:1198] (0/4) Epoch 50, batch 3150, loss[loss=0.1957, ctc_loss=0.1276, cr_loss=0.3406, over 17210.00 frames. ], tot_loss[loss=0.1838, ctc_loss=0.1173, cr_loss=0.3326, over 3358760.39 frames. ], batch size: 47, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:28:45,949 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=905636.6666666666, ans=0.0
2024-09-26 04:28:53,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=905636.6666666666, ans=0.025
2024-09-26 04:29:11,053 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=905683.3333333334, ans=0.125
2024-09-26 04:29:12,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=905683.3333333334, ans=0.125
2024-09-26 04:29:20,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=905730.0, ans=0.125
2024-09-26 04:29:42,719 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=905776.6666666666, ans=0.1
2024-09-26 04:29:48,815 INFO [train.py:1198] (0/4) Epoch 50, batch 3200, loss[loss=0.1921, ctc_loss=0.1217, cr_loss=0.3522, over 17140.00 frames. ], tot_loss[loss=0.1837, ctc_loss=0.1172, cr_loss=0.3326, over 3356887.51 frames. ], batch size: 48, lr: 2.36e-03, grad_scale: 32.0
2024-09-26 04:29:50,678 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=905823.3333333334, ans=0.125
2024-09-26 04:30:01,215 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.176e+02 1.309e+02 1.397e+02 1.487e+02 2.037e+02, threshold=2.793e+02, percent-clipped=0.0
2024-09-26 04:30:04,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=905870.0, ans=0.09899494936611666
2024-09-26 04:31:07,458 INFO [train.py:1198] (0/4) Epoch 50, batch 3250, loss[loss=0.1653, ctc_loss=0.1039, cr_loss=0.3069, over 17163.00 frames. ], tot_loss[loss=0.1825, ctc_loss=0.1163, cr_loss=0.3311, over 3360925.55 frames. ], batch size: 45, lr: 2.36e-03, grad_scale: 32.0
2024-09-26 04:31:07,815 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=906056.6666666666, ans=0.0
2024-09-26 04:31:53,334 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=906150.0, ans=0.125
2024-09-26 04:32:00,030 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0
2024-09-26 04:32:15,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=906243.3333333334, ans=0.0
2024-09-26 04:32:27,560 INFO [train.py:1198] (0/4) Epoch 50, batch 3300, loss[loss=0.1476, ctc_loss=0.09147, cr_loss=0.2804, over 16720.00 frames. ], tot_loss[loss=0.1834, ctc_loss=0.117, cr_loss=0.3323, over 3344772.88 frames. ], batch size: 37, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:32:32,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=906290.0, ans=0.125
2024-09-26 04:32:41,662 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.177e+02 1.281e+02 1.376e+02 1.500e+02 2.072e+02, threshold=2.752e+02, percent-clipped=0.0
2024-09-26 04:33:10,212 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=906383.3333333334, ans=0.125
2024-09-26 04:33:19,696 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=906430.0, ans=0.0
2024-09-26 04:33:46,251 INFO [train.py:1198] (0/4) Epoch 50, batch 3350, loss[loss=0.1797, ctc_loss=0.1141, cr_loss=0.3279, over 17077.00 frames. ], tot_loss[loss=0.183, ctc_loss=0.1166, cr_loss=0.3318, over 3355033.85 frames. ], batch size: 46, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:33:52,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=906523.3333333334, ans=0.0
2024-09-26 04:33:54,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=906523.3333333334, ans=0.07
2024-09-26 04:34:00,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906570.0, ans=0.1
2024-09-26 04:34:14,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=906570.0, ans=0.0
2024-09-26 04:34:28,147 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=906616.6666666666, ans=0.2
2024-09-26 04:34:43,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=906663.3333333334, ans=0.2
2024-09-26 04:34:59,411 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906710.0, ans=0.1
2024-09-26 04:35:06,874 INFO [train.py:1198] (0/4) Epoch 50, batch 3400, loss[loss=0.2011, ctc_loss=0.1302, cr_loss=0.3541, over 17296.00 frames. ], tot_loss[loss=0.1839, ctc_loss=0.1172, cr_loss=0.3333, over 3348953.84 frames. ], batch size: 46, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:35:20,635 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.200e+02 1.315e+02 1.394e+02 1.507e+02 2.409e+02, threshold=2.789e+02, percent-clipped=0.0
2024-09-26 04:35:20,940 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=906803.3333333334, ans=0.125
2024-09-26 04:35:44,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=906850.0, ans=0.0
2024-09-26 04:35:49,710 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.95 vs. limit=22.5
2024-09-26 04:35:50,717 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=906850.0, ans=0.125
2024-09-26 04:35:58,473 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=906896.6666666666, ans=0.0
2024-09-26 04:36:11,034 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=906943.3333333334, ans=0.125
2024-09-26 04:36:24,726 INFO [train.py:1198] (0/4) Epoch 50, batch 3450, loss[loss=0.1317, ctc_loss=0.0832, cr_loss=0.2427, over 17086.00 frames. ], tot_loss[loss=0.183, ctc_loss=0.1166, cr_loss=0.3322, over 3357611.93 frames. ], batch size: 40, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:36:42,863 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=12.0
2024-09-26 04:36:53,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=907036.6666666666, ans=0.125
2024-09-26 04:37:07,242 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=907083.3333333334, ans=0.0
2024-09-26 04:37:10,419 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=907083.3333333334, ans=0.0
2024-09-26 04:37:44,970 INFO [train.py:1198] (0/4) Epoch 50, batch 3500, loss[loss=0.1987, ctc_loss=0.1274, cr_loss=0.3564, over 17020.00 frames. ], tot_loss[loss=0.1842, ctc_loss=0.1175, cr_loss=0.3336, over 3347290.62 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:37:48,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=907223.3333333334, ans=0.0
2024-09-26 04:37:48,292 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=907223.3333333334, ans=0.0
2024-09-26 04:37:58,718 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.132e+02 1.340e+02 1.405e+02 1.556e+02 2.203e+02, threshold=2.811e+02, percent-clipped=0.0
2024-09-26 04:38:40,237 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=907363.3333333334, ans=0.04949747468305833
2024-09-26 04:38:49,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=907410.0, ans=0.1
2024-09-26 04:38:51,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=907410.0, ans=0.125
2024-09-26 04:39:05,129 INFO [train.py:1198] (0/4) Epoch 50, batch 3550, loss[loss=0.1855, ctc_loss=0.1164, cr_loss=0.3455, over 16945.00 frames. ], tot_loss[loss=0.1849, ctc_loss=0.118, cr_loss=0.3346, over 3352718.86 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:39:27,028 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=907503.3333333334, ans=0.2
2024-09-26 04:39:35,169 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=22.5
2024-09-26 04:39:37,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=907550.0, ans=0.125
2024-09-26 04:39:39,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=907550.0, ans=0.125
2024-09-26 04:39:48,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=907550.0, ans=0.125
2024-09-26 04:40:21,529 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=907690.0, ans=0.1
2024-09-26 04:40:22,839 INFO [train.py:1198] (0/4) Epoch 50, batch 3600, loss[loss=0.1509, ctc_loss=0.09392, cr_loss=0.2851, over 16237.00 frames. ], tot_loss[loss=0.1838, ctc_loss=0.1172, cr_loss=0.3329, over 3357433.38 frames. ], batch size: 36, lr: 2.36e-03, grad_scale: 32.0
2024-09-26 04:40:36,919 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.188e+02 1.290e+02 1.354e+02 1.454e+02 2.078e+02, threshold=2.707e+02, percent-clipped=0.0
2024-09-26 04:40:55,784 INFO [scaling.py:1024] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. limit=5.0
2024-09-26 04:40:56,390 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=907783.3333333334, ans=0.2
2024-09-26 04:41:03,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=907783.3333333334, ans=0.125
2024-09-26 04:41:07,856 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=15.0
2024-09-26 04:41:27,532 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907876.6666666666, ans=0.1
2024-09-26 04:41:31,139 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=22.5
2024-09-26 04:41:32,783 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0
2024-09-26 04:41:42,942 INFO [train.py:1198] (0/4) Epoch 50, batch 3650, loss[loss=0.1377, ctc_loss=0.08464, cr_loss=0.2653, over 17116.00 frames. ], tot_loss[loss=0.1837, ctc_loss=0.1171, cr_loss=0.3327, over 3351024.59 frames. ], batch size: 40, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:42:48,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=908110.0, ans=0.125
2024-09-26 04:43:01,652 INFO [train.py:1198] (0/4) Epoch 50, batch 3700, loss[loss=0.1667, ctc_loss=0.1061, cr_loss=0.3026, over 17132.00 frames. ], tot_loss[loss=0.1841, ctc_loss=0.1175, cr_loss=0.3332, over 3350096.05 frames. ], batch size: 40, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:43:17,257 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.236e+02 1.309e+02 1.373e+02 1.463e+02 1.965e+02, threshold=2.746e+02, percent-clipped=0.0
2024-09-26 04:43:29,352 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0
2024-09-26 04:43:36,252 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=908250.0, ans=0.0
2024-09-26 04:43:54,350 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=908296.6666666666, ans=15.0
2024-09-26 04:44:18,174 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=908343.3333333334, ans=0.125
2024-09-26 04:44:21,192 INFO [train.py:1198] (0/4) Epoch 50, batch 3750, loss[loss=0.1887, ctc_loss=0.1177, cr_loss=0.3549, over 17195.00 frames. ], tot_loss[loss=0.1858, ctc_loss=0.1187, cr_loss=0.3355, over 3347239.28 frames. ], batch size: 45, lr: 2.36e-03, grad_scale: 16.0
2024-09-26 04:44:51,123 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=908483.3333333334, ans=0.125
2024-09-26 04:45:11,545 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=908530.0, ans=0.0
2024-09-26 04:45:33,618 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=908576.6666666666, ans=0.125
2024-09-26 04:45:39,937 INFO [train.py:1198] (0/4) Epoch 50, batch 3800, loss[loss=0.1618, ctc_loss=0.1006, cr_loss=0.3057, over 17071.00 frames. ], tot_loss[loss=0.1853, ctc_loss=0.1184, cr_loss=0.3346, over 3330359.73 frames.
], batch size: 39, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:45:40,221 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=908623.3333333334, ans=0.125 2024-09-26 04:45:45,301 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2024-09-26 04:45:55,610 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.149e+02 1.341e+02 1.416e+02 1.505e+02 2.139e+02, threshold=2.833e+02, percent-clipped=0.0 2024-09-26 04:46:03,631 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=908670.0, ans=0.125 2024-09-26 04:46:06,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=908670.0, ans=0.1 2024-09-26 04:46:13,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=908716.6666666666, ans=0.125 2024-09-26 04:46:23,275 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=22.5 2024-09-26 04:46:38,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=908763.3333333334, ans=0.125 2024-09-26 04:46:45,128 INFO [scaling.py:1024] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0 2024-09-26 04:46:58,241 INFO [train.py:1198] (0/4) Epoch 50, batch 3850, loss[loss=0.1747, ctc_loss=0.1116, cr_loss=0.3158, over 17290.00 frames. ], tot_loss[loss=0.1866, ctc_loss=0.1194, cr_loss=0.336, over 3298824.51 frames. ], batch size: 51, lr: 2.36e-03, grad_scale: 16.0 2024-09-26 04:48:10,186 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp-cr-loss-scale-0.2-time-mask-ratio-2.5-scaled-masked-1-4/epoch-50.pt 2024-09-26 04:48:11,945 INFO [train.py:1496] (0/4) Done!